First off, we have an SR open with Novell, but I figured I'd ask here anyway. The quick scenario is this.
When the load average on one of the access gateways (LAG) spiked above 2.0, I saw some CPU wait time; the ics_dyn process was over 50% CPU and the java process was over 112%. It lasted only about 1 second. The load average dipped back down right away and the offending processes dropped to under 10% CPU.
I went to the web console and saw the LAG was not responding. I refreshed and it went green right away.
It looks like the processes spike, the CPU goes over 100%, and other processes end up waiting for CPU time. ics_dyn then restarts and everything goes back to normal, but the cycle starts all over again.
grep RESTARTED ics_dyn.log
Mar 29 09:34:58 <SERVERNAME> vmcontroller: AM#404514000: AMDEVICEID#: AMAUTHID#0: AMEVENTID#0: VM-0 DOWN, being RESTARTED (Tue Mar 29 09:34:57 2011 ). restarted 60 times. fastRestartMode.
Mar 29 09:38:33 <SERVERNAME> vmcontroller: AM#404514000: AMDEVICEID#: AMAUTHID#0: AMEVENTID#0: VM-0 DOWN, being RESTARTED (Tue Mar 29 09:38:32 2011 ). restarted 61 times. fastRestartMode.
Mar 29 09:51:17 <SERVERNAME> vmcontroller: AM#404514000: AMDEVICEID#: AMAUTHID#0: AMEVENTID#0: VM-0 DOWN, being RESTARTED (Tue Mar 29 09:51:16 2011 ). restarted 62 times. fastRestartMode.
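To see how often the restarts happen, a rough Python sketch like the one below could pull the timestamp and restart counter out of those lines and print the gap between consecutive restarts. The function name and regex are my own, not anything shipped with Access Manager, and it assumes every line has the same layout as the three above (English day and month names included).

```python
import re
from datetime import datetime

# Hypothetical helper: the pattern is only based on the three log lines above.
RESTART_RE = re.compile(r"being RESTARTED \((.+?)\s*\)\. restarted (\d+) times")

def restart_intervals(lines):
    """Return (restart_count, seconds_since_previous_restart) pairs."""
    events = []
    for line in lines:
        m = RESTART_RE.search(line)
        if m:
            # Timestamp inside the parentheses, e.g. "Tue Mar 29 09:34:57 2011"
            when = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
            events.append((int(m.group(2)), when))
    # Pair each restart with the one before it and take the difference.
    return [
        (count, (when - prev_when).total_seconds())
        for (_, prev_when), (count, when) in zip(events, events[1:])
    ]

SAMPLE = [
    "Mar 29 09:34:58 <SERVERNAME> vmcontroller: AM#404514000: AMDEVICEID#: AMAUTHID#0: AMEVENTID#0: VM-0 DOWN, being RESTARTED (Tue Mar 29 09:34:57 2011 ). restarted 60 times. fastRestartMode.",
    "Mar 29 09:38:33 <SERVERNAME> vmcontroller: AM#404514000: AMDEVICEID#: AMAUTHID#0: AMEVENTID#0: VM-0 DOWN, being RESTARTED (Tue Mar 29 09:38:32 2011 ). restarted 61 times. fastRestartMode.",
    "Mar 29 09:51:17 <SERVERNAME> vmcontroller: AM#404514000: AMDEVICEID#: AMAUTHID#0: AMEVENTID#0: VM-0 DOWN, being RESTARTED (Tue Mar 29 09:51:16 2011 ). restarted 62 times. fastRestartMode.",
]

for count, gap in restart_intervals(SAMPLE):
    print(f"restart #{count}: {gap:.0f}s after the previous one")
# restart #61: 215s after the previous one
# restart #62: 764s after the previous one
```

For the full picture, the same function can be fed every line of ics_dyn.log instead of the three-line sample.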
There are also lots of these in dmesg:
ics_dyn[11708]: segfault at 1c ip b5caf868 sp b22561d0 error 4 in libproxy.so.1[b5b8b000+1e2000]
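For what it's worth, that segfault line can be picked apart a bit. The sketch below is only my reading of the usual x86 Linux format (fault address, instruction pointer, stack pointer, page-fault error code, then library[load base+size]); the function and field names are made up for illustration.

```python
import re

# Hypothetical parser for the dmesg segfault line quoted above.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line):
    """Return the decoded fields of a segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    return {
        "process": m["proc"],
        "library": m["lib"],
        "fault_address": int(m["addr"], 16),
        # Faulting instruction relative to where the library was mapped.
        "offset_in_library": int(m["ip"], 16) - int(m["base"], 16),
        # x86 page-fault error code bits: 0 = present, 1 = write, 2 = user mode.
        "cause": "protection violation" if err & 1 else "page not present",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }

LINE = ("ics_dyn[11708]: segfault at 1c ip b5caf868 sp b22561d0 "
        "error 4 in libproxy.so.1[b5b8b000+1e2000]")

info = decode_segfault(LINE)
print(hex(info["offset_in_library"]), info["mode"], info["access"], info["cause"])
# 0x124868 user read page not present
```

So error 4 is a user-mode read of an unmapped page, and a fault address as low as 0x1c usually points to a read through a NULL pointer plus a small field offset. If a libproxy.so.1 with debug symbols were available, the 0x124868 offset could probably be passed to something like addr2line -e libproxy.so.1 0x124868 to find the source line, but I have not tried that here.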
I know there are going to be requests for more info and I'm prepared to provide it. The problem is kind of strange and I think it is a bug in the Access Manager software.
Version 2.7.3 (20100428_184640) Copyright (c) 1999-2009 Novell, Inc. All rights reserved.
Field Patches: Field Patch 3 -- 20100425
After core dump and log analysis, Novell provided the fix. It was a bug in their code, and they supplied a recompiled libproxy.