So got the below earlier to day...
Around that time the logs show a ramp in processes(600) and associated memory (1.2g), cpu usage load average (80) untill the server gave out.
Server had to be hard reset by host as there was no ssh or plesk panel access.
Fast CGI is configured as below and is setup for one high use site. As I understand it FcgidMaxProcesses 20 should protect against what happen but has not.
I've read many forums with differing answers and references to many different fcgi directives, but have found nothing conclusive. Any one got some definitive answers on how to stop this sort of server process ramping and subsequent server failure?
If you need more info let me know.
Cheers Andy
/var/log/apache2/error_log
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17651 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17650 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17649 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17644 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17643 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17638 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17633 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17627 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17622 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17674 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17673 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17672 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17667 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17666 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17665 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17664 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17659 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17658 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17657 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17656 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17651 graceful kill fail, sending SIGKILL
https://docs.google.com/a/thesugarrefinery.com/open?id=0B_XbpWChge0VRmFLWEZfR2VBb2M https://docs.google.com/a/thesugarrefinery.com/open?id=0B_XbpWChge0VWTcwZEhoV2Fqejg https://docs.google.com/a/thesugarrefinery.com/open?id=0B_XbpWChge0VUUtVWWFINHZjZ0U https://docs.google.com/a/thesugarrefinery.com/open?id=0B_XbpWChge0VZEVMclh6ZUdaOUE
<IfModule mod_fcgid.c>
<IfModule !mod_fastcgi.c>
AddHandler fcgid-script fcg fcgi fpl
</IfModule>
FcgidIPCDir /var/lib/apache2/fcgid/sock
FcgidProcessTableFile /var/lib/apache2/fcgid/shm
FcgidIdleTimeout 40
FcgidProcessLifeTime 30
FcgidMaxProcesses 20
FcgidMaxProcessesPerClass 20
FcgidMinProcessesPerClass 0
FcgidConnectTimeout 30
FcgidIOTimeout 120
FcgidInitialEnv RAILS_ENV production
FcgidIdleScanInterval 10
FcgidMaxRequestLen 1073741824
</IfModule>
There was a bug in Debian (at least) that rendered the limit useless with virtual hosts. It is fixed now.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=615814
This is often caused by a setuid CGI script that hangs; it exceeds the IOtimeout, and apache tries to kill it, but is unable because of the change in uid, resulting in the error.
You may want to increase the FcgidIOTimeout or FcgidProcessLifetime to allow the thread more time to complete.
Another workaround is to make the Apache server run under the same UID that the setuid script is chaning to. This allows it to kill the process, though it may not be advisable for security reasons. Similarly, running apache as root is also a workaround but not very secure. If you do this, note that your fcgi sock directory (under /var/lib/apache2/fcgid/sock or similar) and process table file need to be writeable by the apache process owner.
The root cause, though, is the CGI script itself taking too long. The cause for that depends on the CGI code which I have no visibiilty of.