I have a FastCGI (mod_fastcgi) problem. It happens every once in a while and does not cause a complete server meltdown, just 500 errors. A couple of notes first. I am using APC, so PHP is in control of its own processes, not FastCGI. Also, I have the webroot set as:
/var/www/html
And the fcgi-bin inside:
/var/www/html/fcgi-bin
First off, here is the Apache error_log:
[Fri Jan 07 10:22:39 2011] [error] [client 50.16.222.82] (4)Interrupted system call: FastCGI: comm with server "/var/www/html/fcgi-bin/php.fcgi" aborted: select() failed, referer: http://www.domain.com/
I also ran strace on the 'fcgi-pm' process. Here is a snip from the trace around the time it bombs out:
21725 gettimeofday({1294420603, 14360}, NULL) = 0
21725 read(14, "C /var/www/html/fcgi-bin/php.fcgi - - 6503 38*", 16384) = 46
21725 alarm(131) = 0
21725 select(15, [14], NULL, NULL, NULL) = 1 (in [14])
21725 alarm(0) = 131
21725 gettimeofday({1294420603, 96595}, NULL) = 0
21725 read(14, "C /var/www/html/fcgi-bin/php.fcgi - - 6154 23*C /var/www/html/fcgi-bin/php.fcgi - - 6483 28*", 16384) = 92
21725 alarm(131) = 0
21725 select(15, [14], NULL, NULL, NULL) = 1 (in [14])
21725 alarm(0) = 131
21725 gettimeofday({1294420603, 270744}, NULL) = 0
21725 read(14, "C /var/www/html/fcgi-bin/php.fcgi - - 5741 38*", 16384) = 46
21725 alarm(131) = 0
21725 select(15, [14], NULL, NULL, NULL) = 1 (in [14])
21725 alarm(0) = 131
21725 gettimeofday({1294420603, 311502}, NULL) = 0
21725 read(14, "C /var/www/html/fcgi-bin/php.fcgi - - 6064 32*", 16384) = 46
21725 alarm(131) = 0
21725 select(15, [14], NULL, NULL, NULL) = 1 (in [14])
21725 alarm(0) = 131
21725 gettimeofday({1294420603, 365598}, NULL) = 0
21725 read(14, "C /var/www/html/fcgi-bin/php.fcgi - - 6179 33*C /var/www/html/fcgi-bin/php.fcgi - - 5906 59*", 16384) = 92
21725 alarm(131) = 0
21725 select(15, [14], NULL, NULL, NULL) = 1 (in [14])
21725 alarm(0) = 131
21725 gettimeofday({1294420603, 454405}, NULL) = 0
I noticed that the select() result stays the same throughout, but the return value of read() changes from 46 to some other number (92 in the trace above) while it is bombing out. Has anyone seen anything like this? Could this be some sort of file locking?
Thanks, Ben
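A note on the byte counts in the trace: the 92-byte reads look like two 46-byte records delivered by a single read() call, not a different kind of message. The short Python sketch below checks this against the payloads copied from the trace. It assumes that '*' terminates each record (inferred from the trace, not taken from mod_fastcgi documentation), and it does not try to interpret the two trailing numbers.

```python
from typing import List

# Payloads copied verbatim from the read() calls in the strace excerpt.
SINGLE = "C /var/www/html/fcgi-bin/php.fcgi - - 6503 38*"
DOUBLE = ("C /var/www/html/fcgi-bin/php.fcgi - - 6154 23*"
          "C /var/www/html/fcgi-bin/php.fcgi - - 6483 28*")


def split_records(payload: str) -> List[str]:
    """Split a read() payload into records, assuming '*' ends each record."""
    return [chunk + "*" for chunk in payload.split("*") if chunk]


for payload in (SINGLE, DOUBLE):
    records = split_records(payload)
    print(len(payload), "bytes ->", len(records), "record(s)")
```

If that reading is right, the change from 46 to 92 only means that two records were waiting when read() was called, which would be expected under load and does not by itself point to file locking.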
Synopsis
I have observed the very same behavior with Apache; it seems that this problem is not specific to lighttpd.
In my case, the symptoms were exactly the same; the Apache access logs were peppered with intermittent 500 response codes, and there were no corresponding entries in PHP's error log (and PHP error-reporting was configured to be maximally verbose).
I described the issue extensively on the Apache mailing list (search the list archives for the subject "Intermittent 500 responses in access.log without corresponding entries in error.log").
Root Cause
1100110's answer hints at the root cause, but I'll provide additional documentation, straight from Apache, as well as suggestions for eliminating the problem.
Here is the official word from Apache on this matter:
https://httpd.apache.org/mod_fcgid/mod/mod_fcgid.html :
There we have it.
Possible Solutions
Option 1
One solution is to set PHP_FCGI_MAX_REQUESTS to zero, but taking this measure introduces the potential for memory leaks to grow out of control.
The various bits of documentation that I have consulted do not make it clear whether PHP via Fast-CGI suffers from inherent memory-leaking (hence this built-in "process recycling" behavior) or if the risk is limited to poorly-written, "runaway" scripts.
In any case, there is risk inherent to setting PHP_FCGI_MAX_REQUESTS to zero, especially in a shared hosting environment.
Option 2
A second solution, as described in the excerpt above, is to set FcgidMaxRequestsPerProcess to a value less than or equal to PHP_FCGI_MAX_REQUESTS. The documentation omits an important point, however: the value must also be greater than zero (because zero means "unlimited" or "disable the check" in this context). Given that the default value for FcgidMaxRequestsPerProcess is zero, and the default value for PHP_FCGI_MAX_REQUESTS is 500, any administrator who has not overridden these values will experience the intermittent 500 response codes. For this reason, I fail to understand why FcgidMaxRequestsPerProcess and PHP_FCGI_MAX_REQUESTS do not share the same default value. Perhaps this is because configuring these two directives as such yields the same net result as setting PHP_FCGI_MAX_REQUESTS to zero; the documentation is ambiguous in this regard.
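The rule in Option 2 (together with the escape hatch from Option 1) can be written down as a small check. This is only a sketch of the rule as stated in this answer, not something verified against Apache's source; the function name and its parameters are made up for illustration, and both use 0 to mean "no limit".

```python
def recycle_settings_are_safe(fcgid_max_requests: int, php_max_requests: int) -> bool:
    """Return True if the settings avoid the race described in this answer.

    fcgid_max_requests stands for FcgidMaxRequestsPerProcess and
    php_max_requests for PHP_FCGI_MAX_REQUESTS; 0 means "no limit" for both.
    """
    if php_max_requests == 0:
        # Option 1: PHP never exits on its own, so there is nothing to race with.
        return True
    if fcgid_max_requests == 0:
        # The module never retires the process, but PHP will exit by itself.
        return False
    # Option 2: the module must retire the process no later than PHP would exit.
    return fcgid_max_requests <= php_max_requests


print(recycle_settings_are_safe(0, 500))    # the defaults cited above -> False
print(recycle_settings_are_safe(500, 500))  # matching limits -> True
```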
Option 3
A third solution is to abandon Fast-CGI altogether, in favor of a comparable alternative, such as suPHP or plain-old CGI + SuExec. I have performed some basic, raw performance benchmarking across the various PHP modes, and my findings are as follows:
Mod-PHP is the highest-performing, with a score of 77.7. The scores are arbitrary and serve only to demonstrate the relative variance in page-load-times across PHP modes.
If we assume that these benchmarks are fairly representative, then there seem to be very few reasons to cling to Fast-CGI, given this one (fairly serious) flaw in its implementation. The only substantial reason that comes to mind is op-code caching. My understanding is that PHP cannot utilize op-code caching via CGI or suPHP mode (because processes do not persist across requests).
While Fast-CGI does not take advantage of op-code caching (e.g., via APC) out-of-the-box, clever users have devised a method for rendering APC effective with Fast-CGI (via per-user caches): http://www.brandonturner.net/blog/2009/07/fastcgi_with_php_opcode_cache/ . There are several drawbacks, however:
As a related corollary, you said the following in your question:
Unless you're using mod_fastcgi (and not mod_fcgid), and unless you've followed steps similar to those cited a few paragraphs above, APC is consuming resources without effect. As such, you may wish to disable APC.
Summary of Solution
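Take one of the following three measures: set PHP_FCGI_MAX_REQUESTS to zero (Option 1); set FcgidMaxRequestsPerProcess to a value greater than zero and less than or equal to PHP_FCGI_MAX_REQUESTS (Option 2); or abandon Fast-CGI in favor of an alternative such as suPHP or CGI + SuExec (Option 3).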
I read somewhere (dealing with lighttpd, not Apache) that a PHP FastCGI process stops handling requests after 500 of them. The 501st request will bomb.
Sorry I do not have more information than that, but it's at the very least worth a shot.
tl;dr: try setting PHP_FCGI_MAX_REQUESTS to 500 and see if the problem clears up.
Found the information. It applies to lighttpd, and I do not know whether it applies to Apache.
Test it; I would love to hear whether this is only an issue with lighttpd or a general one.
"This problem seems to stem from a little-known issue with PHP: PHP stops accepting new FastCGI connections after handling 500 requests; unfortunately, there is a potential race condition during the PHP cleanup code in which PHP can be shutting down but still have the socket open, so lighty can send request number 501 to PHP and have it "accepted", but then PHP appears to simply exit, causing a 500 return from lighty.
To limit this occurrence, set PHP_FCGI_MAX_REQUESTS to 500."
--http://redmine.lighttpd.net/projects/1/wiki/Docs:PerformanceFastCGI
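To make the quoted description concrete, here is a toy Python model of it. It is a deliberately pessimistic sketch: in reality this is a race, so the request after the limit only fails some of the time, whereas the model assumes it always fails. The class and function names are invented for illustration and do not correspond to anything in PHP, lighttpd, or Apache.

```python
class ToyPhpWorker:
    """Toy stand-in for a PHP FastCGI process; not real PHP behaviour."""

    def __init__(self, max_requests: int) -> None:
        self.max_requests = max_requests
        self.handled = 0

    def handle(self) -> int:
        """Return an HTTP status code for one request."""
        if self.handled >= self.max_requests:
            # The process is already shutting down, so the request is lost.
            return 500
        self.handled += 1
        return 200


def count_errors(total_requests: int, php_max: int, server_max: int) -> int:
    """Count 500s when the web server recycles a worker after server_max requests.

    server_max == 0 means the web server never recycles the worker itself.
    """
    worker = ToyPhpWorker(php_max)
    sent = 0
    errors = 0
    for _ in range(total_requests):
        if server_max and sent >= server_max:
            worker, sent = ToyPhpWorker(php_max), 0  # planned respawn
        status = worker.handle()
        sent += 1
        if status == 500:
            errors += 1
            worker, sent = ToyPhpWorker(php_max), 0  # respawn after the failure
    return errors


print(count_errors(2000, php_max=500, server_max=0))    # server never recycles
print(count_errors(2000, php_max=500, server_max=500))  # server recycles first
```

In this model the first call reports a handful of failures (one per worker lifetime) and the second reports none, which matches the intermittent pattern described in the question.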
Thanks for your response. I have all of the PHP errors going to a log file. I get a few notices but no errors. I must admit that I did not write this code. For now I have redirected all 500 errors to index.php, using a '.htaccess' rule. I must be missing something, though. This should not be happening. My only guess is that once 'PHP_FCGI_MAX_REQUESTS' reaches its max, PHP kills the child and this confuses FastCGI. However, if I understand correctly, PHP has a parent process that should be the only one that FastCGI talks to, so I am not too sure that that is it... Here is my wrapper script:
This is a very high-volume server, which is why PHP_FCGI_CHILDREN is set so high.
Thanks again, Ben