I've got a server running on a Linode with Ubuntu 10.04 LTS, Nginx 0.7.65, MySQL 5.1.41 and PHP 5.3.2 with PHP-FPM.
There is a WordPress blog on it, recently updated to WordPress 3.2.1.
I have made no changes to the server (apart from updating WordPress), and although it had been running fine, a couple of days ago it started going down intermittently.
While trying to solve the problem, I checked the error_log and saw many timeouts, plus other messages that seemed to be related to timeouts. The server is currently logging errors like these:
2011/07/14 10:37:35 [warn] 2539#0: *104 an upstream response is buffered to a temporary file /var/lib/nginx/fastcgi/2/00/0000000002 while reading upstream, client: 217.12.16.51, server: www.example.com, request: "GET /page/2/ HTTP/1.0", upstream: "fastcgi://127.0.0.1:9000", host: "www.example.com", referrer: "http://www.example.com/"
2011/07/14 10:40:24 [error] 2539#0: *231 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 46.24.245.181, server: www.example.com, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.example.com", referrer: "http://www.google.es/search?sourceid=chrome&ie=UTF-8&q=example"
I also found a previous Server Fault discussion with a possible solution: edit /etc/php/etc/php-fpm.conf and change
;request_terminate_timeout= 0
to
request_terminate_timeout=30s
The server worked for a few hours and then broke again. I reverted the file to its original state and restarted php-fpm again (service php-fpm restart), but no luck: the server worked for a few minutes and then the problem came back, over and over. The strange thing is that, although the services are running, htop shows no CPU load (see image), and I really don't know how to solve the problem.
The config files are on pastebin: /etc/nginx/nginx.conf is here, and /etc/nginx/sites-available/www.example.com is here.
Have you tried, instead of using an "upstream" block in nginx.conf, doing something like:
Take a look here http://www.if-not-true-then-false.com/2011/nginx-and-php-fpm-configuration-and-optimizing-tips-and-tricks/
The problem is the php-fpm config
But it's not the timeout. Increasing the timeout just gives PHP more time to process a single request, which may mask the symptoms but is not the right solution.
The php-fpm log should make it apparent why the server is struggling. In my experience (obviously, in the absence of more information, this is a guess), the php-fpm log file will contain entries like this:
If there are only a few log entries like the above, that's not much of a problem. If there are many, only minutes or seconds apart, then php-fpm has insufficient resources for the load it is being asked to cope with.
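To see how often this happens, a short Python sketch like the one below could pull the timestamps of the max_children warnings out of the php-fpm log and report the gaps between them. It assumes the log line format shown in the comment; the php-fpm bundled with older PHP 5.3.x releases may word or timestamp the message differently, so adjust LINE_RE and TS_FORMAT to match your own log. The function names are hypothetical helpers, not part of php-fpm.

```python
import re
import sys
from datetime import datetime

# Assumed line format (check it against your own php-fpm log):
# [14-Jul-2011 10:40:12] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\].*max_children")
TS_FORMAT = "%d-%b-%Y %H:%M:%S"


def warning_times(lines):
    """Return the timestamps of every line that mentions max_children."""
    times = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            times.append(datetime.strptime(match.group("ts"), TS_FORMAT))
    return times


def gaps_in_seconds(times):
    """Seconds between each pair of consecutive warnings."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    # Usage: python fpm_warnings.py /path/to/php-fpm.log  (path is yours to supply)
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            times = warning_times(log)
        gaps = gaps_in_seconds(times)
        print(f"{path}: {len(times)} max_children warnings")
        if gaps:
            print(f"shortest gap: {min(gaps):.0f}s, longest gap: {max(gaps):.0f}s")
```

If most of the gaps come out in seconds rather than hours, that is the "many and only minutes or seconds apart" case described above.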
Running short of php-fpm processes is not uncommon, because a standard distribution php-fpm config file will contain something similar to this:
This means php-fpm will handle at most 5 requests in parallel.
This is especially true with something like WordPress, where a single HTML page can trigger a large number of subsequent requests (images, CSS, JS files, etc.) that are also handed to PHP. It is easy for a large and ever-growing queue of requests to form, so that any given request must first wait for the in-process and already-waiting requests to be processed. This leads to delays (visible as waiting time in any browser profiling tool) and frequently to a large number of timeouts.
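The queueing effect can be made concrete with some back-of-the-envelope arithmetic. The sketch below uses a deliberately simple model (every request takes the same time, and requests are served in batches of one per worker); the numbers in it, 30 PHP requests per page view at 0.5 s each and 12 requests per second arriving, are hypothetical and only meant to show the shape of the problem.

```python
import math


def last_response_time(requests, workers, seconds_per_request):
    """Seconds until the last of `requests` simultaneous requests finishes,
    assuming each takes the same time and each worker handles one at a time."""
    batches = math.ceil(requests / workers)
    return batches * seconds_per_request


def queue_growth_per_second(arrivals_per_second, workers, seconds_per_request):
    """How many requests per second pile up in the queue when arrivals
    exceed what the pool can process (0.0 if the pool keeps up)."""
    capacity = workers / seconds_per_request
    return max(0.0, arrivals_per_second - capacity)


# Hypothetical load: one page view fans out into 30 PHP requests of 0.5 s each.
for workers in (5, 20):
    print(workers, "workers, last response after", last_response_time(30, workers, 0.5), "s")
    print(workers, "workers, queue grows by", queue_growth_per_second(12, workers, 0.5), "req/s")
```

With 5 workers the pool can only process 10 requests per second, so at 12 arriving per second the queue grows without bound, and once the wait exceeds nginx's fastcgi_read_timeout (60 s by default) you get exactly the "upstream timed out" errors from the question.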
Also note that a large number of 404s (requests for anything that doesn't exist) is an easy way to exaggerate the limitations of any server; check for and fix any 404s the site is generating.
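To find those 404s, a minimal Python sketch like this could list the most frequently requested missing URLs from the nginx access log. It assumes nginx's default "combined" log format (as in the comment); if your log_format differs, REQUEST_RE will need adjusting. top_404s is a hypothetical helper name.

```python
import re
import sys
from collections import Counter

# Assumes nginx's default "combined" log format, e.g.
# 1.2.3.4 - - [14/Jul/2011:10:37:35 +0200] "GET /favicon.ico HTTP/1.1" 404 168 "-" "Mozilla/5.0"
REQUEST_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def top_404s(lines, limit=10):
    """Return the `limit` most requested paths that answered 404, with counts."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    # Usage: python top404.py /var/log/nginx/access.log  (adjust to your log path)
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            for url, count in top_404s(log):
                print(f"{count:6d}  {url}")
```

Each path at the top of that list is either a broken link to fix or a missing file (favicon.ico is a classic) worth adding, so the requests stop reaching PHP at all.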
How to fix it
If the problem is that php-fpm has too few server processes running, just increase them. The numbers to use depend on the hardware of the server it is deployed on; here's a suggestion:
This would allow 20 requests to be served in parallel, and should alleviate the problem without causing the server to struggle.
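One way to sanity-check a number like that for your own hardware is a common rule of thumb (not something php-fpm itself enforces): divide the memory left over for PHP by the average resident size of one php-fpm process. The sketch below assumes you feed it the output of something like `ps -C php-fpm -o rss=` (RSS in kB, one process per line; the process name may be php5-fpm on some builds), and the RAM figures in the usage note are hypothetical.

```python
import math


def average_rss_mb(ps_rss_output):
    """Average resident memory in MB, from `ps ... -o rss=` output (kB per line)."""
    values = [int(token) for token in ps_rss_output.split()]
    if not values:
        raise ValueError("no php-fpm processes found in the ps output")
    return sum(values) / len(values) / 1024


def suggest_max_children(total_ram_mb, reserved_mb, avg_process_mb):
    """Rule of thumb: RAM left for PHP divided by the size of one PHP process."""
    available_mb = total_ram_mb - reserved_mb
    if available_mb <= 0:
        raise ValueError("nothing left for php-fpm after the reserved memory")
    return max(1, math.floor(available_mb / avg_process_mb))
```

For example, with a hypothetical 1024 MB of RAM, 400 MB reserved for MySQL, nginx and the OS, and processes averaging 30 MB, suggest_max_children(1024, 400, 30) gives 20. Rounding down errs on the side of never pushing the box into swap, which would make the timeouts worse rather than better.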
If in doubt, though, there's a simple rule to follow when changing the php-fpm config: