After debugging for 6-hours - I am giving this up :|
We have a nginx+php-fpm+mysql in LAN with almost 100 wordpress (created and used by different designers/developers all working on test wordpres setup)
We are using nginx without any issues from long.
Today, all of a sudden - nginx started returning "504 Gateway Time-out" out of the blue...
I checked nginx error log for a virtual host...
2010/09/06 21:24:24 [error] 12909#0: *349 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 21:25:11 [error] 12909#0: *349 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 21:25:11 [error] 12909#0: *443 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /info.php HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 21:25:12 [error] 12909#0: *443 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 22:08:32 [error] 12909#0: *1025 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 22:09:33 [error] 12909#0: *1025 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 22:09:40 [error] 12909#0: *1064 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /info.php HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 22:09:40 [error] 12909#0: *1064 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 22:24:44 [error] 12909#0: *1313 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 22:24:53 [error] 12909#0: *1313 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
As I run php-fpm on port 9000 via TCP mode, I ran "netstat | grep 9000" and noticed something unusual... (Pasting partial output here for ease of read)
tcp 9 0 localhost:9000 localhost:36094 CLOSE_WAIT 14269/php5-fpm
tcp 0 0 localhost:46664 localhost:9000 FIN_WAIT2 -
tcp 1257 0 localhost:9000 localhost:36135 CLOSE_WAIT -
tcp 1257 0 localhost:9000 localhost:36125 CLOSE_WAIT -
tcp 9 0 localhost:9000 localhost:36102 CLOSE_WAIT 14268/php5-fpm
tcp 0 0 localhost:46662 localhost:9000 FIN_WAIT2 -
tcp 745 0 localhost:9000 localhost:46644 CLOSE_WAIT -
tcp 0 0 localhost:46658 localhost:9000 FIN_WAIT2 -
tcp 1265 0 localhost:9000 localhost:46607 CLOSE_WAIT -
tcp 0 0 localhost:46672 localhost:9000 ESTABLISHED 12909/nginx: worker
tcp 1257 0 localhost:9000 localhost:36119 CLOSE_WAIT -
tcp 1265 0 localhost:9000 localhost:46613 CLOSE_WAIT -
tcp 0 0 localhost:46646 localhost:9000 FIN_WAIT2 -
tcp 1257 0 localhost:9000 localhost:36137 CLOSE_WAIT -
tcp 0 0 localhost:46670 localhost:9000 ESTABLISHED 12909/nginx: worker
tcp 1265 0 localhost:9000 localhost:46619 CLOSE_WAIT -
tcp 1336 0 localhost:9000 localhost:46668 ESTABLISHED -
tcp 0 0 localhost:46648 localhost:9000 FIN_WAIT2 -
tcp 1336 0 localhost:9000 localhost:46670 ESTABLISHED -
tcp 9 0 localhost:9000 localhost:36108 CLOSE_WAIT 14274/php5-fpm
tcp 1336 0 localhost:9000 localhost:46684 ESTABLISHED -
tcp 0 0 localhost:46674 localhost:9000 ESTABLISHED 12909/nginx: worker
tcp 1336 0 localhost:9000 localhost:46666 ESTABLISHED -
tcp 1257 0 localhost:9000 localhost:46648 CLOSE_WAIT -
tcp 1336 0 localhost:9000 localhost:46678 ESTABLISHED -
tcp 0 0 localhost:46668 localhost:9000 ESTABLISHED 12909/nginx: wo
There are plenty of "CLOSE_WAIT" & "FIN_WAIT2" pairs as highlighted below (in above output):
tcp 1337 0 localhost:9000 localhost:46680 CLOSE_WAIT -
tcp 0 0 localhost:46680 localhost:9000 FIN_WAIT2 -
Please note port 46680 in above.
I enabled mysql slow queries error log, but it didn't work.
As of now restarting php5-fpm every minute via a cronjob (see command below) keeping everything running "smoothly" but I hate patchwork and want to solve this...
1 * * * * service php5-fpm restart > /dev/null
I searched extensively on Google - got no help. As mentioned, this a test-server in LAN, CPU load is never crossed 0.10 and memory usage is also below 25% (System has 2GB RAM and ubuntu-server installed) So if you find its time-confusing to help me out, please atleast drop a hint.
Thanks in advance for help.
-Rahul
(note - this is reposting of - http://forum.nginx.org/read.php?11,127694)
Update: I found answer, which is posted below.
I found answer on my posting on the nginx forum - http://forum.nginx.org/read.php?2,127854
The answer, in my case, is to set:
in php-fpm config (usually
/etc/php5/fpm/php-fpm.conf
)Note, you can use values other than 30s also.
I used it to match my value in main
php.ini
file which is:Thanks All. :-)
Here how it resolved my issue:
make following changes to /etc/nginx/nginx.conf in http { section
and then restart nginx
/etc/init.d/nginx restart
If you are using php 5.3, increase the backlog.
If you are using php 5.2, backport the patch to increase the backlog size from 128.
Also, use a unix socket rather than a TCP socket. unix:/tmp/php5-cgi.sock (or the relevant path)
Great, thanks
request_terminate_timeout=30s
It works perfectly for me
but, I had to insert the line in this file : "/etc/php5/fpm/pool.d/www.conf" that is to say in the "Worker Section".
PHP 5.3.21-1 - Wordpress 3.5.1
http://php-fpm.org/wiki/Configuration_File
in my case (same nginx error message), some problematic php scripts are not ending to execute and waiting for something, resulting no more php5-fpm children for nginx to pick.
fix:
request_terminate_timeout=30s
pm.max_spare_servers=16
pm.min_spare_servers=2
now everything worked like a charm.
For me same issue has occured after removing rabbitmq from server. Nothing of the above wasnt usefull, reinstalling all php5 modules solved the problem. I had Debian 8.2 on that server. Hope will be helpful for someone.
This may also help folks:
Depending on what your setup is you should look at the fastcgi configuration params as well as php ... in my case (I am using apache2 + php5-fpm) and max_execution time is also dependent on how long the fastcgi module waits for response (-idle-timeout) ...
http://www.fastcgi.com/mod_fastcgi/docs/mod_fastcgi.html#FastCgiExternalServer
I had the same problem and I solved it by completely removeing Apache:
After that I reccomend restaring both PHP and NGINX:
In our case the problem was that php-fpm couldn't open a session on an NFS share , it seems like one of the web nodes locked the directory , after restarting the NFS and only allowed 1 node to access it the issue was resolved ,Thank God
You should always check your php-fpm logs , in my case the location was at /var/opt/remi/php72/log/php-fpm
you should also enable slow scripts logs for better debugging and check the www-slow.log file located in the logs directory
to make sure slowlog is working make sure the following settings present in www.conf