I have an nginx server with 5 backend servers. We serve around 400-500 requests/second. I have started getting a large number of "upstream timed out" errors (110: Connection timed out).
The error string in error.log looks like this:
2011/01/10 21:59:46 [error] 1153#0: *1699246778 upstream timed out (110: Connection timed out) while reading response header from upstream, client: {IP}, server: {domain}, request: "GET {URL} HTTP/1.1", upstream: "http://{backend_server}:80/{url}", host: "{domain}", referrer: "{referrer}"
Any suggestions on how to debug such errors? I am unable to find a munin plugin to keep track of the number of upstream errors. Some days the number of errors is way too high, and on other days it stays at a more modest three-digit figure. A munin graph would probably help us find a pattern or a correlation with something else.
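In the absence of a ready-made plugin, a minimal sketch of one, assuming the standard munin shell-plugin protocol and that the error log lives at /var/log/nginx/error.log, could be as simple as grepping the log and reporting the match count as a DERIVE counter:

    #!/bin/sh
    # Sketch of a munin plugin graphing upstream timeouts per second.
    # LOG is an assumption -- point it at your nginx error_log.
    LOG=/var/log/nginx/error.log

    if [ "$1" = "config" ]; then
        echo 'graph_title nginx upstream timeouts'
        echo 'graph_category nginx'
        echo 'graph_vlabel timeouts per second'
        echo 'timeouts.label upstream timed out'
        echo 'timeouts.type DERIVE'
        echo 'timeouts.min 0'
        exit 0
    fi

    # DERIVE turns the ever-growing line count into a rate; log rotation
    # resets the counter, which "min 0" keeps from producing negative spikes.
    echo "timeouts.value $(grep -c 'upstream timed out' "$LOG")"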
How can we bring the number of such errors down to zero?
As Martin said, this error originates from your backends. That said, you can make sure you don't queue too many requests on a failed backend, and get a good overview of backend status, with haproxy and its queueing and health-checking capabilities. Logging the upstream response time in nginx ($upstream_response_time) can be helpful too.
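A minimal sketch of such a log format, assuming the standard nginx variables $upstream_addr, $upstream_response_time and $request_time (the format name and log path below are arbitrary):

    # Illustrative log_format in the http {} block; adjust fields to taste.
    log_format upstream_timing '$remote_addr [$time_local] "$request" $status '
                               'upstream=$upstream_addr '
                               'upstream_time=$upstream_response_time '
                               'request_time=$request_time';

    # Use it in the vhost that proxies to the backends.
    access_log /var/log/nginx/upstream_timing.log upstream_timing;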
I had a similar problem, but mine came from not having /etc/hosts set up properly for my domain. I needed to add the FQDN, and just the hostname, associated with the IP address of my domain. For example:
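(The values below are placeholders; use your server's real IP address, FQDN and short hostname.)

    # /etc/hosts -- hypothetical entry
    192.0.2.10    web1.example.com    web1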
Note how the IP is mapped to the FQDN as well as to just its hostname.
Check the logs for your backend servers. The problem could be on the network, but it's far more likely that your backend servers are taking too long and timing out.
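Since "upstream timed out ... while reading response header from upstream" is governed by nginx's proxy_read_timeout, it can also be worth confirming that the configured timeouts match what the backends realistically need while you investigate. A sketch with illustrative values (all three directives default to 60s; the upstream name is hypothetical):

    location / {
        proxy_pass http://backend_pool;
        # Illustrative values only -- tune to your backends' real response times.
        proxy_connect_timeout  5s;
        proxy_send_timeout     60s;
        proxy_read_timeout     120s;
    }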