I've started using Nginx as a reverse proxy for a set of servers that provide some sort of service.
The service can be rather slow at times (it's running on Java, and the JVM sometimes gets stuck in a "full garbage collection" that may take several seconds), so I've set proxy_connect_timeout to 2 seconds. That gives Nginx enough time to figure out that the service is stuck on GC and will not respond in time, so that it passes the request to a different server.
I've also set proxy_read_timeout to prevent the reverse proxy from getting stuck if the service itself takes too long to compute the response. Again, it should move the request to another server that is free enough to return a timely response.
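For reference, a setup of the kind described above could be sketched roughly as follows. This is only an illustration: the upstream name `backend`, the server addresses, and the `proxy_read_timeout` value are placeholders (the actual read timeout is not stated here); only the 2-second connect timeout comes from the description.

```nginx
# Sketch only; goes inside the http { } block.
# "backend" and the addresses below are hypothetical placeholders.
upstream backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;

        # Give up on a server that does not complete the TCP handshake
        # within 2 seconds (e.g. because the JVM is stuck in a full GC).
        proxy_connect_timeout 2s;

        # Placeholder value: maximum time between two successive reads
        # from the upstream before Nginx gives up on it.
        proxy_read_timeout 10s;

        # Pass the request to the next server on connection errors and
        # timeouts. This is the default, spelled out for clarity.
        proxy_next_upstream error timeout;
    }
}
```

Note that `proxy_read_timeout` applies between two successive read operations rather than to the whole response, so a slow but steadily streaming upstream would not trip it.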
I've run some benchmarks, and I can clearly see that proxy_connect_timeout works properly: some requests return at exactly the time specified for the connection timeout, because the service is stuck and doesn't accept incoming connections (the service uses Jetty as an embedded servlet container). proxy_read_timeout also works, as I can see requests that return after the timeout specified there.
The problem is that I would have expected to see some requests that time out after proxy_read_timeout + proxy_connect_timeout, or almost that long. This would happen if the service is stuck and won't accept connections when Nginx tries to reach it, but gets released just before the connect timeout expires and starts processing, only to be too slow, so that Nginx aborts because of the read timeout. I believe the service does hit such cases, but after running several benchmarks, totaling several million requests, I have not seen a single request that takes longer than proxy_read_timeout (which is the larger timeout).
I would appreciate any comments on this issue. I suspect it could be due to a bug in Nginx (I have yet to look at the code, so this is just an assumption): the timeout counter doesn't get reset after the connection succeeds if Nginx hasn't read anything from the upstream server yet.
I was actually unable to reproduce this on:
I set this up in my nginx.conf:
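A configuration consistent with the 10 + 20 second total seen in the access log could look like the sketch below. The upstream name, addresses, and ports are hypothetical, and the assignment of 10 seconds to the connect timeout and 20 seconds to the read timeout is an assumption based on the order in which the two test servers are tried.

```nginx
# Sketch only; goes inside the http { } block.
# "test_backend" and both addresses are hypothetical placeholders.
upstream test_backend {
    server 192.0.2.10:8081;   # assumed: never answers the SYN
    server 127.0.0.1:8082;    # assumed: accepts the connection, never responds
}

server {
    listen 8080;

    location / {
        proxy_pass http://test_backend;

        # Assumed split of the 30 s total: 10 s to connect, 20 s to read.
        proxy_connect_timeout 10s;
        proxy_read_timeout    20s;
    }
}
```

With a setup like this, a request that first hits the unreachable server and is then retried on the silent one would be expected to take about 10 + 20 = 30 seconds in total.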
I then set up two test servers: one that would just time out on the SYN, and one that would accept connections but never respond:
Then I sent in one test connection:
Then I watched error_log, which showed this:
then:
And then the access.log, which shows the expected 30s timeout (10+20):
Here is the log format I'm using, which includes the individual upstream timeouts:
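A log format along these lines can be built from Nginx's standard upstream variables. The following is a sketch and not necessarily the exact format used for the test; the format name `upstream_timing` and the log path are placeholders.

```nginx
# Sketch only; goes inside the http { } block.
# "upstream_timing" is a hypothetical format name.
log_format upstream_timing
    '$remote_addr [$time_local] "$request" $status '
    'request_time=$request_time '
    'upstream_addr=$upstream_addr '
    'upstream_status=$upstream_status '
    'upstream_response_time=$upstream_response_time';

# Placeholder path; adjust to the local layout.
access_log /var/log/nginx/access.log upstream_timing;
```

When a request is passed to more than one upstream server, `$upstream_addr`, `$upstream_status`, and `$upstream_response_time` each hold one comma-separated entry per attempt, which makes it possible to see the time spent on each server separately from the overall `$request_time`.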
A connect timeout means that TCP stalls during the handshake (e.g., no SYN-ACKs came back). TCP would retry sending the SYN, but you've given Nginx only 2 seconds before it moves on to another server, so there is simply no time for a SYN to be re-sent.
UPD: I couldn't find this in the docs, but tcpdump shows a 3-second delay between the first SYN and the second attempt to send a SYN.