Our app's REST API is served by Gunicorn (not behind Nginx) running on AWS EC2 instances with a typical auto-scaling/load balancing setup. The load balancer's idle timeout is 60 seconds, and Gunicorn's keep-alive timeout is 2 seconds. We've been seeing sporadic 504 Gateway Timeout responses from this configuration. According to Amazon docs, this may be because the server's keep-alive timeout is lower than the load balancer's idle timeout setting:
Cause 2: Registered instances closing the connection to Elastic Load Balancing.
Solution 2: Enable keep-alive settings on your EC2 instances and set the keep-alive timeout to greater than or equal to the idle timeout settings of your load balancer.
With Nginx, the default keepalive_timeout is 75 seconds, which apparently works well with the ELB default settings. However, Gunicorn docs recommend a keepalive setting in the range of 1-5 seconds.
Does it make sense to bump Gunicorn's keepalive to 75 seconds, or is there a good reason to keep it below 5 seconds even though we're not using a reverse proxy in front of it?
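For reference, a minimal sketch of the kind of Gunicorn config described above (the bind address and worker count are assumptions for illustration; only the keepalive value reflects our actual setup):

```python
# gunicorn.conf.py -- sketch of the setup described above; bind address and
# worker count are assumed for illustration, only keepalive matches our config.
bind = "0.0.0.0:8000"   # Gunicorn serves requests directly; no Nginx in front
workers = 4             # assumed value
keepalive = 2           # seconds (Gunicorn's default); the ELB idle timeout is 60s
```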
You will almost certainly want to raise the keepalive timer per the ELB recommendation, because ELB reuses connections: it holds them open until its idle timeout expires, and if another request arrives at the ELB, it will often use one of the already-open connections to send that request to your back-end.
504 Gateway Timeout is an odd error for this condition, but it appears that's what ELB returns when its reuse of a connection coincides with the back-end's premature close. The 5-second recommendation might make sense if browsers were communicating with the back-end directly, but that isn't the case with ELB, which is itself a proper reverse proxy when running in HTTP mode.
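If you do raise it, it's a one-line change in the Gunicorn config; the 75 below is just an example value chosen to exceed the 60-second ELB idle timeout, mirroring Nginx's default, rather than anything prescribed by the Gunicorn docs:

```python
# gunicorn.conf.py -- adjusted keep-alive, sketch only
# Setting keepalive above the ELB's 60-second idle timeout means the load
# balancer, not Gunicorn, is the side that closes idle connections, so the
# ELB never tries to reuse a connection the back-end has already dropped.
keepalive = 75

# Roughly equivalent on the command line (the app module name is hypothetical):
#   gunicorn --keep-alive 75 myapp.wsgi
```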