Over the last month I was forced to learn a lot about server configuration, integration, AWS, etc. I had never worked with any of these to this extent before.
I got everything up and running well for my app (thanks mostly to the http://github.com/wr0ngway/rubber gem and help from #rubberec2 IRC channel). However, I'm encountering a mysterious (to me) problem.
Stack
I am running Nginx + Passenger behind HAProxy. So far only one Nginx + Passenger host is being used, so HAProxy doesn't really do much yet, but we will add more app servers in the future.
Problem
I am stuck with occasional 503 errors that become annoying at certain times of day (during higher load?). These errors happen on both static assets and routed URLs. I have determined that HAProxy is the one returning them, because the page and its headers are identical to what's in /etc/haproxy/errors/503.http.
I thought that nginx doesn't care how many requests it receives and can handle all of them, since it has its own queueing and Passenger distributes requests correctly. So why does HAProxy claim there was no server available to handle some requests?
My HAProxy config
global
    log 127.0.0.1 local0 warning
    maxconn 1024

defaults
    log global
    mode http
    retries 3
    balance roundrobin
    option abortonclose
    option redispatch
    option httplog
    contimeout 4000
    clitimeout 150000
    srvtimeout 30000

listen passenger_proxy x.x.x.x:x
    option forwardfor
    server web01 web01:xxxx maxconn 20 check
Note: IPs and ports are replaced with xes.
P.S. I'm not good at this stuff, learning as I go.
Update
I used siege to benchmark the server and found that I can reproduce the 503s when running about 58 concurrent sessions. The success rate is only 54% in that case.
Update 2
I found out that the nginx access log outputs "-" 400 0 "-" "-" "-" every time I get a 503.
Update 3
Everyone says that nginx gives "400 Bad Request" errors when the cookies are too big. However, setting the large_client_header_buffers directive didn't fix it for me.
Update 4
I ran siege on the server, targeting nginx directly on its listen port, and now nginx started returning 499 errors in the same pattern as the 503s before. Siege keeps telling me that the connection timed out when that happens. Looks like I'm getting closer.
Update 5
I noticed that nginx was logging in two places on my system, and one of them was an error log that recorded this message every time siege showed "Connection timed out":
file=ext/nginx/HelperAgent.cpp:574 time=2011-09-15 07:43:22.196 ]: Couldn't forward the HTTP response back to the HTTP client: It seems the user clicked on the 'Stop' button in his browser.
According to the HAProxy configuration guide, you need to increase the maxconn parameter on your server declaration. I highly suggest reading through the whole document, as there is a lot of good info in there.
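
As a rough sketch of what that could look like, here is the listen section from the question with a higher per-server limit. The numbers 100 and 30000 are assumptions for illustration only, not tested recommendations; pick values based on how many concurrent requests your Passenger setup can really serve. As I understand the guide, requests beyond the server's maxconn wait in HAProxy's queue, and if no queue timeout is set it falls back to the connect timeout (4000 ms in your config). Once that expires the client gets a 503, which would fit what you see at around 58 concurrent sessions with maxconn 20.

```
listen passenger_proxy x.x.x.x:x
    option forwardfor
    # Assumed value: how long a request may wait in the queue for a free
    # slot before HAProxy gives up and returns 503 (milliseconds).
    timeout queue 30000
    # Assumed value: per-server limit raised from 20; tune it to what
    # Passenger can actually handle at once.
    server web01 web01:xxxx maxconn 100 check
```

Keep the server's maxconn below the global maxconn (1024 in your config), and rerun siege after each change to see whether the 503s go away or just move to a higher concurrency level.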