I have set up two dedicated HAProxy servers to spread the load across three application servers. I have set up regular HTTP balancing on port 80, and also a special backend to work with websockets.
It works great for about two hours, but after that it gets painfully slow: a page load takes about 30 seconds. When I restart HAProxy it's good again.
Below is my conf. Any idea what may be causing this?
global
    user haproxy
    group haproxy

defaults
    mode http
    timeout connect 5s
    timeout client 5s
    timeout server 60s
    stats enable
    stats auth aa:bb

frontend proxy
    # listen on 80
    bind 0.0.0.0:80
    # allow for many connections, with long timeout
    maxconn 200000 # total maximum connections, check ulimit as well
    timeout client 24h
    # default to webapp backend
    default_backend webapp
    # is this a socket io request?
    acl is_websocket hdr_end(host) -i node.domain.com
    use_backend websocket if is_websocket

backend webapp
    balance roundrobin # assuming that you don't need stickiness
    # allow client connections to linger for 5s
    # but close server side requests to avoid keeping idle connections
    option httpchk HEAD /check.txt HTTP/1.0
    option http-server-close
    option forwardfor
    server app1 x.y.149.133:80 cookie app1 weight 10 check
    server app2 x.y.149.134:80 cookie app2 weight 15 check
    server app3 x.y.149.135:80 cookie app3 weight 15 check

backend websocket
    balance source
    # options
    option forwardfor # add X-Forwarded-For
    # Do not use httpclose (= client and server
    # connections get closed), since it will close
    # Websockets connections
    no option httpclose
    # Use "option http-server-close" to preserve
    # client persistent connections while handling
    # every incoming request individually, dispatching
    # them one after another to servers, in HTTP close mode
    option http-server-close
    option forceclose
    server app1 x.y.149.133:3000 cookie app1 weight 10 check
    server app2 x.y.149.134:3000 cookie app2 weight 15 check
    server app3 x.y.149.135:3000 cookie app3 weight 15 check
What generally makes websockets different from everyday HTTP load balancing is that you end up with a high number of concurrent connections relative to the arrival rate. This is an important distinction in systems, so if it isn't clear to you, have a look at this answer of mine.
So, whatever your problem is, my guess is that it occurs when you reach a certain threshold of concurrent connections. Here is my best guess based on the information you provided:
The websocket backend contains 3 servers, and the load balancer talks to all of them from the same source IP. That means the number of outgoing connections it can hold open tops out at source_port_range * destination IPs, which looks something like this (assuming the Linux default ip_local_port_range of 32768-61000):
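    61000 - 32768 = 28232 usable source ports per destination IP
    28232 source ports * 3 destination IPs ≈ 84k concurrent connections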
So when you hit somewhere around 84k concurrent connections, your HAProxy instance is starved for source ports, and CPU spikes as it does something akin to garbage collection to find free ones.
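One way to raise that ceiling, sketched here purely to illustrate the mechanism rather than taken from your config, is to give HAProxy an explicit source port range for the backend with the source keyword, so port allocation is done by HAProxy per destination instead of by the kernel (the address and range below are placeholders to adjust for your environment):

backend websocket
    # ... existing options ...
    # bind outgoing connections to an explicit port range so HAProxy,
    # rather than the kernel, allocates source ports per destination
    source 0.0.0.0:1025-65000
    server app1 x.y.149.133:3000 cookie app1 weight 10 check
    server app2 x.y.149.134:3000 cookie app2 weight 15 check
    server app3 x.y.149.135:3000 cookie app3 weight 15 check

Spreading the servers across additional local IPs achieves the same thing; either way the point is more source ports per destination.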
If this isn't it, I bet it is something along these lines. Monitor your concurrent connections using the HAProxy stats page, and monitor your CPU, to better understand what is happening when things get slow.
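If you want numbers you can graph rather than eyeballing the stats page, one option (not in your config above; the socket path is just an example) is to add a stats socket to the global section:

global
    user haproxy
    group haproxy
    # expose a runtime socket; "show info" on it reports CurrConns,
    # the current number of concurrent connections
    stats socket /var/run/haproxy.sock mode 600 level admin

Then something like echo "show info" | socat stdio unix-connect:/var/run/haproxy.sock | grep CurrConns, run periodically, gives you a connection count to line up against your CPU graphs.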