We had a little failover problem with one of our HAProxy VMs today. When we dug into it, we found this:
Jan 26 07:41:45 haproxy2 kernel: [226818.070059] __ratelimit: 10 callbacks suppressed
Jan 26 07:41:45 haproxy2 kernel: [226818.070064] Out of socket memory
Jan 26 07:41:47 haproxy2 kernel: [226819.560048] Out of socket memory
Jan 26 07:41:49 haproxy2 kernel: [226822.030044] Out of socket memory
Which, per this link, apparently has to do with the low default settings for net.ipv4.tcp_mem. So we increased them to 4x their defaults (this is Ubuntu Server; not sure if the Linux flavor matters):
current values are: 45984  61312  91968
new values are:    183936 245248 367872
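For reference, a change like this can be applied at runtime with sysctl and then persisted in /etc/sysctl.conf (a sketch using the values above):

# show the current limits (min, pressure, max, counted in pages)
sysctl net.ipv4.tcp_mem
# apply the 4x values immediately
sysctl -w net.ipv4.tcp_mem="183936 245248 367872"
# to persist across reboots, add the equivalent line to /etc/sysctl.conf:
#   net.ipv4.tcp_mem = 183936 245248 367872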
After that, we started seeing a bizarre error message:
Jan 26 08:18:49 haproxy1 kernel: [ 2291.579726] Route hash chain too long!
Jan 26 08:18:49 haproxy1 kernel: [ 2291.579732] Adjust your secret_interval!
Shh.. it's a secret!!
This apparently has to do with /proc/sys/net/ipv4/route/secret_interval, which defaults to 600 and controls periodic flushing of the route cache.

The secret_interval instructs the kernel how often to blow away ALL route hash entries, regardless of how new or old they are. In our environment this is generally bad: the CPU will be busy rebuilding thousands of entries per second every time the cache is cleared. However, we set this to run once a day to keep memory leaks at bay (though we've never had one).
While we are happy to reduce this, it seems odd to recommend dropping the entire route cache at regular intervals, rather than simply pushing old values out of the route cache faster.
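For reference, the once-a-day setting mentioned above corresponds to 86400 seconds; reading and changing it is plain sysctl usage (a sketch, not a recommendation):

# read the current value (default is 600 seconds)
sysctl net.ipv4.route.secret_interval
# flush the entire route cache only once per day
sysctl -w net.ipv4.route.secret_interval=86400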
After some investigation, we found /proc/sys/net/ipv4/route/gc_elasticity, which seems to be a better option for keeping the route table size in check: gc_elasticity can best be described as the average bucket depth the kernel will accept before it starts expiring route hash entries. This helps maintain the upper limit of active routes.
We adjusted the elasticity from 8 to 4, in the hopes that the route cache will prune itself more aggressively. Tuning secret_interval does not feel correct to us. But there are a bunch of related settings, and it's unclear which are really the right way to go here (see the sketch after the list below for how we applied the change):
- /proc/sys/net/ipv4/route/gc_elasticity (8)
- /proc/sys/net/ipv4/route/gc_interval (60)
- /proc/sys/net/ipv4/route/gc_min_interval (0)
- /proc/sys/net/ipv4/route/gc_timeout (300)
- /proc/sys/net/ipv4/route/secret_interval (600)
- /proc/sys/net/ipv4/route/gc_thresh (?)
- rhash_entries (kernel parameter, default unknown?)
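For reference, this is roughly how we inspect these settings and how we applied the elasticity change; it is plain sysctl usage with our values, not a recommendation:

# dump every route-cache related setting with its current value
sysctl -a | grep '^net.ipv4.route'
# lower the accepted average bucket depth from 8 to 4
sysctl -w net.ipv4.route.gc_elasticity=4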
We don't want to make the Linux routing worse, so we're kind of afraid to mess with some of these settings.
Can anyone advise which routing parameters are best to tune, for a high traffic HAProxy instance?
I have never encountered this issue myself. However, you should probably increase your hash table width in order to reduce its depth. Using "dmesg", you can see how many entries you currently have:
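On kernels of this vintage the boot log reports the route hash size; something like the following should show it (the numbers are only an illustration of the output format):

dmesg | grep 'IP route'
# e.g.: IP route cache hash table entries: 32768 (order: 5, 131072 bytes)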
You can change this value with the kernel boot command line parameter rhash_entries. First try it by hand, then add it to your lilo.conf or grub.conf. For example:
kernel vmlinux rhash_entries=131072
It is possible that you have a very limited hash table because you have assigned little memory to your HAProxy VM (the route hash size is adjusted depending on total RAM).
Concerning tcp_mem, be careful. Your initial settings make me think you were running with 1 GB of RAM, 1/3 of which could be allocated to TCP sockets. Now you've allocated 367872 * 4096 bytes = 1.5 GB of RAM to TCP sockets. You should be very careful not to run out of memory. A rule of thumb is to allocate 1/3 of the memory to HAProxy, another 1/3 to the TCP stack, and the last 1/3 to the rest of the system.

I suspect your "out of socket memory" message comes from the default settings in tcp_rmem and tcp_wmem. By default you have 64 kB allocated on output for each socket and 87 kB on input. Since a proxied connection uses two sockets (one toward the client, one toward the server), that means a total of about 300 kB per connection, just for socket buffers. Add to that 16 or 32 kB for HAProxy itself, and you see that with 1 GB of RAM you will only support about 3000 connections.

By changing the default settings (the middle parameter) of tcp_rmem and tcp_wmem, you can get a lot lower on memory. I get good results with values as low as 4096 for the write buffer, and 7300 or 16060 in tcp_rmem (5 or 11 TCP segments). You can change those settings without restarting; however, they will only apply to new connections.

If you prefer not to touch your sysctls too much, the latest HAProxy, 1.4-dev8, allows you to tweak those parameters from the global configuration, and per side (client or server).
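As a sketch of lowering those defaults with sysctl, using the example buffer sizes above; only the middle (default) value is being reduced, and the min/max shown are just typical kernel values, so keep whatever yours currently reports:

# current min / default / max, in bytes
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem
# shrink only the default (middle) value; new connections pick it up immediately
sysctl -w net.ipv4.tcp_wmem="4096 4096 4194304"
sysctl -w net.ipv4.tcp_rmem="4096 16060 4194304"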
I hope this helps!
The "Out of socket memory" error is often misleading. Most of the time, on Internet-facing servers, it does not indicate any problem related to running out of memory. As I explained in far greater detail in a blog post, the most common reason is the number of orphan sockets. An orphan socket is a socket that isn't associated with a file descriptor. In certain circumstances, the kernel will issue the "Out of socket memory" error even though you're 2x or 4x away from the limit (/proc/sys/net/ipv4/tcp_max_orphans). This happens frequently in Internet-facing services and is perfectly normal. The right course of action in this case is to tune tcp_max_orphans up to at least 4x the number of orphans you normally see at your peak traffic.
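To see how many orphan sockets you have and to raise the limit, the following is enough (the 65536 figure is only an example; size it from roughly 4x your own observed peak):

# the "orphan" field on the TCP line is the current orphan count
cat /proc/net/sockstat
# current limit
sysctl net.ipv4.tcp_max_orphans
# raise the limit (example value)
sysctl -w net.ipv4.tcp_max_orphans=65536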
Do not listen to any advice that recommends tuning tcp_mem, tcp_rmem, or tcp_wmem unless you really know what you're doing. Those giving out this advice typically don't. Their voodoo is often wrong or inappropriate for your environment and will not solve your problem; it might even make it worse.

We tune some of these parameters regularly. Our standard for high throughput, low latency trading platforms is: