Recently we had an Apache server which was responding very slowly due to SYN flooding. The workaround for this was to enable tcp_syncookies (net.ipv4.tcp_syncookies=1 in /etc/sysctl.conf).
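For reference, the whole change amounts to one line in /etc/sysctl.conf, reloaded with sysctl -p (or applied on the fly with sysctl -w):

net.ipv4.tcp_syncookies = 1            # added to /etc/sysctl.conf
sysctl -p                              # re-read sysctl.conf
sysctl -w net.ipv4.tcp_syncookies=1    # equivalent one-off change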
I posted a question about this here if you want more background.
After enabling syncookies we started seeing the following message in /var/log/messages approximately every 60 seconds:
[84440.731929] possible SYN flooding on port 80. Sending cookies.
Vinko Vrsalovic informed me that this means the syn backlog is getting full, so I raised tcp_max_syn_backlog to 4096. At some point I also lowered tcp_synack_retries to 3 (down from the default of 5) by issuing sysctl -w net.ipv4.tcp_synack_retries=3. After doing this, the frequency seemed to drop, with the interval of the messages varying between roughly 60 and 180 seconds.
Next I issued sysctl -w net.ipv4.tcp_max_syn_backlog=65536, but am still getting the message in the log.
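So the sequence of sysctl changes up to this point has been:

sysctl -w net.ipv4.tcp_max_syn_backlog=4096    # first bump to the SYN backlog
sysctl -w net.ipv4.tcp_synack_retries=3        # fewer SYN-ACK retransmits (default is 5)
sysctl -w net.ipv4.tcp_max_syn_backlog=65536   # backlog raised much further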
Throughout all this I've been watching the number of connections in SYN_RECV state (by running watch --interval=5 'netstat -tuna |grep "SYN_RECV"|wc -l'), and it never goes higher than about 240, much lower than the size of the backlog. Yet I have a Red Hat server which hovers around 512 (the limit on this server is the default of 1024).
Are there any other TCP settings which would limit the size of the backlog, or am I barking up the wrong tree? Should the number of SYN_RECV connections in netstat -tuna correlate to the size of the backlog?
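Put concretely, the two numbers I'm comparing are:

watch --interval=5 'netstat -tuna |grep "SYN_RECV"|wc -l'   # half-open connections right now
cat /proc/sys/net/ipv4/tcp_max_syn_backlog                  # the configured SYN backlog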
Update
As best I can tell I'm dealing with legitimate connections here; netstat -tuna|wc -l hovers around 5000. I've been researching this today and found this post from a last.fm employee, which has been rather useful.
I've also discovered that tcp_max_syn_backlog has no effect when syncookies are enabled (as per this link).
So as a next step I set the following in sysctl.conf:
net.ipv4.tcp_syn_retries = 3           # default=5
net.ipv4.tcp_synack_retries = 3        # default=5
net.ipv4.tcp_max_syn_backlog = 65536   # default=1024
net.core.wmem_max = 8388608            # default=124928
net.core.rmem_max = 8388608            # default=131071
net.core.somaxconn = 512               # default=128
net.core.optmem_max = 81920            # default=20480
I then set up my response time test, ran sysctl -p and disabled syncookies with sysctl -w net.ipv4.tcp_syncookies=0.
After doing this the number of connections in the SYN_RECV state still remained around 220-250, but connections were starting to delay again. Once I noticed these delays I re-enabled syncookies and the delays stopped.
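The toggling itself is just one sysctl, so it's easy to flip back and forth while watching response times:

sysctl -w net.ipv4.tcp_syncookies=0    # disable syncookies for the test
sysctl -w net.ipv4.tcp_syncookies=1    # re-enable them once the delays reappear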
I believe what I was seeing was still an improvement from the initial state; however, some requests were still delayed, which is much worse than having syncookies enabled. So it looks like I'm stuck with them enabled until we can get some more servers online to cope with the load. Even then, I'm not sure I see a valid reason to disable them again, as they're only sent (apparently) when the server's buffers get full.
But the syn backlog doesn't appear to be full with only ~250 connections in the SYN_RECV state! Is it possible that the SYN flooding message is a red herring and it's something other than the syn_backlog that's filling up?
If anyone has any other tuning options I haven't tried yet I'd be more than happy to try them out, but I'm starting to wonder if the syn_backlog setting isn't being applied properly for some reason.
So, this is a neat question.
Initially, I was surprised that you saw any connections in SYN_RECV state with SYN cookies enabled. The beauty of SYN cookies is that you can statelessly participate in the TCP 3-way handshake as a server using cryptography, so I would expect the server not to represent half-open connections at all, because that would be the very same state that isn't being kept.
In fact, a quick peek at the source (tcp_ipv4.c) shows interesting information about how the kernel implements SYN cookies. Essentially, despite turning them on, the kernel behaves as it would normally until its queue of pending connections is full. This explains your existing list of connections in SYN_RECV state.
Only when the queue of pending connections is full, AND another SYN packet (connection attempt) is received, AND it has been more than a minute since the last warning message, does the kernel send the warning message you have seen ("sending cookies"). SYN cookies are sent even when the warning message isn't; the warning message is just to give you a heads up that the issue hasn't gone away.
Put another way, if you turn off SYN cookies, the message will go away. That is only going to work out for you if you are no longer being SYN flooded.
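If you want to confirm that cookies are still being sent between the rate-limited warnings, the kernel keeps per-protocol counters for them; something along these lines should show the "sent" count climbing while you're being flooded (the exact wording of netstat -s output varies a little between versions):

netstat -s | grep -i "syn cookie"                  # e.g. "12345 SYN cookies sent"
watch -n 10 'netstat -s | grep -i "syn cookie"'    # watch the counters move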
To address some of the other things you've done:
- net.ipv4.tcp_synack_retries
- net.ipv4.tcp_syn_retries: Changing this cannot have any effect on inbound connections (it only affects outbound connections).

The other variables you mention I haven't researched, but I suspect the answers to your question are pretty much right here.
If you aren't being SYN flooded and the machine is responsive to non-HTTP connections (e.g. SSH), I think there is probably a network problem, and you should have a network engineer help you look at it. If the machine is generally unresponsive even when you aren't being SYN flooded, it sounds like a serious load problem if it affects the creation of TCP connections (which is pretty low-level and not resource-intensive).
I've run into exactly the same problem on a fresh install of Ubuntu Oneiric 11.10 running a webserver (apache2) with a heavily loaded website. On Ubuntu Oneiric 11.10 syncookies were enabled by default.
I had the same kernel messages stating a possible SYN flood attack on the webserver port (the same "possible SYN flooding on port 80. Sending cookies." line shown above).
At the same time, I was pretty sure that there was no attack happening. I had these messages returning at a 5-minute interval. This looked like a load peak rather than an attack, because an attacker would keep the load high all the time while trying to get the server to stop responding to requests.
Tuning the net.ipv4.tcp_max_syn_backlog parameter did not lead to any improvement - the messages continued at the same rate. The fact that the number of SYN_RECV connections was always really low (in my case under 250) was an indicator that there must be some other parameter responsible for this message.

I found this bug report https://bugzilla.redhat.com/show_bug.cgi?id=734991 on the Red Hat site stating that the kernel message can be the result of a bug (or misconfiguration) on the application side. Of course the log message is very misleading, as in that case it is not a kernel parameter that is responsible, but a parameter of your application being passed to the kernel.
So we should also take a look at the configuration parameters of our webserver application. Grab apache docs and go to http://httpd.apache.org/docs/2.0/mod/mpm_common.html#listenbacklog
Apache has its own configuration parameter for the backlog queue of incoming connections: ListenBacklog. Its default value is 511. (This corresponds with the number of connections that you have observed on your Red Hat server; your other server may have a lower number configured.) If you have a lot of incoming connections and at some moment (just by chance) they all arrive at nearly the same time, such that the webserver is not able to serve them fast enough, your backlog will fill up with 511 connections and the kernel will fire the above message stating a possible SYN flood attack.
To solve this, I added a ListenBackLog line to /etc/apache2/ports.conf or one of the other .conf files that will be loaded by apache (/etc/apache2/apache2.conf should also be ok). You should also set net.ipv4.tcp_max_syn_backlog to a reasonable value; in my understanding, the kernel maximum will limit the value that you will be able to configure in the apache configuration. After tuning the config, do not forget to restart apache. A sketch of all three steps follows.
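Something along these lines (the 5000 used here is only an example value, not a recommendation; pick whatever fits your load):

# /etc/apache2/ports.conf (or apache2.conf)
ListenBackLog 5000

# make sure the kernel permits a backlog at least that large
sysctl -w net.ipv4.tcp_max_syn_backlog=5000

# restart apache so the new listen backlog takes effect
service apache2 restart        # or: /etc/init.d/apache2 restart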
In my case, this configuration change immediately stopped the kernel warnings. I'm able to reproduce the messages by setting a low ListenBackLog value in the apache config.
After some tests with kernel 3.4.9, the number of SYN_RECV connections in netstat depends on:

- /proc/sys/net/core/somaxconn, rounded up to the next power of 2 (e.g. 128 -> 256)
- /proc/sys/net/ipv4/tcp_max_syn_backlog: 75% of it if /proc/sys/net/ipv4/tcp_syncookies is set to 0, or 100% if /proc/sys/net/ipv4/tcp_syncookies is set to 1
- ListenBackLog in the apache config, rounded up to the next power of 2 (e.g. 128 -> 256)

The minimum of these parameters is used. After changing somaxconn or ListenBackLog, apache has to be restarted; after increasing tcp_max_syn_backlog, apache also has to be restarted.

Without tcp_syncookies apache is blocking; why in this case only 75% of tcp_max_syn_backlog is the limit is strange, and increasing this parameter increases the SYN_RECV connections to 100% of the old value without restarting apache.
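A quick way to see which of the three limits is the binding one on a given box (read-only checks; the grep only finds ListenBackLog if it has been set explicitly, otherwise the 511 default applies):

cat /proc/sys/net/core/somaxconn
cat /proc/sys/net/ipv4/tcp_max_syn_backlog
cat /proc/sys/net/ipv4/tcp_syncookies
grep -ri listenbacklog /etc/apache2/ 2>/dev/null
watch -n 5 'netstat -tuna | grep -c SYN_RECV'    # compare against the smallest of the above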