I have a small farm of web servers (HP Proliant and IBM x, with Broadcom Corporation NetXtreme II BCM5 NICs) running Apache 2.2.15 on CentOS 6, behind a Cisco ACE load balancer, serving a PHP/JS based web portal. This farm receives a lot of requests daily (it serves a whole small country) from clients trying to access a splash page (and to go from there to the index page).
I've been struggling with the following problem:
I've noticed that requests to the web servers sometimes take quite a "long" time to be answered (from the client's point of view), and sometimes they are not answered at all (timeout on the web client side). In the latter case, I don't even see the request in the Apache logs.
I've also noticed that netstat reports an increasing number of TCP resets being sent (netstat -st | grep 'resets sent'). Also, dropwatch -l kas shows there are many packets being dropped:
Initalizing kallsyms db
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
53 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)
26 drops at tcp_rcv_established+926 (0xffffffff814981b6)
3 drops at tcp_v4_reqsk_destructor+fa (0xffffffff814a104a)
1 drops at netlink_unicast+251 (0xffffffff81471b11)
56 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)
29 drops at tcp_rcv_established+926 (0xffffffff814981b6)
4 drops at tcp_v4_reqsk_destructor+fa (0xffffffff814a104a)
51 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)
32 drops at tcp_rcv_established+926 (0xffffffff814981b6)
2 drops at tcp_v4_reqsk_destructor+fa (0xffffffff814a104a)
1 drops at ip_rcv_finish+199 (0xffffffff8147ea49)
1 drops at tcp_v4_destroy_sock+115 (0xffffffff814a0cf5)
1 drops at tcp_v4_reqsk_destructor+fa (0xffffffff814a104a)
22 drops at tcp_rcv_established+926 (0xffffffff814981b6)
36 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)
2 drops at tcp_v4_reqsk_destructor+fa (0xffffffff814a104a)
49 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)
29 drops at tcp_rcv_established+926 (0xffffffff814981b6)
26 drops at tcp_rcv_established+926 (0xffffffff814981b6)
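To see at a glance where most drops are attributed, the dropwatch output above can be summed per kernel symbol. The following is only a rough sketch, assuming Python 3 is available on some machine and that the output has been saved as text. The function name total_drops_by_symbol and the SAMPLE variable are made up for illustration; SAMPLE holds the first seven "drops at" lines from the output above.

```python
import re
from collections import Counter

# Matches lines such as "53 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)".
# The offset after "+" and the address are hexadecimal.
DROP_RE = re.compile(r"(\d+) drops at (\w+)\+[0-9a-f]+ \(0x[0-9a-f]+\)")


def total_drops_by_symbol(dropwatch_output: str) -> Counter:
    """Sum the drop counts reported by dropwatch, grouped by symbol name."""
    totals = Counter()
    for count, symbol in DROP_RE.findall(dropwatch_output):
        totals[symbol] += int(count)
    return totals


# First seven "drops at" lines copied from the output in the question.
SAMPLE = """\
53 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)
26 drops at tcp_rcv_established+926 (0xffffffff814981b6)
3 drops at tcp_v4_reqsk_destructor+fa (0xffffffff814a104a)
1 drops at netlink_unicast+251 (0xffffffff81471b11)
56 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)
29 drops at tcp_rcv_established+926 (0xffffffff814981b6)
4 drops at tcp_v4_reqsk_destructor+fa (0xffffffff814a104a)
"""

# Print symbols from most to fewest drops.
for symbol, drops in total_drops_by_symbol(SAMPLE).most_common():
    print(f"{drops:6d}  {symbol}")
```

The symbol names are simply whatever dropwatch resolved from the addresses, so the totals only show where the drops cluster, not why they happen.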
I've been following recommendations from RH (Red Hat Enterprise Linux Network Performance Tuning Guide), even though I've not seen some of the symptoms described there in my servers. In short:
- I've increased the NIC ring buffers to maximum.
- I've fiddled with (increased or changed) several kernel parameters (tcp_syncookies, netdev_budget, tcp_timestamps, tcp_window_scaling, tcp_rmem, dev_weight, tcp_tw_reuse...)
- I've modified the Apache config according to several "Apache optimization guides" found on the web (even though there were, and still are, idle workers in the Apache stats)
- I've stopped/disabled any system service/daemon not required (basically all that remains is sshd, httpd and snmpd)
All of the above with no luck.
All NICs are working at 1000Mb/s, CPU and disk usage are low, and neither netstat nor ethtool shows errors.
Any ideas what else can be done?
A TCP reset is an immediate close of a TCP connection. This allows for the resources that were allocated for the previous connection to be released and made available to the system.
Causes of RST generation:
Ack, Reset sent in response to a Syn
An Ack, Reset sent in response to a Syn frame acknowledges receipt of the frame but tells the client that the server cannot allow the connection on that port. Among the reasons for the Ack, Reset are:
a. The node being connected to is not listening on the port the client node is trying to connect to.
b. There is some reason that the server node cannot complete the connection on that port. For example, the server is out of resources and so cannot allocate the needed resources to allow the connection.
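Case (a) is easy to reproduce locally. The sketch below assumes Python 3 and a loopback interface with no firewall rule silently dropping the packets. It asks the kernel for a free port, releases it so nothing is listening there, and then tries to connect. On Linux the kernel answers the Syn with a reset, which Python surfaces as ConnectionRefusedError. The helper names find_unused_port and connect_is_refused are made up for this example.

```python
import socket


def find_unused_port() -> int:
    """Bind to port 0 so the kernel picks a free port, then release it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def connect_is_refused(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if the connection attempt is rejected with a reset."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False  # something accepted the connection
    except ConnectionRefusedError:
        return True  # the peer answered the Syn with a reset


port = find_unused_port()
print(port, connect_is_refused("127.0.0.1", port))
```

A client that instead hits a timeout (as in the question) did not receive such a reset, which points more towards packets being dropped before a reply is sent.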
RST
If the connection is in any non-synchronized state (LISTEN, SYN-SENT, SYN-RECEIVED), and the incoming segment acknowledges something not yet sent (the segment carries an unacceptable ACK), a reset is sent.
Another kind of reset happens when a network frame has been sent six times (the original frame plus five retransmits of the frame) without a response. As a result, the sending node resets the connection. The exact retransmit limit depends on the operating system and its configuration.
As you have already tried various kernel tuning parameters, try the kernel's TCP SYN cookies option: enable TCP SYN cookie protection.
A definite solution can only be given after analyzing your logs; iptables can also help.
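To narrow down which reset cause applies, and whether SYN cookies are actually being used, it can help to look at the raw kernel counters rather than only the netstat summary. As far as I know, on Linux these live in /proc/net/snmp (for example OutRsts in the Tcp group) and /proc/net/netstat (for example SyncookiesSent, ListenOverflows and ListenDrops in the TcpExt group), as pairs of header and value lines. The sketch below assumes Python 3. The function names are made up, and SAMPLE is an invented, shortened excerpt with made-up numbers, used only to show the format.

```python
def parse_proc_net_counters(text: str) -> dict:
    """Parse header/value line pairs as found in /proc/net/snmp and /proc/net/netstat."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    result = {}
    # Lines come in pairs: "Group: Name1 Name2 ..." followed by "Group: 1 2 ...".
    for header, values in zip(lines[0::2], lines[1::2]):
        group, names = header.split(":", 1)
        _, numbers = values.split(":", 1)
        result[group] = dict(zip(names.split(), map(int, numbers.split())))
    return result


def read_counters(path: str) -> dict:
    """Read and parse one of the /proc/net counter files."""
    with open(path) as f:
        return parse_proc_net_counters(f.read())


# Invented, shortened sample (NOT real data); the real files have many more columns.
SAMPLE = (
    "TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed ListenOverflows ListenDrops\n"
    "TcpExt: 12 10 2 7 9\n"
    "Tcp: EstabResets OutRsts\n"
    "Tcp: 5 340\n"
)

counters = parse_proc_net_counters(SAMPLE)
print(counters["Tcp"]["OutRsts"], counters["TcpExt"]["ListenOverflows"])
```

On a real server one would call read_counters("/proc/net/snmp") and read_counters("/proc/net/netstat") twice, some seconds apart, and compare the values. If ListenOverflows or ListenDrops keep growing, that would fit cause (b) above (the server cannot allocate resources for new connections) better than a closed port.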