I have two Dell R410 web servers (2x quad-core Xeon E5520 with 8 GB RAM) running Debian 5 stable. Their patching had been neglected for a while, so recently we did a patching run to bring everything up to date, necessitated by a new version of the application they run, which requires PHP 5.3.6. The kernel wasn't updated because it came from the Debian backports repository (the installed version is 2.6.30-bpo.1-amd64).
Since the patching, users have complained that the web site is slow. The majority of requests are served instantly, but now and again it'll get "stuck" on a request. There doesn't seem to be any discernible pattern in the requests that get stuck.
These servers are behind a load balancer. They were updated at the same time, and both started exhibiting this issue at the time of the patching run. They were not rebooted at the time, but have been since, with no effect.
I set up a script on the servers themselves that loops over
time curl localhost:80/alive
where /alive holds a simple index.html file containing only "OK". Strangely, these requests still get delayed with the same frequency and duration as requests for actual PHP content. The common times are 3 seconds, 9s, 25s and 45s, and some are over 3 minutes. 45 seconds is a common response time, but of course browsers give up well before this, so it's effectively no response.
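For anyone wanting to reproduce the measurement, here is a minimal sketch of such a probe loop. It is not the script I actually ran: the function names, the 1-second threshold and the 1-second pause are all made up for illustration. It uses curl's -w option to print the connect time next to the total time, which helps tell a slow TCP handshake apart from a slow response.

```shell
#!/bin/sh
# Hypothetical probe loop; names, threshold and pause are illustrative assumptions.

# is_slow SECONDS LIMIT: succeeds (exit 0) when SECONDS is greater than LIMIT.
is_slow() {
    awk -v t="$1" -v limit="$2" 'BEGIN { exit !(t + 0 > limit + 0) }'
}

# probe_loop URL COUNT LIMIT: request URL COUNT times and log every request
# whose total time exceeds LIMIT seconds.
probe_loop() {
    url=$1; count=$2; limit=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # %{time_connect}: seconds until the TCP connection was established
        # %{time_total}:   seconds until the whole transfer finished
        timings=$(curl -s -o /dev/null -w '%{time_connect} %{time_total}' "$url")
        connect=${timings%% *}
        total=${timings##* }
        if is_slow "$total" "$limit"; then
            echo "$(date '+%F %T') slow: connect=${connect}s total=${total}s"
        fi
        i=$((i + 1))
        sleep 1
    done
}

# Example (not run automatically):
#   probe_loop http://localhost:80/alive 1000 1
```

If the connect time accounts for nearly all of the total on the slow requests, the delay is in establishing the connection rather than in Apache or PHP.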
The Apache prefork config is as follows:
<IfModule mpm_prefork_module>
StartServers 50
MinSpareServers 10
MaxSpareServers 150
ServerLimit 500
MaxClients 500
MaxRequestsPerChild 5000
</IfModule>
It seems sensible to me for a server with 8 GB of RAM. In practice the process count seldom goes over 170, so we're not hitting that limit, and there is plenty of free memory. Load averages are low; they hover around 0.5-1.5.
The kernel is an old backport, so I tried updating it to the latest backport for lenny (2.6.32-bpo.5-amd64), but it panicked on boot and I had to get our host to restart the server with the old one. I'd like to explore other options before we try updating the BIOSes and reinstalling the servers with Debian 6.
Apache seems a likely culprit, so the next step is to update to the latest Apache backport. That said, the patching run only took it from 2.2.9-10+lenny4 to 2.2.9-10+lenny9, a fairly minor bump, so I wasn't expecting any significant changes.
PHP is version 5.3.6 from dotdeb; the previous version was a custom-compiled 5.3.0. In addition, my boss has just informed me that requests over HTTPS do not get delayed, but I have not confirmed this myself.
# apache2 -V
Server version: Apache/2.2.9 (Debian)
Server built: Dec 11 2010 21:34:00
Server's Module Magic Number: 20051115:15
Server loaded: APR 1.2.12, APR-Util 1.2.12
Compiled using: APR 1.2.12, APR-Util 1.2.12
Architecture: 64-bit
Server MPM: Prefork
threaded: no
forked: yes (variable process count)
Server compiled with....
-D APACHE_MPM_DIR="server/mpm/prefork"
-D APR_HAS_SENDFILE
-D APR_HAS_MMAP
-D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
-D APR_USE_SYSVSEM_SERIALIZE
-D APR_USE_PTHREAD_SERIALIZE
-D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
-D APR_HAS_OTHER_CHILD
-D AP_HAVE_RELIABLE_PIPED_LOGS
-D DYNAMIC_MODULE_LIMIT=128
-D HTTPD_ROOT=""
-D SUEXEC_BIN="/usr/lib/apache2/suexec"
-D DEFAULT_PIDLOG="/var/run/apache2.pid"
-D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
-D DEFAULT_LOCKFILE="/var/run/apache2/accept.lock"
-D DEFAULT_ERRORLOG="logs/error_log"
-D AP_TYPES_CONFIG_FILE="/etc/apache2/mime.types"
-D SERVER_CONFIG_FILE="/etc/apache2/apache2.conf"
# apache2ctl -t -D DUMP_MODULES
Loaded Modules:
core_module (static)
log_config_module (static)
logio_module (static)
mpm_prefork_module (static)
http_module (static)
so_module (static)
alias_module (shared)
auth_basic_module (shared)
authn_file_module (shared)
authz_default_module (shared)
authz_groupfile_module (shared)
authz_host_module (shared)
authz_user_module (shared)
autoindex_module (shared)
cgi_module (shared)
deflate_module (shared)
dir_module (shared)
env_module (shared)
geoip_module (shared)
mime_module (shared)
negotiation_module (shared)
php5_module (shared)
rewrite_module (shared)
setenvif_module (shared)
ssl_module (shared)
status_module (shared)
Syntax OK
Any assistance greatly appreciated!
I've been banging my head against the wall on this for a week now, and my boss has fixed it.
Once we looked at Apache's response times in the logs, we saw that it was responding quickly - the delays were happening before the request even reached Apache. So he looked at the TCP stack settings, comparing them to another server running Red Hat 5.6.
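In hindsight, the delay figures from the question (3s, 9s, roughly 25s, 45s, over 3 minutes) look a lot like a client retransmitting a dropped SYN with exponential backoff. The sketch below just does the arithmetic. It assumes an initial SYN timeout of 3 seconds that doubles after each attempt, which I believe is what kernels of this vintage use, but treat both numbers as assumptions; the function name is made up.

```shell
#!/bin/sh
# syn_retry_schedule INITIAL_RTO ATTEMPTS
# Prints the cumulative seconds after the first SYN at which each timeout
# expires, assuming the timeout doubles after every attempt (an assumption,
# not something measured on these servers).
syn_retry_schedule() {
    rto=$1; attempts=$2
    elapsed=0; out=""
    n=0
    while [ "$n" -lt "$attempts" ]; do
        elapsed=$((elapsed + rto))
        out="${out:+$out }$elapsed"
        rto=$((rto * 2))
        n=$((n + 1))
    done
    echo "$out"
}

syn_retry_schedule 3 6   # prints: 3 9 21 45 93 189
```

3, 9 and 45 match the observed delays exactly, 21 is close to the 25s I noted, and 189 seconds is "over 3 minutes". That fits a picture where the first SYN is silently dropped and the request only goes through on a later retransmission.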
To cut a long story short, enabling TCP SYN cookies (
net.ipv4.tcp_syncookies=1
in /etc/sysctl.conf) has fixed the problem. This setting is designed to protect against SYN floods: as I understand it, when the queue of half-open connections is full, the kernel keeps answering new SYNs instead of dropping them, which would explain the faster responses. It's possible we're getting flooded accidentally (or deliberately). More info is in this link; the symptoms described are exactly what we were seeing: http://baheyeldin.com/technology/linux/detecting-and-preventing-syn-flood-attacks-web-servers-running-linux.html
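To check whether the setting is actually live on a given box, something like the following works. It is a small sketch: the function name is made up, and the optional file argument exists only so the function can be exercised against a test file instead of the real procfs entry.

```shell
#!/bin/sh
# syncookies_state [FILE]: report whether SYN cookies are enabled.
# Reads /proc/sys/net/ipv4/tcp_syncookies unless another file is given
# (the FILE argument is an illustrative addition for testing).
syncookies_state() {
    file=${1:-/proc/sys/net/ipv4/tcp_syncookies}
    value=$(cat "$file" 2>/dev/null) || { echo "unknown"; return 1; }
    case $value in
        0) echo "disabled" ;;
        *) echo "enabled" ;;
    esac
}

syncookies_state || true

# To apply the setting without a reboot (as root):
#   sysctl -w net.ipv4.tcp_syncookies=1
# To reload everything from /etc/sysctl.conf (as root):
#   sysctl -p
```

Editing /etc/sysctl.conf alone only takes effect at the next boot or the next sysctl -p, so it is worth confirming the running value.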
I was looking at
netstat -alnt
and the vast majority of connections were in state TIME_WAIT, not SYN_RECV (maybe the -l option doesn't show half-open connections). However, we are now seeing this in dmesg frequently:
I shall do some more digging.
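As part of that digging, a per-state tally is easier to read than scrolling through raw netstat output. This is a rough sketch with a made-up function name; it assumes the usual Linux netstat layout where the state is the sixth column. As far as I know, -a on its own already lists non-listening sockets, so -ant should be enough.

```shell
#!/bin/sh
# count_tcp_states: read `netstat -ant` output on stdin and print
# "<count> <state>" for each TCP state, busiest state first.
# Assumes the state is in column 6, as in Linux net-tools netstat.
count_tcp_states() {
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print n[s], s }' | sort -rn
}

# Example (not run automatically):
#   netstat -ant | count_tcp_states
```

A large SYN_RECV count here would point at half-open connections piling up; a large TIME_WAIT count on a busy web server is normal.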