I have an Ubuntu 10.10 server with plenty of RAM, bandwidth and CPU. I'm seeing a strange, repeatable pattern in the distribution of latencies when serving static files from both Apache and nginx. Because the problem is common to both http servers, I'm wondering if I have misconfigured or poorly tuned Ubuntu's networking or cache parameters.
ab -n 1000 -c 4 http://apache-host/static-file.jpg
:
Percentage of the requests served within a certain time (ms)
50% 5
66% 3007
75% 3009
80% 3011
90% 9021
95% 9032
98% 21068
99% 45105
100% 45105 (longest request)
ab -n 1000 -c 4 http://nginx-host/static-file.jpg
:
Percentage of the requests served within a certain time (ms)
50% 19
66% 19
75% 3011
80% 3017
90% 9021
95% 12026
98% 12028
99% 18063
100% 18063 (longest request)
The results consistently follow this pattern: 50% or more of requests are served as fast as expected, while the remainder fall into discrete bands, with the slowest a few orders of magnitude slower than the median.
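In case it's useful, these are the counters I can watch while the test runs to see whether those bands line up with retransmissions or dropped packets (standard net-tools/iproute2 commands; happy to post the output if that would help):

# TCP retransmission, listen-overflow and drop counters
netstat -s | egrep -i 'retrans|listen|drop'
# socket summary
ss -s
# anything the kernel logged during the run
dmesg | tail -n 20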
Apache is 2.x with mod_php installed; nginx is 1.0.x with Passenger installed (though neither app server should be in the critical path for a static file). Load average was around 1 when each test was run (the server has 12 physical cores), with 5 GB of free RAM and 7 GB of cached swap. Tests were run from localhost.
Here are the configuration changes I have made from Ubuntu server 10.10 defaults:
/etc/sysctl.conf:
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_mem = 16777216 16777216 16777216
net.ipv4.tcp_window_scaling = 1
net.ipv4.route.flush = 1
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1
net.core.somaxconn = 8192
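For reference, a sketch of how the file can be reloaded and a few of the running values spot-checked (standard sysctl usage, nothing exotic):

# reload /etc/sysctl.conf into the running kernel
sudo sysctl -p /etc/sysctl.conf
# confirm that a sample of the values actually took effect
sysctl net.core.somaxconn net.core.rmem_max net.ipv4.tcp_rmem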
/etc/security/limits.conf:
* hard nofile 65535
* soft nofile 65535
root hard nofile 65535
root soft nofile 65535
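And how I'd verify that the nofile limit is actually in effect for the web-server processes (a sketch; substitute apache2 for nginx as appropriate):

# limit for a new login shell
ulimit -n
# limit as seen by a running worker process
cat /proc/$(pidof -s nginx)/limits | grep -i 'open files'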
other config:
ifconfig eth0 txqueuelen 1000
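The txqueuelen change is currently applied by hand; a sketch of how it could be persisted across reboots, assuming the stock ifupdown setup rather than NetworkManager:

# add to the existing "iface eth0 ..." stanza in /etc/network/interfaces
    post-up /sbin/ifconfig eth0 txqueuelen 1000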
Please let me know if this kind of problem rings any bells, or if more information about the config would be helpful. Thanks for your time.
Update: Here's what I'm seeing after increasing net.netfilter.nf_conntrack_max as suggested below:
Percentage of the requests served within a certain time (ms)
50% 2
66% 2
75% 2
80% 2
90% 3
95% 3
98% 3
99% 3
100% 5 (longest request)
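For anyone who hits this later, a sketch of the conntrack checks and the persistent setting (262144 is just an example figure, not necessarily the value I settled on):

# how full is the conntrack table right now?
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
# raise the ceiling immediately
sudo sysctl -w net.netfilter.nf_conntrack_max=262144
# and persist it across reboots
echo 'net.netfilter.nf_conntrack_max = 262144' | sudo tee -a /etc/sysctl.conf

When the count sits at or near the max, the kernel silently drops new connections, which would explain the retransmission-shaped latency bands above.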