Our server occasionally refuses to serve a simple HTML page.
This is happening during a relatively high number of requests. However, the processor is not heavy loaded and there are a lot of free memory. The error seems to occure 1 out of 50 requests in average, depending on the server load.
I need to find the source of the problem and take the appropriate actions to eliminate it.
I have a suspicion that the problem source is a huge number of incoming network packets. There are 5000 packets per second on average. Traffic - 2 MBits/sec Can this be the cause of the error?
There is an interesting thing, in case the server fails to respond, the request string is not logged to access.log by Apache.
The error is repeatable from several client computers. DNS is not involved, since I have accessed the server by the IP.
I have profiled the problem case with tcpdump utility. These are the good and bad sessions traced by tcpdump. The request is the same in both experiments. Good - server returns response. Bad - no response, time-out error.
---- Bad ----
12:23:36.366292 IP 123.45.67.890.61749 > myserver.superbservers.com.www: S 2125316338:2125316338(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
12:23:39.362394 IP 123.45.67.890.61749 > myserver.superbservers.com.www: S 2125316338:2125316338(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
12:23:45.365567 IP 123.45.67.890.61749 > myserver.superbservers.com.www: S 2125316338:2125316338(0) win 8192 <mss 1460,nop,nop,sackOK>
--------
---- Good ----
12:27:07.632229 IP 123.45.67.890.63914 > myserver.superbservers.com.www: S 3581365570:3581365570(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
12:27:10.620946 IP 123.45.67.890.63914 > myserver.superbservers.com.www: S 3581365570:3581365570(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
12:27:10.620969 IP myserver.superbservers.com.www > 123.45.67.890.63914: S 2654770980:2654770980(0) ack 3581365571 win 5840 <mss 1460,nop,nop,sackOK,nop,wscale 6>
12:27:10.838747 IP 123.45.67.890.63914 > myserver.superbservers.com.www: . ack 1 win 4380
12:27:10.957143 IP 123.45.67.890.63914 > myserver.superbservers.com.www: P 1:213(212) ack 1 win 4380
12:27:10.957152 IP myserver.superbservers.com.www > 123.45.67.890.63914: . ack 213 win 108
12:27:10.965543 IP myserver.superbservers.com.www > 123.45.67.890.63914: P 1:630(629) ack 213 win 108
12:27:10.965621 IP myserver.superbservers.com.www > 123.45.67.890.63914: F 630:630(0) ack 213 win 108
12:27:11.183540 IP 123.45.67.890.63914 > myserver.superbservers.com.www: . ack 631 win 4222
12:27:11.185657 IP 123.45.67.890.63914 > myserver.superbservers.com.www: F 213:213(0) ack 631 win 4222
12:27:11.185663 IP myserver.superbservers.com.www > 123.45.67.890.63914: . ack 214 win 108
--------
Hoster: SuperbHosting
OS: Ubuntu
Server parameters: E6300 CONROE 1.86GHZ 2 X 1MB CACHE 1066 1GB DDR2 667MHZ
This is a link to apache configuration file we use http://repkin5.snow.prohosting.com/apache.txt
This is server-status report taken right after time-out error. http://repkin5.snow.prohosting.com/server-status.htm There are only 10 Child Servers running out of 120, so enough space for new requests.
VMSTAT
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 8900 725900 8468 65684 0 0 5 18 11 33 4 3 92 1
This sounds like a network problem. The server should be logging any requests it receives even if it can't answer for some reason. You may want to verify that you aren't seeing packet loss on the web server.
There's a small chance that you're in a position where the available kernel buffers for TCP connections are low. I would expect some logging from that (log in to the server, test until you've had a "no response", then run
dmesg
and see if anything looks applicable).To tune the network setup, this may be a starting point.
As Chris Nava said, it's probably worth making sure you're not just having packet loss across the network, so by all means start checking using ping (responding to a ping is, alas, not at all the same as dealing with a TCP packet).