Question

user25417

Asked: 2011-01-04 05:43:54 +0800 CST

Strange 3-second tcp connection latencies (Linux, HTTP)

Our webservers serving static content occasionally show strange 3-second latencies. A typical ApacheBench run (> 10000 requests, concurrency 1 or 40 with no difference, keepalive off) looks like this:

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2   10 152.8      3    3015
Processing:     2    8  34.7      3     663
Waiting:        2    8  34.7      3     663
Total:          4   19 157.2      6    3222

Percentage of the requests served within a certain time (ms)
  50%      6
  66%      7
  75%      7
  80%      7
  90%      9
  95%     11
  98%    223
  99%    225
 100%   3222 (longest request)

I have tried many things:
- Apache2 2.2.9 with the worker or prefork MPM: no difference (with KeepAliveTimeout 10-15)
- Nginx 0.6.32
- various TCP parameters (net.core.somaxconn=3000, net.ipv4.tcp_sack=0, net.ipv4.tcp_dsack=0)
- putting the files/DocumentRoot on tmpfs
- Shorewall on or off (i.e. empty iptables or not)
- AllowOverride None is set for /, so there are no .htaccess checks (verified with strace)
- the problem persists whether the webservers are accessed directly or through a Foundry load balancer

The kernel is 2.6.32 (Debian Lenny backports), but the problem also occurred with 2.6.26. IPv6 is enabled, but not used.

Does the issue look familiar to anyone? Help/suggestions are much appreciated. It sounds a bit like a SYN,ACK packet getting lost or ignored.

4 Answers

Marcin · Answer 1 · 2011-01-04T06:39:51+08:00

Capture this event with tcpdump/Wireshark/tshark. Then open the capture in Wireshark, go to Statistics->TCP stream graph->Time-sequence graph (Stevens).

This gets you a graph of sequence numbers versus time. If there is a 3-second gap in your connections, you should be able to spot it: there will be no dots for 3 seconds along the x-axis, between two dense groups of dots. Click the last dot on the left side of the gap. This takes you to the frame just before the gap, which is usually the packet that shows the problem. You might see a zero-window packet, a missing packet, out-of-order delivery, duplicates, etc.
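If you would rather script the gap hunt than eyeball it, the same idea works on a plain list of packet timestamps, for example exported with `tshark -r capture.pcap -T fields -e frame.time_relative` (the file name is a placeholder). The sketch below is a hypothetical helper, not a Wireshark feature: `find_gaps` reports every silence of at least `min_gap` seconds together with the index of the last packet before it, i.e. the frame you would click on in the graph. The 2.5-second default threshold is an assumption chosen to catch 3-second stalls.

```python
from typing import List, Tuple


def find_gaps(timestamps: List[float], min_gap: float = 2.5) -> List[Tuple[int, float]]:
    """Return (index_of_last_packet_before_gap, gap_seconds) for each gap >= min_gap.

    `timestamps` must be sorted packet times in seconds (e.g. frame.time_relative).
    The min_gap default is an illustrative assumption, not a Wireshark setting.
    """
    gaps = []
    for i in range(1, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        if delta >= min_gap:
            # Packet i - 1 is the one "just before the gap", the one worth inspecting.
            gaps.append((i - 1, delta))
    return gaps


if __name__ == "__main__":
    # Made-up timestamps: a dense burst, a ~3 s silence, then another burst.
    sample = [0.000, 0.002, 0.004, 0.006, 3.010, 3.012, 3.015]
    for index, length in find_gaps(sample):
        print(f"gap of {length:.3f} s after packet #{index}")
```

Run it on one TCP stream at a time; mixing many connections in one list can fill in the silence of the stalled one.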

Paul Tomblin · Answer 2 · 2011-01-04T05:53:10+08:00

Check whether your DNS server is slow, and set up your Apache logging so that it records clients by IP address rather than by host name. If hostname lookups are enabled (Apache's HostnameLookups directive), the logger has to do a DNS lookup for every request.
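To see whether reverse lookups could plausibly cost seconds on your network, you can time one directly. A minimal sketch, assuming Python 3 on the web server; `timed_reverse_lookup` is a hypothetical helper that does roughly the work a logger does per request when hostname lookups are on. The resolver is a parameter (defaulting to `socket.gethostbyaddr`) so it can be swapped out for testing.

```python
import socket
import time
from typing import Callable, Optional, Tuple


def timed_reverse_lookup(
    ip: str,
    resolver: Callable[[str], tuple] = socket.gethostbyaddr,
) -> Tuple[Optional[str], float]:
    """Resolve `ip` to a host name; return (hostname_or_None, seconds_taken).

    `resolver` defaults to socket.gethostbyaddr, which returns
    (hostname, aliases, addresses). Failures (no PTR record, resolver
    timeout) are reported as None but still timed.
    """
    start = time.monotonic()
    try:
        hostname = resolver(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        hostname = None
    return hostname, time.monotonic() - start
```

For example, calling `timed_reverse_lookup("203.0.113.7")` (a placeholder address; use a real client IP from your access log) and seeing several seconds elapse would point at DNS.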

pehrs · Answer 3 · 2011-01-04T06:44:15+08:00

This can be caused by I/O locks in many interesting ways. To start with, try to isolate the problem: is it the server/network, or is it the service? Can you replicate the problem with ping or tcpping?
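If tcpping isn't at hand, a rough equivalent is a loop that times bare TCP handshakes against the web server's port, which separates connection setup from HTTP processing. A minimal sketch under these assumptions: `tcp_connect_time` and `tcp_ping` are hypothetical names, you probe by IP address (so DNS time is not included in the measurement), and the 1-second "slow" threshold is arbitrary. The 10-second timeout is long enough that a handshake delayed by a retransmission still shows up as slow rather than as a failure.

```python
import socket
import time
from typing import List, Optional


def tcp_connect_time(host: str, port: int, timeout: float = 10.0) -> Optional[float]:
    """Time one TCP handshake to host:port; return seconds, or None on failure.

    socket.create_connection returns once the kernel has finished the
    SYN / SYN,ACK / ACK exchange, so no HTTP request is sent or timed.
    Pass an IP address for `host` so name resolution is not measured.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # refused, unreachable, or timed out
        return None


def tcp_ping(host: str, port: int, count: int = 100, slow: float = 1.0) -> List[float]:
    """Run `count` probes, print failures and probes slower than `slow` seconds."""
    times = []
    for i in range(count):
        t = tcp_connect_time(host, port)
        if t is None:
            print(f"probe {i}: failed")
            continue
        times.append(t)
        if t >= slow:
            print(f"probe {i}: {t * 1000:.0f} ms (slow)")
    return times
```

For example, `tcp_ping("192.0.2.10", 80, count=10000)` (placeholder address) run from the load balancer's network and again from the server itself helps tell a network problem from a host problem.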

If the whole server hangs for a few seconds:

Are your hard disks set to spin down when idle? If a page fault has to be served from a disk that has spun down, the system can take seconds to recover. Either way, consider getting rid of swap.
It can be a low-level problem with the network. I have seen similar behaviour, with rare, slow connections, when a switch ran out of space in its MAC address table. Take some packet traces and see whether anything else on the network seems related.
It can also be a HW problem with the server, such as a bus that locks up and recovers after a few seconds. Check your logs.
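One cheap way to catch a whole-server stall is a process that does nothing but sleep for a short interval and complains when it wakes up much later than requested. If it logs a roughly 3-second stall at the same moment ApacheBench sees a 3-second connect, the machine itself is pausing. A minimal sketch; `detect_stalls` is a hypothetical name and the 0.1 s interval and 1 s threshold are assumptions. It only catches stalls that also delay this process (e.g. a bus lock-up or the box being unresponsive); a disk stall that blocks only Apache's I/O would not show up here.

```python
import time
from typing import Callable, List


def detect_stalls(
    samples: int,
    interval: float = 0.1,
    threshold: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[float]:
    """Sleep `interval` seconds `samples` times; record oversleeps >= `threshold`.

    The oversleep is how much later than requested the process woke up,
    in seconds. `sleep` and `clock` are injectable so the logic can be tested.
    """
    stalls = []
    for _ in range(samples):
        before = clock()
        sleep(interval)
        overshoot = clock() - before - interval
        if overshoot >= threshold:
            stalls.append(overshoot)
            # Wall-clock time, for correlating with ab runs and syslog entries.
            print(f"{time.strftime('%H:%M:%S')} stalled for ~{overshoot:.2f} s")
    return stalls
```

Run it with a large `samples` value in a terminal on the web server while the benchmark is running.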

If it seems to be only Apache:

DNS lookups would be a common culprit, but you seem to have that one covered.
Try rolling out a completely different server (like lighttpd) and see whether that gets you around the problem. If it does, you can start suspecting something in your Apache configuration.

rnbrady · Answer 4 · 2012-01-03T10:25:13+08:00

Sounds like a problem with TCP connection establishment, i.e. a lost SYN,ACK just as you suggest.

3 seconds is the default initial retransmission timeout for a TCP SYN or SYN,ACK on Linux (on the 2.6 kernels discussed here). It is unlikely to be related to the application (webserver), because connection establishment is handled by the kernel.
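The arithmetic behind this: when a SYN or SYN,ACK is lost, the sender waits the initial timeout, retransmits, and doubles the timeout after each further retransmission. A handshake that loses one packet therefore completes just after 3 s, matching the 3015 ms maximum connect time in the question, and one that loses two packets takes about 9 s. A minimal sketch of that schedule; `syn_retransmit_times` is a hypothetical helper, the 3-second initial value matches the 2.6 kernels in the question (newer kernels use 1 second, following RFC 6298), and five retries is an assumption to check against `net.ipv4.tcp_syn_retries` / `net.ipv4.tcp_synack_retries` on your system.

```python
from typing import List


def syn_retransmit_times(initial_rto: float = 3.0, retries: int = 5) -> List[float]:
    """Elapsed seconds at which a lost SYN or SYN,ACK is resent.

    Models exponential backoff: wait `initial_rto`, then double the timeout
    after every retransmission. Both parameters are assumptions passed in,
    not values read from the running kernel.
    """
    times = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(retries):
        elapsed += rto
        times.append(elapsed)
        rto *= 2
    return times


if __name__ == "__main__":
    print(syn_retransmit_times())  # [3.0, 9.0, 21.0, 45.0, 93.0]
```

Connect times that cluster just above these values are a strong hint of handshake packet loss rather than slow request processing.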

Since it affects less than 1% of connections, some possible causes are:

packet loss, if on a WAN (1% packet loss is not unheard of on some WAN types),
a misconfigured NIC (use ethtool to investigate and confirm duplex, autonegotiation, etc.),
a cable fault (can't hurt to try swapping out the cable),
a kernel bug (which you seem to have eliminated).
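Checking the NIC mostly means reading the `Speed`, `Duplex` and `Auto-negotiation` lines that `ethtool <interface>` prints. A minimal sketch, assuming you capture that output yourself (e.g. `subprocess.run(["ethtool", "eth0"], capture_output=True, text=True).stdout`, where `eth0` is a placeholder). `parse_ethtool` and `nic_warnings` are hypothetical helpers, and the warnings are a heuristic: autonegotiation off is fine if both ends are forced to the same settings, but half duplex on a switched link usually means a mismatch.

```python
from typing import Dict, List


def parse_ethtool(output: str) -> Dict[str, str]:
    """Parse the "Key: value" lines printed by `ethtool <interface>` into a dict.

    Lines without a value (such as "Settings for eth0:") and continuation
    lines without a colon are skipped.
    """
    settings = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():
            settings[key.strip()] = value.strip()
    return settings


def nic_warnings(settings: Dict[str, str]) -> List[str]:
    """Flag settings that often accompany a mis-negotiated link (heuristic)."""
    warnings = []
    if settings.get("Duplex", "").lower() != "full":
        warnings.append(f"duplex is {settings.get('Duplex', 'unknown')}")
    if settings.get("Auto-negotiation", "").lower() != "on":
        warnings.append("auto-negotiation is not on")
    return warnings


if __name__ == "__main__":
    sample = "Settings for eth0:\n\tSpeed: 100Mb/s\n\tDuplex: Half\n\tAuto-negotiation: off\n"
    print(nic_warnings(parse_ethtool(sample)))
```

Compare the result with the switch port's own view of speed and duplex; a mismatch between the two ends is the usual culprit.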

I had this recently on a server and it turned out to be the second one above: misconfigured NIC, which had been forced to the wrong speed and duplex settings. I reset it to autonegotiate with ethtool and haven't looked back.
