What resources (books, Web pages etc) would you recommend that:
- explain the causes of latency in TCP/IP-over-Ethernet networks;
- mention tools for looking out for things that cause latency (e.g. certain entries in netstat -s);
- suggest ways to tweak the Linux TCP stack to reduce TCP latency (Nagle, socket buffers etc).
The closest I am aware of is this document, but it's rather brief.
Alternatively, you're welcome to answer the above questions directly.
Edit: To be clear, the question isn't just about "abnormal" latency, but about latency in general. Additionally, it is specifically about TCP/IP-over-Ethernet and not about other protocols (even if they have better latency characteristics).
In regards to kernel tunables for latency, one sticks out in my mind:
From the documentation:
You can also disable Nagle's algorithm (which buffers TCP output until a full maximum segment size worth of data is available) in your application with something like:
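A minimal sketch in C, assuming `sock` is an already-connected TCP socket:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on an already-connected TCP socket.
   Returns 0 on success, -1 on failure. */
int disable_nagle(int sock)
{
    int flag = 1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag)) < 0) {
        perror("setsockopt(TCP_NODELAY)");
        return -1;
    }
    return 0;
}
```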
The "opposite" of this option is
TCP_CORK
, which will "re-Nagle" packets. Beware, however, asTCP_NODELAY
might not always do what you expect, and in some cases can hurt performance. For example, if you are sending bulk data, you will want to maximize throughput per-packet, so setTCP_CORK
. If you have an application that requires immediate interactivity (or where the response is much larger than the request, negating the overhead), useTCP _NODELAY
. On another note, this behavior is Linux-specific and BSD is likely different, so caveat administrator.Make sure you do thorough testing with your application and infrastructure.
In my experience, the biggest causes of abnormal latency on otherwise healthy high-speed networks are TCP Windowing (RFC 1323, section 2) faults, with a closely related second in faults surrounding TCP Delayed Acks (RFC 1122, section 4.2.3.2). Both of these methods are enhancements to TCP for better handling of high-speed networks. When they break, speeds drop to very slow levels. Faults in these cases affect large transfers (think backup streams), whereas extremely transactional small traffic (where the average transfer is under the MTU size and there is a LOT of back-and-forth) is less affected.
Again, I've seen the biggest problems with these two issues when two different TCP/IP stacks are talking, such as Windows/Linux, 2.4-Linux/2.6-Linux, Windows/NetWare, Linux/BSD. Like to like works very, very well. Microsoft rewrote the Windows TCP/IP stack in Server 2008, which introduced Linux interoperability problems that didn't exist with Server 2003 (I believe these are fixed, but I'm not 100% sure of that).
Disagreements on the exact method of Delayed or Selective Acknowledgments can lead to cases like this:
Throughput goes through the floor because of all of the 200ms timeouts (Windows defaults its delayed-ack timer to 200ms). In this case, both sides of the conversation failed to handle TCP Delayed Ack.
TCP Windowing faults are harder to notice because their impact can be less obvious. In extreme cases Windowing fails completely and you get packet->ack->packet->ack->packet->ack, which is really slow when transferring anything significantly larger than about 10KB and will magnify any fundamental latency on the link. The harder-to-detect mode is when both sides are continually renegotiating their Window size and one side (the sender) fails to respect the negotiation, which takes a few packets to sort out before data can continue to be passed. This kind of fault shows up as red blinking lights in Wireshark traces, but manifests as lower-than-expected throughput.
As I mentioned, the above tend to plague large transfers. Traffic like streaming video or backup streams can be really nailed by them, as can simple downloads of very large files (like Linux distro ISO files). As it happens, TCP Windowing was designed as a way to work around fundamental latency problems, since it allows pipelining of data; you don't have to wait one round-trip time for each packet sent, you can just send a big block and wait for a single ACK before sending more.
That said, certain network patterns don't benefit from these work-arounds. Highly transactional, small transfers, such as those generated by databases, suffer most from normal latency on the line. If the RTT is high these workloads will suffer greatly, where large streaming workloads will suffer a lot less.
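If you want to watch the windowing and retransmission behaviour described above from inside the application rather than only in a packet capture, Linux exposes the kernel's per-connection counters through the TCP_INFO socket option. A minimal sketch, assuming a connected socket:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

/* Print a few fields the kernel tracks per connection: smoothed RTT,
   congestion window and total retransmissions. Linux-specific. */
void dump_tcp_info(int sock)
{
    struct tcp_info ti;
    socklen_t len = sizeof(ti);

    if (getsockopt(sock, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0) {
        perror("getsockopt(TCP_INFO)");
        return;
    }
    printf("rtt=%u us  rttvar=%u us  snd_cwnd=%u segs  total_retrans=%u\n",
           ti.tcpi_rtt, ti.tcpi_rttvar, ti.tcpi_snd_cwnd, ti.tcpi_total_retrans);
}
```

The same values (rtt, cwnd, retransmissions) are what `ss -ti` prints per connection, which is often the quicker way to spot a stalled window or a retransmission storm.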
There are many answers to this question.
Remember how TCP works. The client sends a SYN, the server answers with a SYN/ACK, and the client answers with an ACK. Once the server has received the ACK, it can send data. This means you have to wait roughly two round-trip times (RTT) before the first bit of meaningful data arrives. If you have 500ms of RTT, you get a one-second delay right there from the start. If the sessions are short-lived but numerous, this will create a lot of latency.
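That handshake cost can be measured directly by timing a blocking connect(). A rough sketch; the target address is a placeholder from the TEST-NET range, so substitute something real:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Time a blocking connect(): roughly one RTT of pure handshake latency
   before a single byte of application data can be exchanged. */
int main(void)
{
    struct sockaddr_in addr;
    struct timespec t0, t1;
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock < 0) { perror("socket"); return 1; }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(80);                       /* hypothetical port   */
    inet_pton(AF_INET, "192.0.2.1", &addr.sin_addr); /* TEST-NET, replace me */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (connect(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("handshake took %.1f ms\n",
           (t1.tv_sec - t0.tv_sec) * 1000.0 + (t1.tv_nsec - t0.tv_nsec) / 1e6);
    close(sock);
    return 0;
}
```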
Once the session is established, the server sends data units that have to be acknowledged by the client. The server can only have so much unacknowledged data in flight before it requires the acknowledgment of the first data unit. This can create latency as well. If a data unit gets dropped, the transmission has to be picked up from there, which creates extra latency.
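How much data can be in flight is bounded by the window, which in turn is bounded by the socket buffers the question mentions. A sketch of sizing them to the bandwidth-delay product; the bandwidth and RTT figures are purely hypothetical:

```c
#include <stdio.h>
#include <sys/socket.h>

/* Size the socket buffers to roughly the bandwidth-delay product so the
   sender never stalls waiting for ACKs. Hypothetical example:
   1 Gbit/s path with a 20 ms RTT => about 2.5 MB of data in flight. */
int size_buffers(int sock)
{
    int bdp = 1000000000 / 8 / 1000 * 20;   /* bytes/s * RTT = 2,500,000 bytes */

    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bdp, sizeof(bdp)) < 0 ||
        setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bdp, sizeof(bdp)) < 0) {
        perror("setsockopt(SO_SNDBUF/SO_RCVBUF)");
        return -1;
    }
    return 0;
}
```

Note that Linux caps these values at net.core.wmem_max / net.core.rmem_max, and setting SO_RCVBUF explicitly disables receive-buffer autotuning for that socket, so on modern kernels it is often better to raise the tcp_rmem/tcp_wmem sysctls instead.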
On the IP level, you have fragmentation (even though it is quite rare today). If you send 1501-byte packets and the other side only supports an MTU of 1500, you will be sending an extra IP fragment for just that last byte of data. This can be overcome by using jumbo frames.
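To check what MTU actually applies to a given connection (and hence whether fragmentation is in play at all), Linux can report the discovered path MTU via the IP_MTU socket option. A sketch, assuming a connected socket:

```c
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Read the path MTU the kernel has discovered for a *connected* socket
   (Linux-specific IP_MTU option, see ip(7)). */
void print_path_mtu(int sock)
{
    int mtu;
    socklen_t len = sizeof(mtu);

    if (getsockopt(sock, IPPROTO_IP, IP_MTU, &mtu, &len) < 0) {
        perror("getsockopt(IP_MTU)");
        return;
    }
    printf("path MTU: %d bytes\n", mtu);
}
```

The `tracepath` tool reports the same information hop by hop from the command line.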
The best way to increase TCP/IP throughput is to reduce latency as much as possible and avoid transmission errors as much as possible. I do not know of any kernel tweaks but I'm sure someone will.
In the case of a WAN, a primary factor introducing latency is the speed of light. It takes a theoretical minimum of ~36.2ms for data to cross North America.
One-way trip along fiber optic cables, in seconds: distance in miles divided by roughly 124,000 (light in fiber travels at about 0.67c, or ~124,000 miles per second).
Multiply by 1000 to convert from seconds to milliseconds, and double it for the round trip.
Here's the latency from Washington, DC to Los Angeles, CA (roughly 2,300 miles): about 18ms one way, or 36-37ms round trip.
More about the formula
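The arithmetic spelled out as a tiny program; 0.67c is the usual approximation for light in fiber, and 2,300 miles is an approximate DC-to-LA distance:

```c
#include <stdio.h>

/* Back-of-the-envelope propagation delay through fibre.
   Light in glass travels at roughly 2/3 of c (~124,000 miles per second). */
int main(void)
{
    const double fibre_mps = 186282.0 * 0.67;   /* miles per second in fibre */
    const double miles     = 2300.0;            /* ~ Washington, DC -> LA    */

    double one_way_ms    = miles / fibre_mps * 1000.0;
    double round_trip_ms = one_way_ms * 2.0;

    printf("one way:    %.1f ms\n", one_way_ms);    /* ~18 ms */
    printf("round trip: %.1f ms\n", round_trip_ms); /* ~37 ms */
    return 0;
}
```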
Probably not the answer you're looking for: the leading cause of latency in a WAN is the speed of light (it's way too slow!). Also, saturated links with a big buffer along the way tend to gain impressive latency.
See the following website: http://www.29west.com/docs/THPM/index.html
TCP is an end-to-end (or client-to-client) protocol that assumes the network in the middle has very little loss. For a more robust protocol see X.25. Thus you will have control over protocol parameters on the clients only (not the network).
An Ethernet is a Local Area Network (LAN) (although that definition has been widely extended over the last decade to include wide-area networks, too) and one would expect little transmission loss unless faced with 70% or greater traffic on a shared segment. Re-transmits would be an infrequent occurrence on the modern Ethernet network, however, given that almost all Ethernet segments are switched nowadays.
So congestion is your biggest enemy when it comes to latency on the LAN. But then you have more serious problems than mere latency.
If you are serious about latency issues for your communications protocol then you should really be considering a packet-switched protocol, such as UDP or RTMP, as opposed to a virtual-circuit protocol.