We have a number of AWS EC2 instances within the same AZ that transmit large amounts of network traffic to each other. In a small fraction of the connections, when a client on host A connects to a server on host B and sends a large amount of data (e.g. 20 GB) from A to B at a high rate, the TCP connection freezes or times out. I've investigated this and the symptoms aren't always the same, but typically it appears that when a connection is impacted by this problem the sender (on host A) stops receiving the ACKs that the receiving side (host B) is sending to A after some time. So at first all ACKs pass through, and then they get blocked in the middle of the connection. Also, VPC Flow Logs show that some packets returning from host B (the receiver) to host A (the sender) are rejected.
This happens on a number of EC2 instances (typically r5a.xlarge) that run Debian Linux 10 with Linux kernel 5.3.9 and the ENA AWS network driver that's shipped as part of Debian's kernel. They run Docker 18.09.1, installed via the docker.io Debian Buster package. Interestingly, I'm not able to reproduce the issue on Amazon Linux 2 (with Docker installed).
I've been able to reproduce it by letting the following simple experiment run in a loop for some time:
# Host B (server receiving data)
docker run -it --rm -p 20098:20098 debian:buster bash
apt-get update && apt-get -y install netcat-openbsd
while true; do date; nc -l -p 20098 | dd of=/dev/null bs=1M; done
# Host A (client sending data)
docker run -it --rm debian:buster bash
apt-get update && apt-get -y install netcat-openbsd
while sleep 1; do date; dd if=/dev/zero bs=1M count=20480 | nc -q 1 <server> 20098; done
The vast majority of the time the experiment succeeds in sending 20 GB over the wire, but every once in a while (sometimes within minutes, sometimes within a few hours or even days) the transfer gets stuck or is cut short by an unexpected disconnect/timeout. On some hosts I can reproduce the problem a lot more easily than on others. The hosts where I can reproduce it more quickly tend to have more Docker containers and more network activity, but I'm not sure yet whether there's a causal relationship. I was also able to reproduce the issue when running the above netcat experiment directly on the host rather than within a Docker container, although it seems a lot harder to reproduce this way. This happens on hosts within the same VPC, AZ, and even subnet, so we can rule out cross-region/cross-AZ/cross-subnet connectivity issues as a cause.
Here's example tcpdump output that shows network activity when this happens. I'm skipping many successfully transmitted and ACK'ed TCP packets within this same connection. This information was captured with tcpdump -i eth0 -p -G 600 -s 80 -w ... host ... and port 20098. It was captured on the host's network interface, not inside the Docker network, so network address translations have already been applied.
Tcpdump output on host A (172.20.3.188, the sending client):
08:00:03.615061 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322435576:322444525, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 8949
08:00:03.615064 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322444525:322453474, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 8949
08:00:03.615066 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 8949
08:00:03.615069 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322462423:322471372, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 8949
08:00:03.615071 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322471372:322480321, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 8949
08:00:03.615073 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322480321:322489270, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 8949
08:00:03.615076 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322489270:322498219, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 8949
08:00:03.615140 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322435576, win 256, options [nop,nop,TS val 683441101 ecr 4223113896], length 0
08:00:03.615178 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322453474, win 117, options [nop,nop,TS val 683441101 ecr 4223113896], length 0
08:00:03.824740 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223114105 ecr 683441101], length 8949
08:00:04.256748 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223114537 ecr 683441101], length 8949
08:00:05.084733 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223115365 ecr 683441101], length 8949
08:00:06.748724 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223117029 ecr 683441101], length 8949
08:00:10.108720 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223120389 ecr 683441101], length 8949
08:00:16.764722 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223127045 ecr 683441101], length 8949
08:00:30.076723 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223140357 ecr 683441101], length 8949
08:00:57.724718 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223168005 ecr 683441101], length 8949
08:01:50.972736 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223221253 ecr 683441101], length 8949
08:03:37.468722 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223327749 ecr 683441101], length 8949
08:05:38.304715 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223448585 ecr 683441101], length 8949
08:07:39.132913 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223569414 ecr 683441101], length 8949
Tcpdump output on host B (172.20.3.89, the receiving server):
08:00:03.615206 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322435576:322453474, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 17898
08:00:03.615225 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322498219, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 44745
08:00:03.615228 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322453474, win 117, options [nop,nop,TS val 683441101 ecr 4223113896], length 0
08:00:03.615256 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 0, options [nop,nop,TS val 683441101 ecr 4223113896], length 0
08:00:03.615908 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 1642, options [nop,nop,TS val 683441102 ecr 4223113896], length 0
08:00:03.616389 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 3373, options [nop,nop,TS val 683441102 ecr 4223113896], length 0
08:00:03.618742 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 6862, options [nop,nop,TS val 683441105 ecr 4223113896], length 0
08:00:03.621737 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 13913, options [nop,nop,TS val 683441108 ecr 4223113896], length 0
08:00:03.824879 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223114105 ecr 683441101], length 8949
08:00:03.824905 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683441311 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:00:04.256895 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223114537 ecr 683441101], length 8949
08:00:04.256929 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683441743 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:00:05.084873 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223115365 ecr 683441101], length 8949
08:00:05.084908 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683442571 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:00:06.748872 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223117029 ecr 683441101], length 8949
08:00:06.748901 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683444235 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:00:10.108863 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223120389 ecr 683441101], length 8949
08:00:10.108889 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683447595 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:00:16.764877 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223127045 ecr 683441101], length 8949
08:00:16.764905 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683454251 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:00:30.076864 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223140357 ecr 683441101], length 8949
08:00:30.076881 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683467563 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:00:57.724863 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223168005 ecr 683441101], length 8949
08:00:57.724877 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683495211 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:01:50.972908 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223221253 ecr 683441101], length 8949
08:01:50.972922 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683548459 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:03:37.468882 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223327749 ecr 683441101], length 8949
08:03:37.468902 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683654955 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:05:38.304895 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223448585 ecr 683441101], length 8949
08:05:38.304942 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683775791 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:07:39.133073 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223569414 ecr 683441101], length 8949
08:07:39.133092 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683896619 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
Notice how host A stops receiving packets from host B after it receives the 08:00:03.615178 ... ack 322453474 packet.
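To make the retransmission pattern in the host A capture easier to read, here is a small sketch that prints the time gap between consecutive tcpdump lines. The function name retransmit_gaps is just something I made up for illustration, and it assumes every input line starts with an HH:MM:SS.ffffff timestamp and that the capture doesn't cross midnight.

```shell
# Hypothetical helper: print the gap in seconds between consecutive
# tcpdump text lines read from stdin.
# Assumes each line starts with an HH:MM:SS.ffffff timestamp and that
# all lines fall on the same day.
retransmit_gaps() {
  awk '{
    split($1, t, ":")                      # t[1]=hours, t[2]=minutes, t[3]=seconds
    now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
    if (NR > 1) printf "%.3f\n", now - prev
    prev = now
  }'
}
```

Feeding it only the host A lines for seq 322453474:322462423 (for example by piping the text output through grep first) gives gaps of roughly 0.43, 0.83, 1.66, 3.36, 6.66, 13.31, 27.65, 53.25, 106.50, 120.84 and 120.83 seconds. Each interval doubles until it levels off at about 120 seconds, which looks like ordinary retransmission-timeout backoff on the sender and is consistent with host A never seeing any ACK beyond 322453474.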
Here is the output of VPC Flow Logs during a failed connection (captured at a different time than the tcpdump output above):
Given that Amazon Linux 2 doesn't seem to exhibit this problem I've tried to bring the network stack on Debian a bit more closely in line with Amazon Linux. I've tried to do the following on the Debian instances:
- Apply some of the network sysctl settings from Amazon Linux to Debian
- Upgrade the Linux kernel to 5.8.10
- Upgrade the ena driver to 2.2.11
- Upgrade Docker to 19.03.13
- Explicitly allow ingress and egress traffic to/from ephemeral ports (32768-65535) to/from all IPs within our VPC in the security group that these hosts use
None of these seem to resolve the issue I'm having. What could possibly cause these dropped/rejected packets?