I'm seeing an issue with a specific combination of circumstances that causes TCP duplicate ACKs to be sent. This happens with a service hosted on Ubuntu Server 20.04.
I have a web service that, among other things, accepts uploads with multipart/form-data. When setting up a new version of the service on a new server, I noticed slow uploads and eventually narrowed the problem down to the following.
I have two computers to test from. When uploading to the server from a Windows 10 computer, some segments would be ACKed twice or ACKed out of order, and the client would then retransmit them. Here's a screenshot of Wireshark's TCP window scaling analysis when sending a 25 MB file of random bytes: the window never grows large enough to sustain throughput, because of frequent (every 10-20 packets or so) duplicate or out-of-order ACKs and the corresponding retransmissions. As a result, the file takes around 55 seconds to transfer.
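For a sense of scale, the numbers above imply an average throughput of roughly 25 MB / 55 s. Because TCP throughput is bounded by roughly window / RTT, the average effective window can be estimated once the round-trip time is known. The sketch below does that arithmetic; it assumes 1 MB = 10^6 bytes, and the 100 ms RTT is a hypothetical placeholder, since the post does not state the RTT to the server.

```typescript
// Rough arithmetic for the slow Windows upload.
// Assumptions (not measured in the post): 1 MB = 1_000_000 bytes,
// and a HYPOTHETICAL 100 ms round-trip time to the server.

/** Average throughput in bytes per second. */
export function throughputBytesPerSec(bytes: number, seconds: number): number {
  return bytes / seconds;
}

/** throughput ≈ window / RTT, so the implied average window ≈ throughput × RTT. */
export function impliedWindowBytes(bytesPerSec: number, rttSeconds: number): number {
  return bytesPerSec * rttSeconds;
}

const fileBytes = 25 * 1_000_000; // 25 MB upload
const elapsedSec = 55;            // transfer time observed from Windows
const rttSec = 0.1;               // hypothetical; substitute the RTT seen in Wireshark

const bps = throughputBytesPerSec(fileBytes, elapsedSec);
console.log(`throughput ≈ ${((bps * 8) / 1e6).toFixed(2)} Mbit/s`);
console.log(`implied average window ≈ ${(impliedWindowBytes(bps, rttSec) / 1024).toFixed(1)} KiB`);
```

With these inputs the upload averages about 3.6 Mbit/s; at the assumed 100 ms RTT that corresponds to an average window of only a few tens of KiB, which is consistent with a window that never gets a chance to grow.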
When uploading to it from a macOS 10.15 computer, I don't see this behavior. Here's a screenshot of Wireshark's TCP window scaling analysis: the window grows quickly and all the data is transferred in a handful of seconds, as it should be.
The browser doesn't matter: with identical versions of Chrome and Firefox on both computers, plus Safari on the Mac and (old) Edge on Windows, every browser on a given computer shows that computer's behavior. To me, this suggests that some TCP parameter needs tweaking.
To rule out the web stack I'm using, I wrote a minimal test program in Node.js v12 that accepts uploads with multipart/form-data, and I get identical behavior: bad from Windows, good from the Mac. Node.js uses libuv, while the other stack is ASP.NET Core on the Kestrel server, configured with a non-libuv transport, so the two share no networking code above the OS's sockets.
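A test program like this can be very small. The sketch below is not the program described above; it is a hypothetical minimal upload sink using only Node's built-in `http` module. It drains the request body and reports bytes received and elapsed time, deliberately skipping multipart/form-data parsing on the assumption that only the TCP-level transfer matters here. The `--listen` flag and port 8080 are arbitrary choices for illustration.

```typescript
import * as http from "node:http";

// Hypothetical minimal upload sink: accepts any POST body (e.g. multipart/form-data),
// counts the bytes and reports the transfer time. Multipart boundaries are NOT parsed;
// only the raw transfer over TCP is of interest.
export function createUploadSink(): http.Server {
  return http.createServer((req, res) => {
    if (req.method !== "POST") {
      res.writeHead(405, { "Content-Type": "text/plain" });
      res.end("POST an upload here\n");
      return;
    }
    // Timing starts once the request headers have arrived.
    const start = process.hrtime.bigint();
    let received = 0;
    req.on("data", (chunk: Buffer) => {
      received += chunk.length;
    });
    req.on("end", () => {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      const summary = { bytes: received, ms: Math.round(ms) };
      console.log(`upload: ${summary.bytes} bytes in ${summary.ms} ms`);
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify(summary));
    });
  });
}

// Only start listening when explicitly asked (hypothetical flag, arbitrary port).
if (process.argv.includes("--listen")) {
  createUploadSink().listen(8080, () => console.log("listening on :8080"));
}
```

Uploading the same 25 MB file from each client while capturing in Wireshark keeps the comparison focused on the OS and network path rather than on any framework.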
As far as I can tell, both the good and the bad behavior are unchanged when the service is reverse proxied through nginx (which adds TLS termination).
You might suspect the network on my side. Both computers are on the same wireless network behind the same router, and the Linux server is hosted in a different country, but I also see similar behavior from computers outside my network and over cellular data.
As a final data point, when I host the test program or the web service on the Mac and send a similar upload from the Windows computer, I get the good, fast behavior. In other words, the Windows computer isn't simply throttled or riddled with packet loss compared to the Mac: it can upload quickly, but somehow ends up slow when connected to the Linux server. (The Windows computer and the Mac post similar scores in bandwidth tests.)
So all of this suggests to me that there's some sort of bad dice roll when Ubuntu 20.04's TCP settings (as configured by my host, or as affected by my host's environment) meet the Windows TCP stack's settings. What could cause this, and what can I change on the server to hopefully force the good behavior?
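A first step toward answering that could be to record the server's TCP settings that bear most directly on ACK behavior, loss recovery and window growth, so they can be compared against a server that behaves well. The sketch below reads them from `/proc/sys/net/ipv4`; the list is my own selection and is not a confirmed cause of the problem. Any setting that cannot be read (e.g. on a non-Linux machine) is reported as `null`, and the `--print` flag is a hypothetical convenience.

```typescript
import { readFileSync } from "node:fs";
import { join } from "node:path";

// A selection (not exhaustive) of Linux TCP sysctls that affect SACK/DSACK handling,
// window scaling, timestamps, congestion control, ECN and buffer sizing.
const TCP_SETTINGS = [
  "tcp_sack",
  "tcp_dsack",
  "tcp_window_scaling",
  "tcp_timestamps",
  "tcp_congestion_control",
  "tcp_ecn",
  "tcp_rmem",
  "tcp_wmem",
  "tcp_moderate_rcvbuf",
  "tcp_mtu_probing",
] as const;

type TcpSettings = Record<(typeof TCP_SETTINGS)[number], string | null>;

// Reads each setting from baseDir; returns null for any file that cannot be read.
export function readTcpSettings(baseDir = "/proc/sys/net/ipv4"): TcpSettings {
  const result = {} as TcpSettings;
  for (const name of TCP_SETTINGS) {
    try {
      // Values like tcp_rmem are tab-separated triples; collapse the whitespace.
      result[name] = readFileSync(join(baseDir, name), "utf8").trim().replace(/\s+/g, " ");
    } catch {
      result[name] = null;
    }
  }
  return result;
}

// Print in sysctl-like "net.ipv4.name = value" form when run with --print.
if (process.argv.includes("--print")) {
  for (const [name, value] of Object.entries(readTcpSettings())) {
    console.log(`net.ipv4.${name} = ${value ?? "(unavailable)"}`);
  }
}
```

Running this on the Ubuntu server and on a Linux machine that uploads fine from Windows, then diffing the output, narrows down which defaults differ; the same values are also available individually via `sysctl net.ipv4.tcp_sack` and so on.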
(It's possible this is more of a generic Linux networking question - I'm posting it here because I'm running into it on Ubuntu Server 20.04.)