Between servers windows-in-Finland <-> linux-in-Germany, I am experiencing 100x slower upload than download (windows -> linux is 100x slower than windows <- linux).
Details and existing research
I originally observed this problem with Windows clients across the world, and noticed that I can reproduce it also across controlled datacenter environments.
For reproducing the problem, I'm using the datacenter provider Hetzner, with the Windows
machine being in Finland (dedicated server, Windows Server 2019), uploading to both of:
- Linux Hetzner dedicated Germany: slow
- Linux Hetzner Cloud VM Germany: fast
Both of them are in the same datacenter park and thus both have 37 ms ping from the Windows machine. While the connection between Finland and Germany is usually on Hetzner's private network, it is currently being re-routed via public Internet routes due to the C-LION1 2024 Baltic Sea submarine cable disruption (Hetzner status message about it), so the connection "simulates" using normal public Internet routes and peerings.
I'm measuring with iperf3, windows <- linux:
C:\Users\Administrator\Downloads\iperf3.17.1_64\iperf3.17.1_64>iperf3.exe -c linux-germany-dedicated.example.com
Connecting to host linux-germany-dedicated.example.com, port 5201
[ 5] local 192.0.2.1 port 62234 connected to 192.0.2.2 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 15.8 MBytes 132 Mbits/sec
[ 5] 1.00-2.00 sec 1.88 MBytes 15.7 Mbits/sec
[ 5] 2.00-3.00 sec 1.38 MBytes 11.5 Mbits/sec
[ 5] 3.00-4.00 sec 1.75 MBytes 14.7 Mbits/sec
[ 5] 4.00-5.00 sec 2.25 MBytes 18.9 Mbits/sec
[ 5] 5.00-6.00 sec 2.88 MBytes 24.1 Mbits/sec
[ 5] 6.00-7.00 sec 3.25 MBytes 27.3 Mbits/sec
[ 5] 7.00-8.00 sec 3.38 MBytes 28.3 Mbits/sec
[ 5] 8.00-9.00 sec 2.75 MBytes 23.1 Mbits/sec
[ 5] 9.00-10.00 sec 1.25 MBytes 10.5 Mbits/sec
More iperf3 observations (example invocations are sketched after this list):
- The other direction (adding -R to iperf3) is much faster, at ~900 Mbit/s. (Note that the Linux sides are using BBR congestion control, which likely helps that direction.)
- When uploading with 30 connections (iperf3 with -P 30), the full 1 Gbit/s connection is maxed out, suggesting that the problem is the upload throughput of a single TCP connection.
- When replacing the Windows machine with a Linux one in Finland, both directions max out their 1 Gbit/s connection. This leads me to conclude that the involvement of Windows is at fault.
- Note there is a Microsoft article claiming that iperf3 is not the best tool for high-performance measurements on Windows. This is not relevant for this question because it applies only to >= ~10 Gbit/s connections, and iperf3 between multiple Windows/Linux machines in the same datacenter shows that 1 Gbit/s is easily achievable with iperf3 in both directions.
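For reference, the invocations behind these observations look roughly like this (a sketch; the hostname is the placeholder used above, and the Linux side just runs a plain server):
# Linux side: plain iperf3 server
iperf3 -s
# Windows -> Linux upload (the slow direction):
iperf3.exe -c linux-germany-dedicated.example.com
# Linux -> Windows, i.e. reverse mode (~900 Mbit/s):
iperf3.exe -c linux-germany-dedicated.example.com -R
# 30 parallel upload streams, which together max out the 1 Gbit/s link:
iperf3.exe -c linux-germany-dedicated.example.com -P 30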
In 2021 Dropbox released an article Boosting Dropbox upload speed and improving Windows’ TCP stack that points out Windows's incorrect (incomplete) handling of TCP retransmissions; Microsoft published Algorithmic improvements boost TCP performance on the Internet along with it.
That seems to largely explain it, and Wireguard slow but only for windows upload shows a potential solution, namely changing the number of RSS (Receive Side Scaling) queues to 1:
ethtool -L eth0 combined 1
This changes the queue count from 16 (16 threads on my dedicated Linux server) to 1, and increases the converged iperf3 upload speed from 10.5 to 330 Mbit/s.
That's nice, but it should be 1000 Mbit/s.
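For reference, the change can be inspected with ethtool's channel query (a sketch; eth0 stands for the actual interface name):
# Show current channel configuration; "Combined: 16" before, "Combined: 1" after
ethtool -l eth0
# Reduce to a single combined queue, then re-check
ethtool -L eth0 combined 1
ethtool -l eth0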
Especially odd: testing windows -> linux-Hetzner-Cloud instead of windows -> Hetzner-dedicated, I observe perfect upload speeds:
C:\Users\Administrator\Downloads\iperf3.17.1_64\iperf3.17.1_64>iperf3.exe -c linux-germany-hcloud.example.com
Connecting to host linux-germany-hcloud.example.com, port 5201
[ 5] local 192.0.2.1 port 55615 connected to 192.0.2.3 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 108 MBytes 903 Mbits/sec
[ 5] 1.00-2.00 sec 112 MBytes 942 Mbits/sec
...
[ 5] 9.00-10.00 sec 112 MBytes 942 Mbits/sec
This is odd, because the cloud machine has much lower specs. It has 8 virtual cores, but its ethtool -l output already defaults to Combined: 1 because, being a VM, it does not support RSS at all:
root@linux-germany-hcloud ~ # ethtool -x enp1s0
RX flow hash indirection table for enp1s0 with 1 RX ring(s):
Operation not supported
RSS hash key:
Operation not supported
RSS hash function:
toeplitz: on
xor: off
crc32: off
So somehow the weaker machine lacks the problem. Maybe there's some clever NIC hardware thing going on in the dedicated machine that creates the problem? What could it be?
I already tried disabling TCP Segmentation Offload (ethtool -K eth0 tso off) but that does not affect the results. The feature that caused the problem in the Dropbox article (flow-director-atr) is not available on my NIC, so that can't be it.
Question
What can explain the further 3x bottleneck in upload when comparing the two Linux servers?
How can I just get fast uploads from Windows?
More environment info
- Both Linux machines use the same Linux version 6.6.33 x86_64 and the same sysctls (ensured via NixOS), which are:
  net.core.default_qdisc=fq
  net.core.rmem_max=1073741824
  net.core.wmem_max=1073741824
  net.ipv4.conf.all.forwarding=0
  net.ipv4.conf.net0.proxy_arp=0
  net.ipv4.ping_group_range=0 2147483647
  net.ipv4.tcp_congestion_control=bbr
  net.ipv4.tcp_rmem=4096 87380 1073741824
  net.ipv4.tcp_wmem=4096 87380 1073741824
- Windows Server 2019, Version 1809 (OS Build 17763.6293)
Edit 1
I found that I get 950 Mbit/s upload from Windows to other Hetzner-dedicated machines. The dedicated machines to which the upload is slow all have in common that they have Intel 10 Gbit/s network cards; from lspci:
01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
lsmod | grep ixgbe suggests that the ixgbe driver is used here.
ixgbe is also mentioned in the above Dropbox article. The paper "Why Does Flow Director Cause Packet Reordering?" they link mentions Intel 82599 specifically. I also found this e1000-devel thread where somebody mentions the problem in 2011, but no solution is presented.
When using the 1-Gbit Intel Corporation I210 Gigabit Network Connection (rev 03) card present in the same model of server, the issue is gone and I get 950 Mbit/s.
So there seems to be something specific about the 82599ES / ixgbe that causes the issue.
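To confirm which driver an interface is bound to (rather than inferring it from lsmod), something like this works (a sketch; eth0 is a placeholder):
# Driver name (ixgbe vs. igb/i40e), driver/firmware version and PCI bus address
ethtool -i eth0
# Cross-check via PCI: shows the "Kernel driver in use" per device
lspci -k | grep -A 3 Ethernet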
Edit 2: Intel Flow Director and trying the out-of-tree ixgbe
Googling intel disable flowdirector produces https://access.redhat.com/solutions/528603 mentioning Intel 82599. It explains:
Intel Flow Director is an Intel NIC and driver feature which provides intelligent and programmable direction of similar network traffic (i.e. a "flow") into specific receive queues.
By default, Flow Director operates in ATR (Application Targeted Receive) mode. This performs regular RSS-style hashing when previously-unseen traffic is received. However, when traffic is transmitted, that traffic's tuple (or "flow") is entered into the receive hash table. Future traffic received on the same tuple will be received on the core which transmitted it. The sending and receiving process can then be pinned to the same core as the receive queue for best CPU cache affinity.
Note that community research has shown that ATR can cause TCP Out-of-Order traffic when processes are migrated between CPUs. It is better to explicitly pin processes to CPUs when using ATR mode.
Flow Director is mentioned in the Dropbox article, and so is ATR.
The mentioned "community research" is the same paper "Why Does Flow Director Cause Packet Reordering?" that Dropbox refers to.
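One experiment that follows from the "pin processes to CPUs" advice in the quote would be to pin the receiving iperf3 process to a single core, so that ATR steers the flow to one queue instead of chasing a migrating process (a sketch; core 2 is an arbitrary choice):
# Pin the iperf3 receiver to core 2 so ATR keeps directing the flow to the
# queue/interrupt associated with that core (best-case CPU cache affinity)
taskset -c 2 iperf3 -s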
Doing the suggested ethtool -K net0 ntuple on improves the speed from 20 Mbit/s to 130 Mbit/s (with the default ethtool -L net0 combined 16).
Running it for longer (iperf3 --time 30) makes it drop to 80 Mbit/s after 16 seconds.
Using ntuple on together with combined 16 does not improve it further.
So this is not a complete solution.
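To see whether Flow Director is involved at all during such a run, the ixgbe driver exposes counters in the NIC statistics (a sketch; exact counter names can vary by driver version):
# Flow Director match/miss/overflow counters
ethtool -S net0 | grep -i fdir
# Per-queue packet counters show which RX queue the iperf3 flow lands on
ethtool -S net0 | grep -i rx_queue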
Testing the ixgbe module option FdirMode=0 approach next.
On ram256g-1:
rmmod ixgbe; modprobe ixgbe FdirMode=0; sleep 2; ifconfig net0 94.130.221.7/26 ; ip route add 192.0.2.2 dev net0 proto static scope link ; ip route add default via 192.0.2.2 dev net0 proto static ; echo done
dmesg shows:
ixgbe: unknown parameter 'FdirMode' ignored
That is despite https://www.kernel.org/doc/Documentation/networking/ixgbe.txt documenting it:
FdirMode
--------
Valid Range: 0-2 (0=off, 1=ATR, 2=Perfect filter mode)
Default Value: 1
Flow Director filtering modes.
So 0=off seems even more desirable than the other two, which supposedly is what ntuple on/off switches between.
https://access.redhat.com/solutions/330503 says:
Intel choose to expose some configurations as a module parameter in their SourceForge driver, however the upstream Linux kernel has a policy of not exposing a feature as a module option when it can be configured in ways already available, so you'll only see some module parameters on Intel drivers outside the upstream Linux kernel tree.
Red Hat follow upstream kernel methods, so those options won't be in the RHEL version of the driver, but the same thing can often be done with ethtool (and without a module reload).
This suggests that 0=off is not actually achievable.
Or maybe it will work with modprobe.d options but not the modprobe command?
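For completeness, the modprobe.d variant of that idea would look roughly like this (a sketch; whether the in-tree ixgbe accepts FdirMode at all is exactly what is in question):
# Persist the module option (only useful if the driver actually supports it)
echo 'options ixgbe FdirMode=0' > /etc/modprobe.d/ixgbe.conf
# Reload the driver and check whether the parameter was accepted or ignored
rmmod ixgbe && modprobe ixgbe
dmesg | grep -i fdirmode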
Relevant code:
- Old kernel with the FdirMode option:
- New kernel without:
  - https://github.com/torvalds/linux/blob/b86545e02e8c22fb89218f29d381fa8e8b91d815/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c#L648
    - Suggests that Flow Director is only enabled if the RSS queue length is > 1.
    - So probably setting the queue length to 1 with ethtool -L (which is --set-channels) should already achieve it.
But it seems https://github.com/intel/ethernet-linux-ixgbe is still actively developed and supports all the old options.
It also supports FdirPballoc, which never existed in torvalds/linux. That is described in: https://forum.proxmox.com/threads/pve-kernel-4-10-17-1-wrong-ixgbe-driver.35868/#post-175787
Also related: https://www.phoronix.com/news/Intel-IGB-IXGBE-Firmware-Update
Maybe I should try to build and load that?
From that driver, FdirMode was also removed:
- https://github.com/intel/ethernet-linux-ixgbe/commit/a9a37a529704c584838169b4cc1f877a38442d36
- https://github.com/intel/ethernet-linux-ixgbe/commit/a72af2b2247c8f6bb599d30e1763ff88a1a0a57a
From https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20160919/006629.html:
ethtool -K ethX ntuple on
This will enable "perfect filter" mode, but there are no filters yet, so the received packets will fall back to RSS.
Tried:
ethtool -K net0 ntuple on
ethtool --config-ntuple net0 flow-type tcp4 src-ip 192.0.2.1 action 1
This did not improve the speed.
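To check that the rule was actually installed (and which queue it targets), the filter list can be dumped (a sketch):
# List currently installed ntuple / perfect filters and their target queues
ethtool --show-ntuple net0
# Short form of the same command
ethtool -u net0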
I also found that the speed on Linux 6.3.1 seems to be 90 Mbit/s, while it's 25 Mbit/s on 6.11.3.
Compiling the out-of-tree ethernet-linux-ixgbe on the Hetzner Rescue System Linux (old), which has 6.3.1 (there is no release yet for Linux 6.11):
wget https://github.com/intel/ethernet-linux-ixgbe/releases/download/v5.21.5/ixgbe-5.21.5.tar.gz
tar xaf *.tar.gz
cd ixgbe-*/src && make -j
# because disconnecting the ethernet below will hang all commands
# from the Rescue Mode's NFS mount if not already loaded into RAM
ethtool --help
timeout 1 iperf3 -s
dmesg | grep -i fdir
modinfo /root/ixgbe-*/src/ixgbe.ko # shows all desired options
rmmod ixgbe; insmod /root/ixgbe-*/src/ixgbe.ko; sleep 2; ifconfig eth1 94.130.221.7/26 ; ip route add 192.0.2.2 dev eth1 scope link ; ip route add default via 192.0.2.2 dev eth1 ; echo done
iperf3 -s
This driver provides a more solid 450 Mbit/s out of the box.
rmmod ixgbe; insmod /root/ixgbe-*/src/ixgbe.ko FdirPballoc=3; sleep 2; ifconfig eth1 94.130.221.7/26 ; ip route add 192.0.2.2 dev eth1 scope link ; ip route add default via 192.0.2.2 dev eth1 ; echo done
dmesg | grep -i fdir
iperf3 -s
Brings no improvement.
Also trying AtrSampleRate, whose documentation says:
A value of 0 indicates that ATR should be disabled and no samples will be taken.
rmmod ixgbe; insmod /root/ixgbe-*/src/ixgbe.ko AtrSampleRate=0; sleep 2; ifconfig eth1 94.130.221.7/26 ; ip route add 192.0.2.2 dev eth1 scope link ; ip route add default via 192.0.2.2 dev eth1 ; echo done
dmesg | grep -i atrsample
iperf3 -s
Brings no improvement.
ethtool -L net0 combined 1
brings no improvement here either, and
ethtool -K eth1 ntuple on
ethtool -L eth1 combined 12 # needed, otherwise the out-of-tree ixgbe driver complains with `rmgr: Cannot insert RX class rule: Invalid argument` when `combined 1` is set
ethtool --config-ntuple eth1 flow-type tcp4 src-ip 192.0.2.1 action 1
brings no improvement either.
Edit 3: Changed NIC
I changed the NIC of the Linux server from Intel 82599ES to Intel X710, which uses the Linux i40e driver.
The problem persisted.
I suspect it is because the X710, too, supports Intel Flow Director.
The partial mitigation of ethtool -L eth0 combined 1 has the same effect as for the 82599ES.
The command ethtool --set-priv-flags eth0 flow-director-atr off (which is possible for i40e but not ixgbe), mentioned by Dropbox as the workaround, only achieved the same speedup as ethtool -L eth0 combined 1 (so to around 400 Mbit/s).
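For reference, the available private flags (and whether flow-director-atr appears among them) can be listed per interface (a sketch; eth0 is a placeholder):
# i40e exposes flow-director-atr as a private flag; ixgbe does not
ethtool --show-priv-flags eth0
# The Dropbox workaround tried above:
ethtool --set-priv-flags eth0 flow-director-atr off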
Interestingly, Hetzner reported that the Hetzner Cloud machines are also powered by the Intel X710, but they don't exhibit the problem.
It seems I found a solution to get full speed (caveats below): reducing the RSS queues to 1 together with disabling SACK support in the Linux kernel:
ethtool -L eth0 combined 1
sysctl -w net.ipv4.tcp_sack=0
I had tried the first line before (it brought a moderate improvement), but disabling SACKs with the second line fully fixes the problem for me, producing full gigabit upload speed from the Windows machine. Neither line alone produces the full speed; combined they fix it.
This fixes the problem for both our Intel X710 and the Intel 82599ES servers.
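To double-check that SACK is really off for new connections while an upload runs, the per-connection TCP options can be inspected on the Linux receiver (a sketch; with tcp_sack=0 the "sack" flag should no longer show up in the ss output):
# Current sysctl value
sysctl net.ipv4.tcp_sack
# TCP internals (negotiated options, cwnd, retransmits) of the iperf3 connection
ss -tni 'sport = :5201'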
Caveat:
I believe a side effect of disabling SACKs is that if there is packet loss, the connection speed temporarily drops down more than with SACKs enabled. However, it does recover within a few seconds, and on average I observed successful full gigabit speed uploads from Windows Server 2019 with these settings. I will test it more in the Windows-10-over-the-Internet situation soon, as I have only tested Windows-Server-2019-HEL1 -> Linux-FSN1 so far.
Edit: I have now confirmed that sysctl -w net.ipv4.tcp_sack=0 is detrimental on long-range WAN connections that have packet loss. As soon as some packet loss happens, it makes the connection speed drop hard.
So doing the full-speed approach with both settings is only recommended on connections where you know packet loss is low. If you're receiving from the general Internet, it is better to only use ethtool -L eth0 combined 1 and keep sysctl -w net.ipv4.tcp_sack=1. That will be more reliable over the Internet.
It makes some sense that disabling SACKs has an effect, given that the Dropbox article (and the quoted Microsoft statement) mentions SACKs as well.
I do not yet understand what causes my observation that the Hetzner Cloud machines don't exhibit the problem. Setting net.ipv4.tcp_sack on their VM hosts should have no effect, because the Linux guest should be in control of TCP, not the host machine; the guest does have the tcp_sack=1 default and yet it works. Maybe there are more NIC settings that can achieve the same effect as disabling SACKs in the kernel that I haven't discovered yet, or maybe devices along the network path of dedicated servers cannot handle SACKs well, and those devices don't exist along the network path of Hetzner Cloud machines.
This article mentions network gear causing problems with SACK, but it mostly describes complete stalls of the connection as opposed to slowdowns: When to turn TCP SACK off?
I also observed that Windows 11 (I tested 23H2) has this problem fixed. According to the Microsoft article in the question, Windows Server 2022 also has it fixed.