We recently upgraded to Debian 11 (bullseye), and our Docker swarm nodes had trouble talking to each other and started dropping some connections.
After much Google-ing, we stumbled on these threads:
And specifically this command, which fixed the issue:
ethtool -K <interface> tx-checksum-ip-generic off
While I'm thrilled that this fixes it... I'm a little concerned because I'm having trouble figuring out exactly what that setting actually does.
I get that it disables some sort of checksumming (maybe a UDP or TCP checksum of packets coming in through the network) and perhaps disables offloading that checksum to the hardware (so the checksum is done in software instead? on the CPU?), but I'm having trouble finding any specifics beyond that, or even confirming that this explanation is correct.
Similarly, should I be worried about turning this off? Will it impact performance? Will it cause other networking issues?
If anyone can provide details on what exactly this does, and whether it has any other impacts I should watch for or measure, I'd be very appreciative!
Thanks!
The tx-checksum-ip-generic offloading feature in ethtool is related to the checksum calculation for outgoing IP packets. When enabled, it offloads the calculation of the IP checksum to the network interface card's hardware, which can improve performance by reducing CPU overhead. However, there have been cases where enabling this offloading feature on certain network cards or in specific network environments can cause issues, including dropped connections and networking problems.
The IP checksum is used to verify the integrity of IP packets during transmission over a network. For IPv4 it is a 16-bit one's-complement sum computed over the packet's header only; TCP and UDP carry their own checksums, which also cover the payload. Each checksum value is stored in the corresponding header.
When a network device receives an IP packet, it recalculates the checksum using the same algorithm and compares the result with the value in the header. If they match, the packet was most likely not corrupted in transit; if they don't, the packet is discarded.
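As a concrete illustration of that calculation, here is a minimal Python sketch of the Internet checksum from RFC 1071, the 16-bit one's-complement sum used by IPv4, TCP, and UDP. The 20-byte IPv4 header below is an illustrative example (its checksum field is zeroed before computing), not data from the question. The receiver-side check relies on the property that summing a header that already contains a valid checksum yields zero.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Illustrative IPv4 header with the checksum field (bytes 10-11) set to zero.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
csum = internet_checksum(header)
print(hex(csum))  # 0xb861

# Receiver side: with the checksum filled in, recomputing over the header gives 0.
received = header[:10] + struct.pack("!H", csum) + header[12:]
print(internet_checksum(received))  # 0
```

The same function applies to TCP and UDP; only the bytes fed into it differ.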
By disabling the tx-checksum-ip-generic offloading using the ethtool command, you are instructing the network interface to calculate the IP checksum in software rather than offloading it to the hardware. Disabling this offloading has resolved the connectivity issues in your Docker swarm nodes.
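Because the fix is just a feature flag, it can help to confirm what state each interface is actually in, for example on every swarm node. Below is a minimal Python sketch that runs `ethtool -k <interface>` (lowercase `-k` lists features; uppercase `-K` changes them) and parses its `name: on|off [fixed]` lines. The sample text is an illustrative excerpt of that format, not output from the asker's machines, and the helper names and the interface name `eth0` are hypothetical.

```python
from __future__ import annotations

import subprocess


def parse_ethtool_features(text: str) -> dict[str, tuple[bool, bool]]:
    """Map feature name -> (enabled, fixed) from `ethtool -k` output."""
    features = {}
    for line in text.splitlines():
        name, sep, rest = line.strip().partition(":")
        if not sep or not rest.strip():
            continue  # skip blank lines and the "Features for <iface>:" header
        words = rest.split()
        features[name] = (words[0] == "on", "[fixed]" in words)
    return features


def read_features(iface: str) -> dict[str, tuple[bool, bool]]:
    """Run `ethtool -k` on a real interface (requires ethtool to be installed)."""
    out = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    ).stdout
    return parse_ethtool_features(out)


# Illustrative excerpt of the `ethtool -k` output format (hypothetical values).
SAMPLE = """Features for eth0:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
"""
```

On a real host, `read_features("eth0")["tx-checksum-ip-generic"]` would be expected to report `(False, False)` once the `ethtool -K ... off` command has taken effect, assuming the driver allows the feature to be changed (a `[fixed]` feature cannot be).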
The impact of disabling this feature depends on your network environment and hardware. In many cases it is not significant, but it's worth monitoring throughput and CPU usage (particularly system and softirq time) after the change. If you don't see any drop in performance or any other networking issues, it should be safe to keep it disabled.
It's important to note that the impact and behavior can vary depending on the network card, driver, and network environment. Therefore, it's always recommended to test and evaluate the effects in your specific setup to ensure stability and optimal performance.
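One rough way to quantify the CPU side of this trade-off in your own setup is to sample the aggregate `cpu` line of `/proc/stat` before and after a traffic test, once with the offload on and once with it off, and compare the shares of system and softirq time. This is a minimal Python sketch under those assumptions; the helper names are hypothetical, and the figures cover all CPU activity on the host, not only checksum work.

```python
from __future__ import annotations


def parse_cpu_line(line: str) -> list[int]:
    """Return the user..steal counters (8 fields) from /proc/stat's 'cpu' line."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # guest and guest_nice are already included in user and nice, so drop them.
    return [int(x) for x in fields[1:9]]


def cpu_shares(before: list[int], after: list[int]) -> dict[str, float]:
    """Fraction of CPU time that was busy, system, and softirq between samples."""
    delta = [a - b for a, b in zip(after, before)]
    total = sum(delta)
    if total == 0:
        return {"busy": 0.0, "system": 0.0, "softirq": 0.0}
    idle = delta[3] + delta[4]  # idle + iowait
    return {
        "busy": 1 - idle / total,
        "system": delta[2] / total,
        "softirq": delta[6] / total,
    }


def sample(path: str = "/proc/stat") -> list[int]:
    """Read the current aggregate CPU counters (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

For example, call `sample()`, run a fixed-duration transfer (for instance with iperf3), call `sample()` again, and pass both snapshots to `cpu_shares`; then repeat with the offload toggled and compare the results.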
"The tx-checksum-ip-generic offloading feature in ethtool is related to the checksum calculation for outgoing IP packets. When enabled, it offloads the calculation of the IP checksum to the network interface card's hardware, which can improve performance by reducing CPU overhead." I just checked this statement on my Linux machine: the offloaded checksum calculation is actually the transport-layer (TCP/UDP) checksum, not the "IP checksum". You can confirm this with a tcpdump -vv capture on the sending host: with the offload enabled, outgoing TCP/UDP checksums are reported as incorrect (the NIC fills them in only after the capture point), while the IP header checksums are correct.
Turning this switch off does carry some network performance penalty, because work that used to be handled by the network card is now handled by the kernel, using precious CPU time. As far as I know, though, the penalty is not severe: perhaps 20% to 40%, depending on your system.
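To illustrate the point that the offloaded work is the transport-layer checksum: unlike the IPv4 header checksum, a UDP checksum (RFC 768) covers the UDP header, the payload, and a pseudo-header built from the IP addresses, protocol number, and UDP length. Here is a minimal Python sketch of computing it in software, which is roughly the job the kernel takes over when the offload is off. The addresses, ports, and helper names are hypothetical; port 4789 is the VXLAN port Docker overlay networks use by default, chosen only for flavor.

```python
import ipaddress
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src: str, dst: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero, protocol 17 (UDP), UDP length."""
    return (
        ipaddress.IPv4Address(src).packed
        + ipaddress.IPv4Address(dst).packed
        + struct.pack("!BBH", 0, 17, udp_length)
    )


def build_udp(src: str, dst: str, sport: int, dport: int, payload: bytes) -> bytes:
    """Return UDP header + payload with the checksum computed in software."""
    length = 8 + len(payload)
    segment = struct.pack("!HHHH", sport, dport, length, 0) + payload  # checksum = 0
    csum = internet_checksum(pseudo_header(src, dst, length) + segment)
    csum = csum or 0xFFFF  # RFC 768: a computed checksum of 0 is sent as all ones
    return segment[:6] + struct.pack("!H", csum) + segment[8:]


seg = build_udp("10.0.0.1", "10.0.0.2", 40000, 4789, b"hello")
ok = internet_checksum(pseudo_header("10.0.0.1", "10.0.0.2", len(seg)) + seg) == 0
print(ok)  # True
```

Because the pseudo-header includes the IP addresses, this checksum can only be finalized once the packet's addressing is known, which is why drivers that offload it must handle it correctly for every encapsulation in use.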