I am looking for answers to these specific questions:
- Does a NIC with GRO enabled modify or create TCP ACKs or any other packets, or is the feature transparent to the receiver's and sender's TCP stacks?
- There should be a timeout or event that makes the NIC pass the "glued" segments up to the TCP stack. What is it?
- In a packet-forwarding setup, does GRO also try to read the receiver's ACKs (see below for why I am asking)?
- Any source that explains GRO and the other NIC offloading features (TSO, LSO, ...) better than Wikipedia and the Linux man pages would be really appreciated.
More details:
I am troubleshooting a performance problem with an IPSec implementation. The available bandwidth is not evenly distributed across the 4 VPN tunnels (it is split approximately 200MBps/200MBps/1MBps/1MBps; each VPN tunnel encapsulates a single TCP connection). In the PCAP I occasionally see the webserver idle for about 2 seconds while it waits for an ACK. The download resumes when the webserver retransmits the unacknowledged segments.
My gut feeling from the PCAP is that the NIC's GRO feature glues packets together but sometimes does not pass them to the TCP stack in a timely manner, and that this is causing the problem.
This VPN server has no interfaces that terminate TCP connections; it only forwards packets. I tried disabling GRO, and after that the traffic was evenly distributed across all tunnels. Also, when TCP window scaling is disabled on the webserver, bandwidth is evenly distributed even with GRO enabled (which is why I asked the third question above).
I am using Linux kernel 2.6.32-27 on Ubuntu 10.04 Server (64-bit). The NIC is an Intel 82571EB. All hosts (HTTP client, VPN client, VPN server, webserver) are connected directly in a chain with 1Gbit Ethernet cables.
I've found this article amazingly useful: JLS2009: Generic receive offload. It gives a great overview of how GRO works.
ethtool may be able to enable or disable GRO on a specific interface, depending on the ethtool and kernel versions in use.
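To make the ethtool suggestion concrete, here is a minimal Python sketch that checks and toggles GRO by wrapping `ethtool -k <iface>` (show offload settings) and `ethtool -K <iface> gro on|off` (change them). The helper names, the interface name `eth0`, and the sample output text are assumptions for illustration; the exact wording of the output varies between ethtool versions, so the parser normalizes spaces to hyphens rather than relying on one format.

```python
import subprocess

# Illustrative sample only: real `ethtool -k` output differs between versions.
SAMPLE = """Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
generic segmentation offload: on
generic-receive-offload: on
"""

def parse_offload_features(ethtool_output):
    """Parse `ethtool -k` text into a {feature-name: bool} dict."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        words = value.split()
        if not words or words[0] not in ("on", "off"):
            continue  # skips the header line and anything unexpected
        # Normalize "tcp segmentation offload" to "tcp-segmentation-offload".
        key = name.strip().lower().replace(" ", "-")
        features[key] = (words[0] == "on")
    return features

def gro_enabled(iface):
    """Return True/False for GRO on iface, or None if it is not reported."""
    out = subprocess.run(["ethtool", "-k", iface], capture_output=True,
                         text=True, check=True).stdout
    return parse_offload_features(out).get("generic-receive-offload")

def set_gro(iface, enabled):
    """Turn GRO on or off for iface (usually requires root)."""
    subprocess.run(["ethtool", "-K", iface, "gro", "on" if enabled else "off"],
                   check=True)

if __name__ == "__main__":
    print(parse_offload_features(SAMPLE))
    # On a real host, as root: set_gro("eth0", False); print(gro_enabled("eth0"))
```

A setting changed with `ethtool -K` does not survive a reboot, so for a test like the one described in the question it has to be reapplied (or scripted at interface bring-up) on the forwarding host.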