I have a really strange one.
I have packet loss with excessive 'TCP Dup ACK' and 'TCP Fast Retransmission' events when I download files (and only when I download) from two different Windows 2008 servers. Upload speed is fine.
This ONLY occurs if the client computer (Win7) is connected at 100 Mb/s. At 1 Gb/s there are no errors and I get full speed. If I set the client NIC to 100 Mb/s, I get a lot of 'TCP Dup ACK' errors and the download speed drops to around 2-5 MB/s. Upload speed is 10 MB/s or above.
This only happens with the Windows 2008 Server boxes (Dell, but different hardware). The problem does not occur when I transfer between the Win7 clients and the Linux servers.
It's like Server 2008 is unable to scale the TCP window properly, overloads the switch or something, then pauses traffic for a bit.
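To make the "overloads the switch" guess concrete, here is a rough sketch of how much data could pile up in a switch port when a burst arrives at 1 Gb/s and drains at 100 Mb/s. It is a simple fluid model, not a measurement: the function name `peak_queue_bytes` and the burst sizes are hypothetical, and the real TCP window and switch buffer sizes are not known from anything above.

```python
def peak_queue_bytes(burst_bytes: int, ingress_bps: float, egress_bps: float) -> float:
    """Peak backlog on a port when a burst arrives faster than it can drain.

    Fluid approximation: while the burst is arriving, the queue grows at
    (ingress - egress), so the peak is reached when the last byte arrives.
    """
    if egress_bps >= ingress_bps:
        return 0.0  # the port drains at least as fast as data arrives
    return burst_bytes * (1 - egress_bps / ingress_bps)


# Hypothetical burst sizes (assumption: one unacknowledged window sent back to back).
for burst in (16 * 1024, 64 * 1024, 256 * 1024):
    queued = peak_queue_bytes(burst, ingress_bps=1e9, egress_bps=100e6)
    print(f"burst {burst:>7} B -> peak queue ~{queued:,.0f} B")
```

With a 10:1 speed mismatch, roughly 90% of any back-to-back burst has to sit in the switch buffer, so a sender that bursts more than the port can hold would see drops only on the 1 Gb/s to 100 Mb/s path.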
Parts of the network run at 100Mb/s due to older equipment, so this is really causing a problem in some buildings.
I have uploaded a pcap file from the client here. https://dl.dropboxusercontent.com/u/24907255/slow.pcap.gz
It shows a 50MB file being written to the server, then read back from the server with the errors.
Thanks for any help. I am stumped.
11/28/13 More Information.
I shut down the entire network so that only one client and one server are on the network. No change in the problem.
If I set every interface (server, client and Cisco 2960 switch) to 100 Mb/s full duplex, then the problem goes away. If I set the server and switch interface to auto or 1 Gb/s, the problem is back.
If I bypass the Cisco switch with a Netgear 10/100 switch and set both client and server to auto, I have no problems.
I did discover this. In the normal setup, with server to switch at 1 Gb/s, if I plug the Netgear 10/100 switch in between the client and the Cisco switch, my speed problem gets even worse. Speeds go from 5-7 MB/s to 2-3 MB/s, and yes, I have tried both fixed and auto network speeds. This would explain why some of the buildings that have two switch hops between them and the main Cisco switch have more of a speed problem.
On to pinging. With everything at 1 Gb/s, I can ping with a maximum-size payload (ping -l 65500) and it works. With the client at 100 Mb/s, the largest size I can ping is 17752. Any more and it fails, but only to the Windows servers; there is no problem with the Linux boxes. With the Netgear 10/100 between the server and client, I have no problems pinging at 65500.
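A quick way to see what that 17752-byte limit corresponds to on the wire is to count IP fragments. This is only a sketch under assumptions not confirmed above: a standard 1500-byte Ethernet MTU, a 20-byte IPv4 header with no options, and an 8-byte ICMP echo header. The helper name `fragments_for_ping` is made up for illustration.

```python
import math

IP_HEADER = 20    # bytes, IPv4 header without options (assumed)
ICMP_HEADER = 8   # bytes, ICMP echo request header
MTU = 1500        # bytes, standard Ethernet MTU (assumed)


def fragments_for_ping(payload: int, mtu: int = MTU) -> int:
    """Number of IPv4 fragments needed to carry `ping -l payload`."""
    # Data carried per fragment must be a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((payload + ICMP_HEADER) / per_fragment)


for size in (17752, 17753, 65500):
    print(f"ping -l {size}: {fragments_for_ping(size)} fragments")
```

Under these assumptions, 17752 bytes of payload plus the ICMP header is exactly 12 full fragments, and one more byte needs a 13th. That would be consistent with something on the path handling only 12 back-to-back frames in a burst, though the arithmetic alone does not say which device.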
Update 3
I swapped in a PowerConnect 2748 switch. Same problem with the server at 1 Gb/s and the client at 100 Mb/s. I can ping over 17752 now, though. Strange. So I don't think it's the Cisco switch.
Update 4. I am trying to get some hard numbers by using iperf. All systems are connected to the same switch, with the client set to 100 Mb/s and running the command iperf.exe -c -u -b 10m, so the client is sending to the server. One server is 2008 with no load on it right now; the other is an Ubuntu box with a load average of 0.20.
At 10m
- Linux jitter 0.022 ms, packet loss 0/8505
- Server 2008 jitter 1.859 ms, packet loss 68/8505
Pushing it to 100m
- Linux jitter 0.445 ms, packet loss 0/26634
- Server 2008 jitter 0.542 ms, packet loss 94/26596
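The lost/total counts above are easier to compare as percentages. A minimal sketch of that conversion follows; the function name `loss_percent` and the dictionary labels are just illustrative, and the counts are copied from the client-to-server results above.

```python
def loss_percent(lost_over_total: str) -> float:
    """Convert an iperf-style 'lost/total' datagram count into a percentage."""
    lost, total = (int(part) for part in lost_over_total.split("/"))
    return 100.0 * lost / total


# Client-to-server UDP results, as reported above.
results = {
    "Linux @ 10m": "0/8505",
    "Server 2008 @ 10m": "68/8505",
    "Linux @ 100m": "0/26634",
    "Server 2008 @ 100m": "94/26596",
}
for name, counts in results.items():
    print(f"{name}: {loss_percent(counts):.2f}% loss")
```

In this direction the Server 2008 loss stays under 1% at both rates, while the Linux box loses nothing.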
Now for stats sending TO the client at 10m
- Linux jitter 0.271 ms, 0/8500 (0%), 1 datagram received out-of-order
- Server 2008 jitter 0.063 ms, 20/8505 (0.24%)
Pushing it to 100m
- Linux jitter 0.230 ms, 4083/85443 (4.8%), 1 datagram received out-of-order, 95.7 Mb/s
- Server 2008 jitter 0.237 ms, 28174/81718 (47%), 51.1 Mb/s
So Server 2008 is poor in general, but you can see the huge packet loss (47%) when the connection is pushed to the client's 100 Mb/s limit.
Update 5.
When I tested with the PowerConnect 2748 switch, I used different Cat5 cables between the server and the switch and between the client and the switch. This should rule out cabling or switch issues.
I have two Windows 2008 Servers in this environment, installed at different times and on different hardware. The only thing they share is a Broadcom-branded NIC, but the chipset is different. Both experience the same problem, but I am doing my main testing on one so that if something goes wrong, the other will still work.
That server has a built-in BCM5709C with two ports, plus an add-on card (PCI Express, I think) with the same BCM5709C chipset and two ports. I have tried all of the ports and the problem still exists. So this should rule out any hardware problems.
Update 6, 12/3/13. I installed an Intel NIC. No change. I played around with the CTCP settings and saw no change there. I even turned off SMB2 and it made no difference.
I did some more testing at 100 Mb/s. Copying a 3 GB ISO image TO the server (drag and drop) averages out at 10 MB/s. Copying the same 3 GB ISO image FROM the server averages out at 6.3 MB/s.
With all network interfaces set to auto and linked at 1 Gb/s, copying the ISO TO the server averages 101 MB/s, and copying the ISO FROM the server averages 57 MB/s.
So read speeds from the server are almost half the write speeds.
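To put those copy speeds next to the link rates, the sketch below converts MB/s into a fraction of nominal line rate. It assumes 1 MB = 10^6 bytes and ignores Ethernet, IP, TCP and SMB overhead, so the figures are approximate; `utilization` is a hypothetical helper name.

```python
def utilization(mb_per_s: float, link_mbit: float) -> float:
    """Fraction of the nominal link rate used by a transfer measured in MB/s."""
    return mb_per_s * 8 / link_mbit  # 8 bits per byte, decimal megabytes assumed


# Averages from the ISO copy tests above: (label, MB/s, link speed in Mb/s).
measurements = [
    ("100 Mb/s, to server", 10.0, 100),
    ("100 Mb/s, from server", 6.3, 100),
    ("1 Gb/s, to server", 101.0, 1000),
    ("1 Gb/s, from server", 57.0, 1000),
]
for label, rate, link in measurements:
    print(f"{label}: {utilization(rate, link):.0%} of line rate")
```

By this estimate, writes to the server use about 80% of the link at both speeds, while reads from the server use only around half.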