Question

Porch

Asked: 2013-11-21 19:42:08 +0800 CST

Dropped packets, on receive only, Server 2008 only, and network speed is 100 Mb/s

I have a really strange one.

I have packet loss with excessive 'TCP Dup ACK' and 'TCP Fast Retransmission' when I download files (and only when I download) from two different Windows 2008 servers. Upload speed is fine.

This ONLY occurs if the client computer (Win7) is connected at 100 Mb/s. At 1 Gb/s there are no errors and I get full speed. If I set the client NIC to 100 Mb/s, I get a lot of 'TCP Dup ACK' errors and the download speed drops to around 2-5 MB/s. Upload speed is 10 MB/s or above.

This only happens to the Windows 2008 Server boxes (Dell, but different hardware). This problem does not occur if I transmit between the Win7 clients and the Linux servers.

It's like Server 2008 is unable to scale the TCP window properly, overloads the switch or something, then pauses traffic for a bit.

Parts of the network run at 100Mb/s due to older equipment, so this is really causing a problem in some buildings.

I have uploaded a pcap file from the client here. https://dl.dropboxusercontent.com/u/24907255/slow.pcap.gz

It shows a 50MB file being written to the server, then read back from the server with the errors.

Thanks for any help. I am stumped.

11/28/13 More Information.

I shut down the entire network so that only one client and one server are on the network. No change in the problem.

If I set every interface, server, client and Cisco 2960 switch to 100Mbs full, then the problem goes away. If I set the server and switch interface auto or 1Gbs, the problem is back.

If I bypass the switch with a Netgear 10/100 switch and set both client and server to auto, I have no problems.

I did discover this. In the normal setup, with server to switch at 1 Gb/s, if I plug in the Netgear 10/100 switch between the client and the Cisco switch, my speed problem gets even worse. Speeds go from 5-7 MB/s to 2-3 MB/s, and yes, I have tried fixed and auto network speeds. This would explain why some of the buildings that have a two-switch hop between them and the main Cisco switch have more of a speed problem.

On to pinging. With everything at 1 Gb/s, I can ping with the maximum payload, ping -l 65500, and it works. With the client at 100 Mb/s, the max size I can ping is 17752. Any more and it fails, to the Windows servers only; no problem on the Linux boxes. With the Netgear 10/100 between the server and client, no problems pinging at 65500.
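
One detail in these ping sizes may be worth noting. A payload this large is split into IP fragments, and the short sketch below counts them. It assumes a standard 1500-byte MTU, a 20-byte IPv4 header without options and an 8-byte ICMP header; none of these values are stated in the post. Under those assumptions, 17752 bytes is exactly the largest payload that still fits in 12 fragments, so the failure appears to start with the 13th fragment.

```python
# Assumptions (not from the post): standard 1500-byte Ethernet MTU,
# 20-byte IPv4 header without options, 8-byte ICMP echo header.
MTU = 1500
IP_HEADER = 20
ICMP_HEADER = 8


def fragment_count(ping_payload: int, mtu: int = MTU) -> int:
    """Return how many IPv4 fragments a `ping -l <ping_payload>` request needs."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    total = ping_payload + ICMP_HEADER
    return -(-total // per_fragment)  # integer ceiling division


if __name__ == "__main__":
    for size in (17752, 17753, 65500):
        print(size, "bytes ->", fragment_count(size), "fragments")
```

If the MTU on the path differs from 1500, the boundary moves, so this is only a hint about where to look (for example, a limit on back-to-back frames somewhere along the path), not a diagnosis.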

Update 3

I swapped in a PowerConnect 2748 switch. Same problem with the server at 1 Gb/s and the client at 100 Mb/s. I can ping over 17752 now though. Strange. So I don't think it's the Cisco switch.

Update 4. I am trying to get some hard numbers by using iperf. All systems are connected to the same switch, with the client set to 100 Mb/s and running the command iperf.exe -c -u -b 10m, so sending to the server. One server is 2008 with no load on it right now, the other is an Ubuntu box with a load average of 0.20.

At 10m

Linux jitter 0.022ms, packet loss is 0/8505
Server 2008 jitter 1.859, packet loss 68/8505

Pushing it to 100m

Linux jitter 0.445, packet loss 0/26634
Server 2008 jitter 0.542, packet loss 94/26596

Now for stats sending TO the client at 10m

Linux jitter 0.271 ms, 0/ 8500 (0%) 1 datagrams received out-of-order
Server 2008 jitter .063, 20/8505 (0.24%)

Pushing it to 100m

Linux jitter 0.230 ms 4083/85443 (4.8%), 1 datagrams received out-of-order, 95.7Mbs
Server 2008 jitter 0.237, 28174/81718 (47%), 51.1mbs

So Server 2008 is poor in general, but you can see the huge packet loss (47%) when the connection is pushed to the client's 100 Mb/s limit.
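
For readers who want to compare the runs side by side, the following sketch recomputes the loss percentage from the lost/sent datagram counts exactly as they are printed above. The run labels are made up for illustration. Note that the last pair of counts works out to about 34.5% rather than the 47% quoted, which may mean one of the two counts was mistyped; the sketch simply takes the numbers as given.

```python
def loss_percent(lost: int, sent: int) -> float:
    """Datagram loss as a percentage of datagrams sent."""
    return 100.0 * lost / sent


# (label, lost, sent) copied from the iperf figures in the post;
# the labels themselves are only illustrative.
RUNS = [
    ("to server, 10m, Linux", 0, 8505),
    ("to server, 10m, Server 2008", 68, 8505),
    ("to server, 100m, Linux", 0, 26634),
    ("to server, 100m, Server 2008", 94, 26596),
    ("to client, 10m, Linux", 0, 8500),
    ("to client, 10m, Server 2008", 20, 8505),
    ("to client, 100m, Linux", 4083, 85443),
    ("to client, 100m, Server 2008", 28174, 81718),
]

if __name__ == "__main__":
    for label, lost, sent in RUNS:
        print(f"{label:32s} {loss_percent(lost, sent):6.2f}%")
```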

Update 5.

When I tested with the PowerConnect 2748 switch, I used different cat5 cable between the server and switch and client and switch. This should rule out cabling or switch issues.

I have two Windows 2008 Servers in this environment, installed at different times, and on different hardware. The only thing they share is a Broadcom branded nic, but the chipset is different. Both experience the same problem, but I am doing my main testing on one so in case something goes wrong, the other will still work.

The one server has a built-in BCM5709C with two ports, and an add-on card (PCI Express, I think) with the same BCM5709C chipset and two ports. I have tried all of them and the problem still exists. So this should rule out any hardware problems.

Update 6 12/3/13. I installed the Intel NIC. No change. I played around with the CTCP settings and no change there. I even turned off SMB2 and no difference.

I did some more testing at 100 Mb/s. Copying a 3 GB ISO image TO the server, drag and drop, averages out at 10 MB/s. Copying the same 3 GB ISO image FROM the server averages out at 6.3 MB/s.

With all network interfaces set to auto and at 1 Gb/s: copying the ISO TO the server averages 101 MB/s; copying the ISO FROM the server averages 57 MB/s.

So read speeds from the server are almost half the write speeds.

6 Answers

Teun Vink · Answer 1 · 2013-11-24T07:38:43+08:00

This sounds like a speed/duplex mismatch causing collisions and retransmits. Misconfiguration between the server and the other side could cause this. Another reason for the mismatch could be failing autonegotiation.

Make sure both ends of the connection are configured identically regarding speed and duplex.

ErikE · Answer 2 · 2013-11-25T22:40:48+08:00

I believe you should investigate if any of the NIC driver/Windows NDIS offload settings relate to your problem. I am most suspicious of the LSO (Large Send Offload) function as I've seen it totally wreck a service (Dell server w. Broadcom NIC) in a manner which defied all troubleshooting book definitions of anything.

The actual effect of LSO, when it disrupts rather than enhances, is that the LSO engine may pass larger data frames than the switch supports. This causes the switch to silently discard those frames. Needless to say, this causes performance degradation and packet loss. The failure can be immediate, but it can also be intermittent, making it tremendously difficult to troubleshoot. This is described in detail here: Large Send Offload and Network Performance

Disclaimer: this is just best effort thoughts on a possible angle on your problem. Implementing any one of the changes below will disrupt your network communication. The computer should be restarted after applying any of the settings. I copy/paste the most interesting settings for reference, but the links contain all the hardcore info and caveats. I most strongly recommend using the official docs as the basis for change and this post at most like a checklist.

Before proceeding with any of this, back up your registry key of:

HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

One uncool reason for this is an official bug, described below, which changes some unrelated values when certain settings are set through the command line.

I freely admit that where settings are present both in the NIC driver GUI and in Windows, I never really got clarity on whether one has to disable them both in the GUI and through Windows CMD/Registry, or whether one suffices. The blogs I've read which presented an answer have been inconsistent with regard to some minor detail or other, so I never was sure. Nowadays I attempt the change everywhere I find the option for whichever setting I'm focusing on. The GUI options are not presented here, but are described in the official docs.

Also, different NIC drivers for the same card may present varying granularity in the advanced settings in the GUI.

Disabling Task Offloading

This registry setting disables task offloading as defined in Using Registry Values to Enable and Disable Connection Offloading.

HKLM\System\CurrentControlSet\Services\TCPIP\Parameters\DisableTaskOffload
Setting this value to one disables all of the task offloads from the TCP/IP
transport. Setting this value to zero enables all of the task offloads.

If the above setting has any effect you could try going granular as specified in the link. There are quite a number of settings governing this so I won't paste them all in.

I'll supply the LSO ones though:

HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\LsoV1IPv4
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\LsoV2IPv4
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\LsoV2IPv6

For all three: Enabled = 1(default). Disabled = 0.

Disabling connection offloading

As defined in Using Registry Values to Enable and Disable Connection Offloading.

HKLM\System\CurrentControlSet\Services\TCPIP\Parameters\TCPConnectionOffloadIPv4
Describes whether the device enabled or disabled the offload of TCP connections
over IPv4. Enabled = 1 (Default). Disabled = 0.

HKLM\System\CurrentControlSet\Services\TCPIP\Parameters\TCPConnectionOffloadIPv6
Describes whether the device enabled or disabled the offload of TCP connections
over IPv6. Enabled = 1 (Default). Disabled = 0.

Disabling TCP Chimney, TOE and TSO

As specified in How to Disable TCP Chimney, TCPIP Offload Engine (TOE) or TCP Segmentation Offload (TSO) (note the Win2008 hotfix) and in Information about the TCP Chimney Offload, Receive Side Scaling, and Network Direct Memory Access features in Windows Server 2008.

Windows 2008 Server:
If the operating system is Microsoft Windows Server 2008 (any version
including R2), run the following from a Command prompt:

1. netsh int tcp set global chimney=disabled
2. netsh int tcp set global rss=disabled
3. netsh int tcp set global netdma=disabled

Note: To display current global TCP settings, use the net shell command:
netsh int tcp show global

4. Restart the server.

Note: Microsoft has identified an issue running the netsh command to set global
TCP parameters on Windows Server 2008 and Vista machines.  Some global
parameters, such as TCPTimedWaitDelay, can be changed from their default or
manually set values to 0xffffffff.  Before running the above command, Symantec
recommends reviewing Microsoft KB Article 967224 (support.microsoft.com/kb/967224).
Upon completion of the above command's execution, Symantec also recommends
reviewing the TCP Parameters noted in the KB Article and applying the hotfix from
the article if needed.

The hotfix describes the issue thus:

After you run the command, the values of the following unrelated settings are
changed to 0xFFFFFFFF:
KeepAliveInterval
KeepAliveTime
TcpTimedWaitDelay

In addition, the "TcpMaxDataRetransmissions" are changed to 0xFF.
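
The symptom listed in the hotfix text lends itself to a quick check after running the netsh commands. The sketch below is only an illustration: `find_clobbered` and `read_tcpip_parameters` are hypothetical helper names, and the reader function relies on Python's standard `winreg` module, so it only works on Windows. The check itself is a plain dictionary comparison and can be tried anywhere.

```python
import sys

# Registry key named in the answer (relative to HKEY_LOCAL_MACHINE).
TCPIP_PARAMETERS = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# Values the hotfix text says get overwritten, with the bad value for each.
SUSPECT_VALUES = {
    "KeepAliveInterval": 0xFFFFFFFF,
    "KeepAliveTime": 0xFFFFFFFF,
    "TcpTimedWaitDelay": 0xFFFFFFFF,
    "TcpMaxDataRetransmissions": 0xFF,
}


def find_clobbered(params: dict) -> list:
    """Return the suspect value names whose data matches the known bad value."""
    # Registry value names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): data for name, data in params.items()}
    return sorted(
        name for name, bad in SUSPECT_VALUES.items()
        if lowered.get(name.lower()) == bad
    )


def read_tcpip_parameters() -> dict:
    """Read all values under the Tcpip Parameters key (Windows only)."""
    import winreg  # standard library, available on Windows only

    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TCPIP_PARAMETERS) as key:
        index = 0
        while True:
            try:
                name, data, _kind = winreg.EnumValue(key, index)
            except OSError:  # raised when there are no more values
                break
            values[name] = data
            index += 1
    return values


if __name__ == "__main__":
    if sys.platform == "win32":
        print(find_clobbered(read_tcpip_parameters()))
    else:
        # Made-up sample data, just to show the check on other platforms.
        print(find_clobbered({"KeepAliveTime": 0xFFFFFFFF, "TcpTimedWaitDelay": 30}))
```

An empty list means none of the four values carries the bad data; it does not replace reviewing the KB article itself.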

Again, one may therefore wish to back up the entire registry key before doing anything:

HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

If you google your problem together with the offloading highlights from above, you'll find no end of posts, articles and blogs describing similar issues due to NIC offloading. But if it still doesn't work then I guess you can move on up the stack to try other things out, because it isn't due to a half-broken cable, NIC or switchport, right?

nandoP · Answer 3 · 2013-11-24T13:31:32+08:00

Always look at the networking device for clues. So, if Cisco, do a "show interfaces f0/11" or whatever it may be in your case. Retransmits can also be due to a bad Ethernet port/NIC/cable, for example because of crosstalk. "show int" on the switch should show you these error stats; if that's the case, the counters will be obviously way too high.

EDIT: as this is Microsoft, it's most likely that's your problem, but other than that, in general, start at layer one (make sure the physical cables are good) and work your way up the stack: layer 2 (speed/duplex/MAC address filtering), then layer 3 and up (IP/UDP/TCP firewalling), etc.

ibre5041 · Answer 4 · 2013-11-27T06:34:20+08:00

This can also be caused by "advanced" NIC attributes, like the power management ones or IRQ priority, assuming you have the same version of drivers. Go to:

Device Manager -> Network Interfaces -> Properties for the NIC -> Advanced Tab.

Check and compare all values here.

Veniamin · Answer 5 · 2013-11-28T12:04:32+08:00

Did you check that jumbo frames are off on your 100/1000 network?

UPD:

If jumbo frames are used, then all networking hardware in the broadcast domain should use them. That is impossible with legacy 100 Mb/s devices.

I do not know exactly how Win2008 TCP works, but given jumbo frames it may start scaling the transmission window with packet size (not packet count as usual). Then you would observe a situation like the one described.

FYI: http://m.windowsitpro.com/windows/q-how-do-i-enable-jumbo-frames

UPD2:

I looked at the packet dump you supplied and saw a lot of packets with length > 1500 and bad checksums (checksums for lengths < 1500 are OK). It confirms my assumption.

The only thing I cannot understand is that they belong to the first session, from client to server (!!!???):

22:25:06.041113 IP (tos 0x0, ttl 128, id 31391, offset 0, flags [DF], proto TCP (6), length 40)  192.168.0.109.49225 > 192.168.0.252.microsoft-ds: Flags [.], cksum 0x9422 (correct), ack 1453, win 1234, length 0

22:25:06.041223 IP (tos 0x0, ttl 128, id 31392, offset 0, flags [DF], proto TCP (6), length 64280, bad cksum 0 (->285)!) 192.168.0.109.49225 > 192.168.0.252.microsoft-ds: Flags [.], cksum 0x82c0 (incorrect -> 0xc9bb), seq 718652:782892, ack 1453, win 1234, length 64240 SMB-over-TCP packet:(raw data or continuation?)

22:25:06.041254 IP (tos 0x0, ttl 128, id 31437, offset 0, flags [DF], proto TCP (6), length 1452) 192.168.0.109.49225 > 192.168.0.252.microsoft-ds: Flags [P.], cksum 0x0517 (correct), seq 782892:784304, ack 1453, win 1234, length 1412 SMB-over-TCP packet:(raw data or continuation?)

22:25:06.041278 IP (tos 0x0, ttl 128, id 31438, offset 0, flags [DF], proto TCP (6), length 2960, bad cksum 0 (->f1df)!) 192.168.0.109.49225 > 192.168.0.252.microsoft-ds: Flags [.], cksum 0x82c0 (incorrect -> 0xfa12), seq 784304:787224, ack 1453, win 1234, length 2920 SMB-over-TCP packet:(raw data or continuation?)

22:25:06.042134 IP (tos 0x0, ttl 128, id 31441, offset 0, flags [DF], proto TCP (6), length 2960, bad cksum 0 (->f1dc)!) 192.168.0.109.49225 > 192.168.0.252.microsoft-ds: Flags [.], cksum 0x82c0 (incorrect -> 0x1d7e), seq 787224:790144, ack 1453, win 1234, length 2920 SMB-over-TCP packet:(raw data or continuation?)

22:25:06.042492 IP (tos 0x0, ttl 128, id 31444, offset 0, flags [DF], proto TCP (6), length 5880, bad cksum 0 (->e671)!) 192.168.0.109.49225 > 192.168.0.252.microsoft-ds: Flags [.], cksum 0x82c0 (incorrect -> 0xa74e), seq 790144:795984, ack 1453, win 1234, length 5840 SMB-over-TCP packet:(raw data or continuation?)
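
To pick such packets out of a longer dump, something like the following could be used. It is a minimal sketch that assumes verbose tcpdump text output in the shape shown above (the IP total length follows `proto TCP (6), length`); the sample lines are shortened copies of the ones above, and `oversized` is a hypothetical helper name. It only flags the packets; it does not decide why they are larger than the MTU.

```python
import re

# Matches the IP total length in verbose tcpdump output,
# e.g. "proto TCP (6), length 64280".
IP_LENGTH = re.compile(r"proto \w+ \(\d+\), length (\d+)")


def oversized(lines, mtu=1500):
    """Yield (ip_length, has_bad_ip_checksum) for packets longer than mtu."""
    for line in lines:
        match = IP_LENGTH.search(line)
        if match and int(match.group(1)) > mtu:
            yield int(match.group(1)), "bad cksum" in line


# Shortened sample lines taken from the dump quoted in this answer.
SAMPLE = [
    "22:25:06.041113 IP (tos 0x0, ttl 128, id 31391, offset 0, flags [DF], "
    "proto TCP (6), length 40) 192.168.0.109.49225 > 192.168.0.252.microsoft-ds",
    "22:25:06.041223 IP (tos 0x0, ttl 128, id 31392, offset 0, flags [DF], "
    "proto TCP (6), length 64280, bad cksum 0 (->285)!) 192.168.0.109.49225",
    "22:25:06.041254 IP (tos 0x0, ttl 128, id 31437, offset 0, flags [DF], "
    "proto TCP (6), length 1452) 192.168.0.109.49225",
    "22:25:06.041278 IP (tos 0x0, ttl 128, id 31438, offset 0, flags [DF], "
    "proto TCP (6), length 2960, bad cksum 0 (->f1df)!) 192.168.0.109.49225",
]

if __name__ == "__main__":
    for length, bad in oversized(SAMPLE):
        print(length, "bad IP checksum" if bad else "checksum ok")
```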

ErikE · Answer 6 · 2013-11-30T11:02:52+08:00

The effects you describe in your later findings are in line with the way IEEE 802.3u operates:

If you hard set the speed of one of the interfaces (NIC/switchport) and set the other to auto, you will likely suffer a duplex mismatch.
If you hard set one of the interfaces to full duplex, the other cannot autonegotiate duplex but must also have it hard set.
Even if both interfaces are hard set to 100 Mbps/full duplex, some NICs (or poorly written Windows drivers) still leave autonegotiation operating and fall back to half duplex.

This is where I got those facts:

Two documents from Cisco relate (amongst others) to the 2900 series switches and troubleshooting NIC to switchport connectivity issues. They include concrete troubleshooting steps, especially for the switch side but also for the NICs. As Cisco has a lead on practical network analysis including in-depth knowledge of fundamental preconditions (such as the auto-negotiation electrical protocol), it is quite likely that the PowerConnect has similar working conditions (developed against the same protocol standards). I will quote freely for completeness and shape it up a bit later, but I would urge you to skim them through:

Troubleshooting Cisco Catalyst Switches to NIC Compatibility Issues

Configuring and Troubleshooting Ethernet 10/100/1000Mb Half/Full Duplex Auto-Negotiation

Here I quote some of the really interesting stuff:

Autonegotiation Valid Configuration Table

Speed determination issues can result in no connectivity. However, issues 
with autonegotiation of duplex generally do not result in link establishment
issues. Instead, autonegotiation issues mainly result in performance-related
issues. The most common problems with NIC issues deal with speed and duplex
configuration.  

Table 1 summarizes all possible settings of speed and duplex for FastEthernet 
NICs and switch ports.

Then follows an extremely useful table which I'll try to port here later without losing formatting. The table also includes 1Gbps speed combinations with similar interesting effects and comments. However, highlights include:

* Configuration NIC (Speed/Duplex): 100Mbps, full duplex
* Configuration Switch (Speed/Duplex): auto
* Resulting NIC Speed/Duplex: 100Mbps, full duplex
* Resulting Catalyst Speed/Duplex: 100Mbps half duplex
Comments: duplex mismatch (footnote 1)

* Configuration NIC (Speed/Duplex): auto
* Configuration Switch (Speed/Duplex): 100Mbps, full duplex
* Resulting NIC Speed/Duplex: 100Mbps, half duplex
* Resulting Catalyst Speed/Duplex: 100Mbps, full duplex
Comments: duplex mismatch (footnote 1)

* Configuration NIC (Speed/Duplex): 100Mbps, full duplex
* Configuration Switch (Speed/Duplex): 100Mbps, full duplex
* Resulting NIC Speed/Duplex: 100Mbps, full duplex
* Resulting Catalyst Speed/Duplex: 100Mbps, full duplex
Comments: Correct manual config (footnote 2)

The table footnotes are most interesting:

(1) A duplex mismatch can result in performance issues, intermittent
connectivity, and loss of communication. When you troubleshoot NIC issues,
verify that the NIC and switch use a valid configuration.

(2) Some third-party NIC cards can fall back to half-duplex operation mode,
even though both the switchport and NIC configuration are manually configured
for 100 Mbps, full-duplex. This is because NIC autonegotiation link detection
still operates when the NIC is manually configured. This causes duplex
inconsistency between the switchport and the NIC. Symptoms include poor port  
performance and frame check sequence (FCS) errors that increment on the
switchport. In order to troubleshoot this issue, try to manually configure
the switchport to 100 Mbps, half-duplex. If this action resolves the
connectivity problems, this NIC issue is the possible cause. Try to update
to the latest drivers for your NIC, or contact your NIC card vendor for
additional support.

Why Is It That the Speed and Duplex Cannot Be Hardcoded on Only One Link Partner?

As indicated in Table 1, a manual setup of the speed and duplex for
full-duplex on one link partner results in a duplex mismatch. This happens
when you disable autonegotiation on one link partner while the other link
partner defaults to a half-duplex configuration. A duplex mismatch results
in slow performance, intermittent connectivity, data link errors, and other
issues. If the intent is not to use autonegotiation, both link partners must
be manually configured for speed and duplex for full-duplex settings.

The very last topic of the NIC Compatibility link carries a technical background to the effects described in the passages quoted above. The basis for this background are some key details of the operation of the auto negotiation protocol:

(Table of bits shortened down for relevance)
0.13     Rate Selection (least-significant bit [LSB])
             0.6  0.13
             1    1     reserved
             1    0     1000 Mbps
             0    1     100 Mbps
             0    0     10 Mbps

0.12     Autonegotiation Enable 
             1 = autonegotiation enabled
             0 = autonegotiation disabled

0.8  Duplex Mode     1 = full-duplex 0 = half-duplex

0.6  Rate Selection (most-significant bit [MSB]). See bit 0.13

The register bits relevant to this document include 0.13, 0.12, 0.8, and 0.6.
The other register bits are documented in the IEEE 802.3u specification.
Based on IEEE 802.3u, in order to manually set the rate (speed), the
autonegotiation bit, 0.12, must be set to a value of 0. As a result,
autonegotiation must be disabled in order to manually set the speed and
duplex.
If the autonegotiation bit 0.12 is set to a value of 1, bits 0.13 and 0.8
have no significance, and the link uses autonegotiation to determine the
speed and duplex. When autonegotiation is disabled, the default value for
duplex is half-duplex, unless the 0.8 is programmed to 1, which represents
full-duplex.

Based on IEEE 802.3u, it is not possible to manually configure one link
partner for 100 Mbps, full-duplex and still autonegotiate to full-duplex
with the other link partner. If you attempt to configure one link partner
for 100 Mbps, full-duplex and the other link partner for autonegotiation,
it results in a duplex mismatch. This is because one link partner
autonegotiates and does not see any autonegotiation parameters from the
other link partner and defaults to half-duplex.
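
The rule in the quoted passage can be written down as a small decision function. This is only a sketch of the behaviour as described there, assuming both partners already run at the same speed and both support full duplex; `resolve_duplex` and `is_mismatch` are hypothetical names, and the driver quirk from footnote 2 (a hard-set NIC that still falls back to half duplex) is deliberately not modelled.

```python
AUTO = None  # marker for "autonegotiation enabled"; "full" or "half" means hard set


def resolve_duplex(a, b):
    """Return the (duplex_a, duplex_b) each link partner ends up with.

    a and b are AUTO, "full" or "half". Assumes equal speed on both sides
    and that both partners are capable of full duplex.
    """
    if a is AUTO and b is AUTO:
        # Both advertise their abilities and agree on the best common mode.
        return ("full", "full")
    if a is AUTO:
        # a sees no autonegotiation parameters from b and defaults to half.
        return ("half", b)
    if b is AUTO:
        return (a, "half")
    # Both hard set: each side simply uses its own configuration.
    return (a, b)


def is_mismatch(a, b):
    """True when the two partners end up with different duplex settings."""
    duplex_a, duplex_b = resolve_duplex(a, b)
    return duplex_a != duplex_b


if __name__ == "__main__":
    for nic, switch in [("full", AUTO), (AUTO, "full"), ("full", "full"), (AUTO, AUTO)]:
        print(nic or "auto", switch or "auto", "->", resolve_duplex(nic, switch),
              "MISMATCH" if is_mismatch(nic, switch) else "ok")
```

Note that hard setting one side to half duplex and leaving the other on auto happens to line up under this rule, which is why only the full-duplex case shows up as a mismatch.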

In addition, I found bug reports to similar effect from Cisco, but they are very specific with regard to combinations of switch hardware/software, OS version, NICs and drivers. Without knowing the exact details it gets too speculative.

I believe this may just be a confirmation of your findings, by way of protocol definition and operation.

Solutions

So assuming this was not a wild (but fun) goose chase, I quote you:

1) "If I set every interface, server, client and Cisco 2960 switch to 100Mbs full, then the problem goes away. If I set the server and switch interface auto or 1Gbs, the problem is back."

2) "If I bypass the switch with a Netgear 10/100 switch and set both client and server to auto, I have no problems."

3) Try to find NIC/driver combinations compatible with the old switches. Purchase as necessary.

4) Use solid technical references and reasoning to justify the budget for upgrading switches where necessary.
