Ping a Specific Port

Question

Mr Shoubs

Asked: 2011-03-03 15:50:14 +0800 CST

What causes the issue (possibly packet loss) in this scenario


I'm trying to diagnose a network-related problem. Please read these points before suggesting an answer (apologies if more information is required, I will add anything people ask for).

We have a server-only network (5 app servers, 4 db servers, a few other servers) that appears to be suffering packet loss between servers
I can see this happening in Wireshark: there are a lot of TCP Retransmission, TCP Out-of-Order and TCP Dup ACK packets, and I think some TCP ZeroWindow packets too
There appear to be a lot of bad checksums reported at the IP layer
I think the network adapters have a very constant and high (90-100%) load due to the extra retries caused by this packet loss
As the external requests on this network increase (to the app servers) the network performance decreases
The app servers generate their own traffic when serving external requests
The external requests come through a core router and the network is on its own segment
This high load "magically" disappeared after 1-2 days. I say magically because we were only monitoring the adapters at the time the load dropped. There is still packet loss showing in Wireshark, albeit a lesser amount.
Nothing points to a compromised server.
Unfortunately we don't have physical access to any of the hardware
We can't disrupt the current service

Given the above, what is the best way to determine what is causing the packet loss? (We suspect a managed switch.)

Is there any software that can provide us with empirical evidence of what is causing the issues?

Thanks in advance

1 Answer

sysadmin1138 · Answer 1 · 2011-03-03T16:22:38+08:00

In my experience Wireshark can return unreliable results on interfaces that are using hardware TCP-Offload. Duplicate packets are one of the symptoms of that.
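If any of the hosts run Linux, one way to see which offloads are active is to read the output of `ethtool -k <interface>`. The following is only a rough Python sketch of that check: the helper names are made up for illustration, and it assumes `ethtool` is installed and prints its usual `feature: on|off` lines. Windows hosts would need a different approach.

```python
import subprocess


def parse_offload_features(ethtool_output):
    """Turn the text printed by `ethtool -k` into a {feature: state} dict."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        tokens = value.split()
        # Skip the "Features for eth0:" header and anything without a state.
        if not sep or not tokens:
            continue
        features[name.strip()] = tokens[0]  # "on" or "off"; drops "[fixed]"
    return features


def enabled_offloads(features):
    """Return the names of all features currently switched on."""
    return sorted(name for name, state in features.items() if state == "on")


def read_offload_features(interface):
    """Run ethtool on a live host (hypothetical helper, Linux only)."""
    result = subprocess.run(
        ["ethtool", "-k", interface],
        capture_output=True, text=True, check=True,
    )
    return parse_offload_features(result.stdout)
```

If checksum or segmentation offload shows up as enabled on the host where Wireshark is running, treat bad-checksum and duplicate-packet reports from a capture taken on that host with suspicion.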

That said, if you're using a span/mirror port to grab your captures, duplicate ACKs on the wire are a significant problem.

Duplicate ACKs, out-of-order packets, and retransmits are signals that the TCP stack on something is not behaving right. Correlating which network nodes are prone to throwing the errors will help isolate which hosts need further investigation. Any differences between a span/mirror port capture and a Wireshark session on that specific node should help highlight where the problem is occurring. If you see differences, investigate updating the network drivers, as those are frequently the easiest fix for that kind of issue (Broadcom is sadly notorious for this). Second to that, updating the firmware for the NICs can help as well.
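As a rough illustration of that correlation step, here is a Python sketch that counts flagged packets per host from a saved capture. It assumes `tshark` is available and uses Wireshark's `tcp.analysis.*` display filter fields; the function names and the capture path you pass in are placeholders, not part of any existing tool.

```python
import subprocess
from collections import Counter

# Display filter matching the TCP analysis flags discussed above.
ANALYSIS_FILTER = (
    "tcp.analysis.retransmission || tcp.analysis.duplicate_ack || "
    "tcp.analysis.out_of_order || tcp.analysis.zero_window"
)


def tally_by_host(lines):
    """Count flagged packets per source and per destination IP.

    Each line is expected to look like "src<TAB>dst"; anything else
    (blank lines, non-IP traffic) is skipped.
    """
    sent, received = Counter(), Counter()
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 2 or not all(parts):
            continue
        src, dst = parts
        sent[src] += 1
        received[dst] += 1
    return sent, received


def flagged_packets(pcap_path):
    """Ask tshark for the src/dst of every flagged packet in a capture."""
    cmd = [
        "tshark", "-r", pcap_path, "-Y", ANALYSIS_FILTER,
        "-T", "fields", "-e", "ip.src", "-e", "ip.dst",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Running `tally_by_host(flagged_packets(...))` once on the span/mirror capture and once on a capture taken on the suspect host, then comparing `sent.most_common()` from each, gives a quick view of which nodes stand out and whether the two vantage points agree.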

If everything there looks healthy, you could just be seeing the normal flailing about wildly that TCP does when there is just plain too much traffic to handle.
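To put a number on "too much traffic", you can sample an interface's byte counter twice and compare the rate against the link speed. This is just the arithmetic as a small Python sketch; where the counters come from (SNMP on the switch, OS counters on the host) and the link speed are assumptions you would fill in for your own setup.

```python
def link_utilization(bytes_start, bytes_end, interval_seconds, link_bits_per_second):
    """Return the fraction of link capacity used between two counter samples."""
    if interval_seconds <= 0 or link_bits_per_second <= 0:
        raise ValueError("interval and link speed must be positive")
    bits_transferred = (bytes_end - bytes_start) * 8
    return bits_transferred / interval_seconds / link_bits_per_second


# Hypothetical samples: 1,125,000,000 bytes in 10 s on a 1 Gbit/s link.
utilization = link_utilization(0, 1_125_000_000, 10, 1_000_000_000)
print(f"{utilization:.0%}")
```

Sustained values near 100% on a server or switch port would support the plain-congestion explanation rather than a misbehaving TCP stack.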

TCP Zero-Window is also a sign of an unhealthy TCP/IP stack, though in my experience it sometimes occurs when two different TCP/IP stacks aren't getting along, as can happen with Windows 2008 and certain older TCP/IP stacks in the Linux space.
