How can I passively monitor the packet loss on TCP connections to/from my machine?
Basically, I'd like a tool that sits in the background and watches TCP ack/nak/re-transmits to generate a report on which peer IP addresses "seem" to be experiencing heavy loss.
Most questions like this that I find on SF suggest using tools like iperf, but I need to monitor connections to/from a real application on my machine.
Is this data just sitting there in the Linux TCP stack?

For a general sense of the scale of your problem, netstat -s will track your total number of retransmissions. You can also grep its output for segments to get a more detailed view.

For a deeper dive, you'll probably want to fire up Wireshark. In Wireshark, set your filter to tcp.analysis.retransmission to see retransmissions by flow.

That's the best option I can come up with.
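
To get from that filter to a per-peer report, the same display filter can be applied with tshark, Wireshark's command-line companion, and the matching segments counted by destination address. This is only a sketch: it assumes tshark is installed, the function names are made up for illustration, and only IPv4 destinations (ip.dst) are counted.

```shell
# Count addresses read from stdin (one per line) and print "count address",
# busiest peer first. Blank lines (e.g. non-IPv4 packets) are skipped.
count_by_peer() {
    awk 'NF { seen[$1]++ } END { for (ip in seen) print seen[ip], ip }' |
        sort -k1,1nr -k2,2
}

# List retransmitted segments per destination address in a capture file.
# Assumption: tshark is available; $1 is a pcap file supplied by the caller.
retrans_by_peer() {
    tshark -r "$1" -Y tcp.analysis.retransmission -T fields -e ip.dst |
        count_by_peer
}
```

Calling retrans_by_peer with a capture file (for instance one written by tcpdump -w) prints one line per peer, which is close to the "which peers seem lossy" report asked for.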
Other dead ends explored:
netstat -s showed that it is just printing /proc/net/netstat

These stats are in /proc/net/netstat, and collectl will monitor them for you, either interactively or written to disk for later playback. Of course, if you'd like to see them side-by-side with network traffic, just include n with -s.

You can use the ss tool to get detailed TCP statistics; for example, ss -ti prints per-connection TCP internals, including retransmit counters. Under Debian, use apt-get install iproute to get the binary.

It looks like some guys at the University of North Carolina (UNC) built a utility to investigate exactly this:
http://www.cs.unc.edu/~jasleen/Research-passivetcp.htm#Tool
I won't say it is production quality. Previously I've built quick perl scripts to store ip/port/ack tuples in memory and then report on duplicated data from scanning pcap output; this looks like it provides more thorough analysis.

You may want to look at the dropwatch utility.

Apparently good old sar can gather retransmission counts (and other TCP statistics), along with all kinds of other system statistics, such as CPU, memory, and disk I/O, that might also be interesting when you investigate a problem.
You may need to install a package, sysstat, and enable this particular kind of statistics with the switch -S SNMP. On RHEL/OracleLinux this is configured in /etc/cron.d/sysstat, where /usr/lib64/sa/sa1 is invoked every 5 minutes by default, but that can be tuned as well.
For analysis of this data use:

Looks like /proc/net/snmp is where the values for netstat -s are sourced, so a quick gawk script over that file can find the % of segments that are retransmitted.
An internal (no public IP or public traffic) AWS instance, which we suspected was having networking issues with other systems in the VPC, showed 0.0229% retransmitted, which was over 10 times higher than the 0.002% max we saw on other nodes. On one really bad instance, as many as 2.32% of all outbound packets were retransmitted segments.
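
A minimal sketch of that calculation, assuming the usual /proc/net/snmp layout in which a Tcp: header row of counter names is followed by a Tcp: row of values. The function name is made up, and the result is RetransSegs as a percentage of OutSegs since boot, not a current rate.

```shell
# Print RetransSegs as a percentage of OutSegs.
# $1 is optional and defaults to /proc/net/snmp.
retrans_pct() {
    awk '
        # First Tcp: row: remember the column position of every counter name.
        $1 == "Tcp:" && !have_header {
            for (i = 2; i <= NF; i++) col[$i] = i
            have_header = 1
            next
        }
        # Second Tcp: row: the values, in the same column order.
        $1 == "Tcp:" && have_header {
            out = $(col["OutSegs"])
            re  = $(col["RetransSegs"])
            if (out > 0) printf "%.4f\n", 100 * re / out
        }
    ' "${1:-/proc/net/snmp}"
}
```

Looking the columns up by name rather than by position keeps the sketch working if a kernel adds counters to the row.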
You can also see the rate of retransmits during a given time window using:
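
One way to do that, as a sketch rather than necessarily the command meant above, is to sample the RetransSegs counter in /proc/net/snmp twice and divide the difference by the interval. Both function names are hypothetical.

```shell
# Print one named counter from the Tcp: rows of a /proc/net/snmp-style file.
# $1 = counter name, $2 = file (optional, defaults to /proc/net/snmp).
tcp_counter() {
    awk -v name="$1" '
        # Header row: find the column that holds the requested counter.
        $1 == "Tcp:" && !idx {
            for (i = 2; i <= NF; i++) if ($i == name) idx = i
            next
        }
        # Value row: print that column.
        $1 == "Tcp:" && idx { print $idx }
    ' "${2:-/proc/net/snmp}"
}

# Retransmitted segments per second over a window of $1 seconds (default 10).
retrans_rate() {
    secs=${1:-10}
    before=$(tcp_counter RetransSegs)
    sleep "$secs"
    after=$(tcp_counter RetransSegs)
    awk -v a="$after" -v b="$before" -v s="$secs" \
        'BEGIN { printf "%.2f\n", (a - b) / s }'
}
```

For example, retrans_rate 60 prints the average number of retransmitted segments per second over the next minute.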

In recent Linux versions, netstat has been replaced by ss and ip. Another answer explains how to use ss. With ip, you can also get the number of dropped packets.
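
For the ip approach, a minimal sketch assuming the per-interface statistics printed by ip -s link are what is meant. The function names are hypothetical, and these are interface-level drops, not TCP retransmissions.

```shell
# Extract the RX and TX "dropped" counters from `ip -s link` output on stdin.
parse_drops() {
    awk '
        $1 == "RX:" || $1 == "TX:" {
            dir = $1
            # The value row has no leading RX:/TX: label, so the value column
            # is one position to the left of the header column.
            for (i = 2; i <= NF; i++) if ($i == "dropped") want = i - 1
            getline
            print dir, $want
        }
    '
}

# Show dropped-packet counters for one interface, e.g. iface_drops eth0.
iface_drops() {
    ip -s link show dev "$1" | parse_drops
}
```

The interface name passed to iface_drops is whatever your system uses; eth0 is only an example.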