I have been running MTR to/from one of my servers and noticed something that looks odd to me. Because I am not really into this I will give you three outputs:
This is from the server to my home location:
My traceroute [v0.75]
prag341.server4you.de (0.0.0.0) Sat Apr 16 12:31:36 2011
Keys: Help Display mode Restart statistics Order of fields quit
Packets Pings
Host Loss% Snt Last Avg Best Wrst StDev
1. v9-609a.s4y14.fra.routeserver.net 0.0% 143 6.6 2.9 0.7 15.6 2.4
2. 217.118.16.161 0.0% 143 0.7 5.7 0.4 67.3 13.2
3. 217.118.16.25 0.0% 143 3.3 5.3 3.3 63.5 8.6
4. 194.25.211.53 0.0% 143 3.4 5.5 3.2 61.1 9.1
5. vie-sb2-i.VIE.AT.NET.DTAG.DE 0.7% 143 17.8 21.7 17.6 131.1 14.8
vie-sb2-i.VIE.AT.NET.DTAG.DE
6. at-vie05b-ri1-pos-5-0.aorta.net 0.7% 143 18.7 18.4 17.6 23.8 0.9
7. at-vie05b-ri2-ge-2-1-9.aorta.net 0.0% 143 17.9 18.6 17.5 41.7 2.6
8. at-vie01a-rd1-xe-1-0-0.aorta.net 0.0% 143 18.2 21.1 17.3 104.1 12.0
9. at-vie-sk11-pe01-vl-20.upc.at 0.0% 143 18.2 20.6 17.7 55.7 7.0
10. at-vie-sk11-pe02-vl-1.upc.at 0.0% 143 17.8 19.6 17.3 55.2 6.6
11. ???
This is from my home location to the server:
My traceroute [v0.80]
joe-desktop (0.0.0.0) Sat Apr 16 14:27:54 2011
Keys: Help Display mode Restart statistics Order of fields quit
Packets Pings
Host Loss% Snt Last Avg Best Wrst StDev
1. 192.168.1.1 0.0% 87 0.2 0.2 0.2 0.2 0.0
2. ???
3. 84.116.4.33 0.0% 86 9.7 9.0 6.3 27.3 3.5
4. at-vie-sk11-cia01-vl-2070.upc.at 0.0% 86 22.7 22.8 20.0 52.2 4.7
5. at-vie-sk11-pe01-vl-2069.upc.at 0.0% 86 47.6 23.9 20.2 47.6 5.8
6. at-vie01a-rd1-vl-2042.aorta.net 0.0% 86 21.7 25.0 20.1 61.7 8.5
7. de-fra03a-rd1-xe-9-2-0.aorta.net 0.0% 86 21.3 22.8 19.6 44.0 5.0
8. 84.116.132.154 0.0% 86 20.2 22.8 19.3 41.0 4.1
9. tge-5-1-0-353a.cr2.fra.routeserver.net 0.0% 86 38.6 27.4 20.9 120.2 16.0
10. 217.118.16.130 0.0% 86 23.7 26.9 20.8 73.0 9.8
11. 217.118.16.26 0.0% 86 25.5 28.8 22.9 85.1 11.8
12. 217.118.16.165 81.2% 86 68.2 37.5 25.0 68.2 10.3
13. prag341.server4you.de 0.0% 86 35.7 27.1 24.0 49.3 4.3
And this is from another server (amazon ec2) to the server:
My traceroute [v0.75]
flimmit.com (0.0.0.0) Sat Apr 16 12:32:50 2011
Keys: Help Display mode Restart statistics Order of fields quit
Packets Pings
Host Loss% Snt Last Avg Best Wrst StDev
1. ip-10-48-192-3.eu-west-1.compute.internal 0.0% 178 0.4 0.9 0.3 16.4 1.7
ip-10-48-192-2.eu-west-1.compute.internal
2. ec2-79-125-0-244.eu-west-1.compute.amazonaws.com 0.0% 178 0.5 0.9 0.3 30.8 2.6
ec2-79-125-0-242.eu-west-1.compute.amazonaws.com
3. ???
4. ???
5. ???
6. xe-4-1-0.dub10.ip4.tinet.net 36.5% 178 1.9 3.9 1.6 56.8 8.5
7. xe-4-1-0.dub10.ip4.tinet.net 0.0% 178 12.1 9.7 1.6 92.5 10.5
xe-0-1-0.lon14.ip4.tinet.net
xe-2-1-0.lon14.ip4.tinet.net
8. xe-0-1-0.lon14.ip4.tinet.net 0.0% 177 17.4 17.7 11.1 184.3 24.6
xe-2-1-0.lon14.ip4.tinet.net
213.200.77.234
9. 213.200.77.234 0.0% 177 25.2 23.7 12.0 162.5 16.0
tge-4-2-0-0a.cr2.fra.routeserver.net
10. tge-4-2-0-0a.cr2.fra.routeserver.net 0.6% 177 178.6 57.1 24.7 178.6 39.0
217.118.16.26
11. 217.118.16.26 47.2% 177 32.7 61.1 29.1 164.4 35.4
217.118.16.165
12. 217.118.16.165 28.2% 177 28.9 29.8 27.8 48.9 4.2
prag341.server4you.de
13. prag341.server4you.de 1.1% 177 28.2 28.7 27.7 63.4 2.9
What looks weird to me is the very high loss (>80%) on the last hop before the destination (hop 12) in the trace from my home location to the server. The server is responding fine and services run smoothly.
This may be down to my limited knowledge of networking, but it seems logical to me that loss rates should add up. Yet I often see MTR outputs with high loss rates along the way and a much lower loss at the final target.
So my questions are:
In my particular case, is this an indicator of a possible problem I should pay attention to?
In general, how do I interpret an output of mtr correctly? Can you recommend a good article / literature on that?
The packet loss is not necessarily an indication of a problem. Remember that those figures come from attempts to communicate with each particular network node directly. Usually those in-between router nodes are only responsible for passing traffic through to another location. They are not required to chat with you directly at all, and one that drops most of your chat should not be a cause for concern. The only important number for you is how many packets are getting through to your destination.
The most useful information to come out of those reports is the relative data of how far apart nodes are (in terms of packet time), and, even more importantly, how many hops there are so you can get an idea how long different legs of the journey will take for people trying to communicate with your servers. Usually the fewer hops there are, the more efficient the route -- indicating the quality of your ISP.
MTR is good at measuring latency and hops from site to site. It is common to see 100% loss starting at the firewall in front of a reachable site. I usually set the interval to 15 seconds or more to lighten the network load. It takes a little longer to get results, but I find the results are more reliable.
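As a rough illustration of the longer interval, the sketch below only builds an mtr command line; it does not run anything. The helper name `build_mtr_command` and the default of 10 cycles are assumptions made for this example. The `--report`, `--report-cycles` and `--interval` options are standard mtr options, but check `man mtr` on your system for the version you have installed.

```python
def build_mtr_command(host, interval_s=15, cycles=10):
    """Return an argv list for a non-interactive mtr run.

    interval_s: seconds between probes (15 follows the advice above).
    cycles: number of probes per hop; 10 is an arbitrary example value.
    """
    return [
        "mtr",
        "--report",                      # print a summary and exit
        "--report-cycles", str(cycles),  # how many probes to send per hop
        "--interval", str(interval_s),   # seconds to wait between probes
        host,
    ]


# Show the command that would be run for the server in the question.
print(" ".join(build_mtr_command("prag341.server4you.de")))
```

The list can be passed to `subprocess.run` if you want to execute it. With these values a run takes a few minutes, since the report only finishes after all cycles have been sent.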
Some routers give lower priority to generating the error packet (ICMP Time Exceeded) that MTR uses to detect intermediate routers. If the router is busy, it may drop the packet and wait for the next one. This will cause high drop rates for that router. If there are routers further away with 0% loss, things are fine.
Packet losses may also occur when the routes are dynamically changing. Your last trace shows the routes changing over time. These losses should be relatively short term and easily recovered.
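The rule of thumb above can be sketched in a few lines: loss at a hop can only be real forwarding loss if every hop behind it shows at least as much loss. The function name `effective_loss` and the idea of representing a silent hop (`???`) as `None` are assumptions made for this example; the numbers are the Loss% column of the home-to-server trace in the question.

```python
def effective_loss(hop_losses):
    """For each hop, return the smallest loss seen at that hop or any later one.

    hop_losses: loss percentages in hop order; None marks a hop that never
    answered (shown as ??? by mtr). A silent hop gets the minimum of the
    hops behind it, or None if nothing behind it answered.
    """
    result = []
    running_min = None
    # Walk from the destination back to the source, keeping a running minimum.
    for loss in reversed(hop_losses):
        if loss is not None:
            running_min = loss if running_min is None else min(running_min, loss)
        result.append(running_min)
    result.reverse()
    return result


# Loss% column of the home-to-server trace (hop 2 never answered).
home_to_server = [0.0, None, 0.0, 0.0, 0.0, 0.0, 0.0,
                  0.0, 0.0, 0.0, 0.0, 81.2, 0.0]

# Hop 12 reported 81.2% loss, but the destination behind it lost nothing.
print(effective_loss(home_to_server)[11])  # 0.0
```

Read this way, the 81.2% at hop 12 shrinks to 0% because the destination one hop later answered every probe, which is consistent with the router simply deprioritising its replies.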
Loss rates can be indicative of an overloaded router. If there are connection or packet loss problems, I start investigating at the closest router with a loss rate above 0%.
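One possible way to pick a starting point is sketched below. It narrows the advice slightly: instead of the closest hop with any loss, it looks for the closest hop whose loss continues at every answering hop behind it, up to and including the destination. That narrowing, the function name `first_persistent_loss_hop`, and the `threshold` parameter are assumptions of this example, not part of mtr. The numbers are the Loss% column of the EC2-to-server trace in the question.

```python
def first_persistent_loss_hop(hop_losses, threshold=0.0):
    """Return the 1-based number of the closest hop where persistent loss starts.

    A hop counts only if it and every later answering hop, including the
    destination, show loss above `threshold` (in percent). None entries are
    hops that never answered and are skipped. Returns None when the
    destination itself shows no loss above the threshold.
    """
    candidate = None
    # Walk backwards from the destination while the loss stays above threshold.
    for index in range(len(hop_losses) - 1, -1, -1):
        loss = hop_losses[index]
        if loss is None:
            continue
        if loss > threshold:
            candidate = index + 1
        else:
            break
    return candidate


# Loss% column of the EC2-to-server trace (hops 3 to 5 never answered).
ec2_to_server = [0.0, 0.0, None, None, None, 36.5, 0.0,
                 0.0, 0.0, 0.6, 47.2, 28.2, 1.1]

print(first_persistent_loss_hop(ec2_to_server))  # 10
```

On this trace the 36.5% at hop 6 is skipped because hops 7 to 9 show no loss, and the result points at hop 10, where a small loss first appears and carries through to the 1.1% seen at the destination. A short run with loss this low may well be noise, so treat the result as a place to start looking rather than a diagnosis.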