My server causes too much traffic, so I have installed ntop to monitor it.
On the Summary -> Traffic page in the Global TCP/UDP Protocol Distribution table I can see the traffic is periodically caused by HTTP.
On the All Protocols -> Traffic page, the first row accounts for almost all of the traffic (94,4%). But the first column (Host) shows my own server. Why is this?
When I click on it, I can see the traffic in the Host Traffic Stats table. It is all in the Tot. Traffic Rcvd column. So I think one of my applications is periodically downloading something big, or a lot of smaller things.
But how can I find out what was downloaded? What are the downloaded URLs, or at least the hosts that caused the most traffic?
Ntop is a network interface tool - it shows you the traffic going over various ports and protocols, but that's where it ends. What you need to do now is look at the application that's processing that traffic, in this case Apache.
The easiest way to do this is to install a web usage tool, like webalizer (there are many others, awstats was the 'best' a while back, not sure what's king now). This will run through your logs and generate pages of statistics that you can use to see where the traffic was going, where it was coming from and who was doing it. For Example.
Fix the Systemic Issue:
Having the logs of the applications that make requests be unknown and scattered all over the place is a problem. This is going to bite you in the ass again and again, so I would set aside some time to address it. Find some way to index or aggregate them. This is a larger project that you should raise.
The Problem at Hand:
For the problem at hand, I would recommend wireshark / tcpdump. Once you have a traffic capture, you can use all sorts of techniques to try to find it. In wireshark you could use "statistics / conversations", sort by bytes, and then drill down into the captures from there. Riverbed's non-free Cascade Pilot has a "Web Bandwidth by Object" view for captures that would be good at this -- you could request a trial.
If you are not familiar with wireshark, now is a good time to learn. It is a tool most sysadmins use on a regular basis.
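If you would rather script the "sort conversations by bytes" step, here is a rough Python sketch. It is not how wireshark does it, just a minimal reader under some assumptions: the capture is a classic pcap file (for example written with `tcpdump -w capture.pcap`; wireshark's default pcapng format is not handled), the link type is Ethernet, and only untagged IPv4 frames are counted. The function name `bytes_per_conversation` is made up for this example.

```python
import socket
import struct
from collections import Counter


def bytes_per_conversation(path):
    """Sum on-the-wire bytes per pair of IPv4 hosts in a classic pcap file."""
    totals = Counter()
    with open(path, "rb") as fh:
        header = fh.read(24)  # pcap global header
        if len(header) < 24:
            raise ValueError("file too short to be a pcap capture")
        magic = header[:4]
        if magic == b"\xd4\xc3\xb2\xa1":
            endian = "<"  # written on a little-endian machine
        elif magic == b"\xa1\xb2\xc3\xd4":
            endian = ">"  # written on a big-endian machine
        else:
            raise ValueError("only classic pcap is handled in this sketch")
        linktype = struct.unpack(endian + "I", header[20:24])[0]
        if linktype != 1:  # 1 = Ethernet
            raise ValueError("only Ethernet captures are handled in this sketch")

        while True:
            rec = fh.read(16)  # per-packet record header
            if len(rec) < 16:
                break
            _sec, _usec, incl_len, orig_len = struct.unpack(endian + "IIII", rec)
            frame = fh.read(incl_len)
            if len(frame) < incl_len:
                break  # truncated capture
            # Need 14 bytes of Ethernet header plus 20 bytes of IPv4 header.
            if len(frame) < 34 or frame[12:14] != b"\x08\x00":
                continue  # not plain IPv4 (VLAN tags, IPv6, ARP are skipped)
            if frame[14] >> 4 != 4:
                continue
            src = socket.inet_ntoa(frame[26:30])
            dst = socket.inet_ntoa(frame[30:34])
            # Direction-insensitive key, so both halves of a conversation add up.
            totals[tuple(sorted((src, dst)))] += orig_len
    return totals
```

Calling `bytes_per_conversation("capture.pcap").most_common(10)` would then list the ten host pairs that moved the most bytes, which tells you which remote hosts to drill into in wireshark.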
If you know the server taking the bandwidth, and it is a Linux server, you might try Nethogs (nethogs) to identify the process using the bandwidth.
You should examine your web server's access log, where all serviced requests are listed. You could filter for your web server's IP address and localhost and check the most requested files. There are several tools for this, but which one fits depends on the web server software you are using.
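As a quick alternative to a full statistics tool, the access log check can be sketched in a few lines of Python. This assumes the log is in the common/combined log format (Apache's default); other servers or custom formats would need a different pattern. The names `bytes_by_key` and `top` are invented for this example. Keep in mind that the size field in that format is the size of the response sent to the client, so this only covers requests your own web server answered.

```python
import re
from collections import Counter

# Common/combined log format (assumed):
# host ident user [time] "METHOD path PROTO" status bytes ...
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def bytes_by_key(lines, key="path"):
    """Sum response sizes per requested path (or per client with key='host')."""
    totals = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not look like access log entries
        size = m.group("size")
        totals[m.group(key)] += 0 if size == "-" else int(size)
    return totals


def top(lines, key="path", n=10):
    """Return the n biggest entries as (key, total_bytes) pairs."""
    return bytes_by_key(lines, key).most_common(n)
```

To use it, read the log into a list of lines (the path depends on your distribution and configuration) and call `top(lines)` for the heaviest URLs or `top(lines, key="host")` for the heaviest clients. To follow the suggestion above, keep only the lines whose client is your server's own IP address or localhost before passing them in.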