I have Apache set up to serve several Virtual Hosts, and I would like to see how much bandwidth each site uses. I can see how much the entire server uses, but I would like more detailed reports.
Most of the things I have found out there are for limiting bandwidth to virtual hosts, but I don't want to do that; I just want to see which sites are using how much bandwidth.
This isn't for billing purposes, just for information.
Is there an Apache module I should use? Or is there some other way to do this?
The information you're after is all in the logs, so you should look at a log analyzer such as AWStats. The other option is to use Google Analytics.
For analyzing the logs, here's a rough example which you can use to tell you how many MB of traffic a log file reports from the command line:
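As a rough illustration of that kind of log summing, here is a minimal Python sketch. It assumes the access log is in the Common or Combined Log Format, where the 10th whitespace-separated field is the response size (%b); the function name and the sample lines are made up for the example.

```python
def total_megabytes(lines):
    """Sum the response-size field (10th column) and return the total in MB."""
    total = 0
    for line in lines:
        fields = line.split()
        # Assumption: Common/Combined Log Format, so fields[9] is %b.
        # A "-" there means no body was sent; such lines add nothing.
        if len(fields) > 9 and fields[9].isdigit():
            total += int(fields[9])
    return total / (1024 * 1024)


# Made-up sample lines, only to show the call.
sample = [
    '10.0.0.1 - - [12/Jan/2011:14:25:04 +0000] "GET /big.iso HTTP/1.1" 200 1048576',
    '10.0.0.1 - - [12/Jan/2011:14:25:05 +0000] "GET /cached HTTP/1.1" 304 -',
]
print(f"{total_megabytes(sample):.2f} MB")
```

To run it against a real file, pass an open file object instead of the sample list, for example `total_megabytes(open("/var/log/apache2/access.log"))` (that path is only an example).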
I suggest you use the wonderful Apache logging mechanism and its lesser-known %I and %O flags.
Define a log format that includes them with a LogFormat directive, then use it in your main httpd.conf with a CustomLog directive.
The values count the bytes actually received and sent, request and response headers included, so they are accurate enough to give a good idea of each VirtualHost's traffic.
Scan the logs with a Perl script to aggregate per virtual host every n minutes (5, for example) and send the result to Cacti.
These flags are provided by mod_logio, which is probably built into your Apache (as it is in Debian's Apache).
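The per-virtual-host aggregation step could look roughly like the sketch below. It is written in Python rather than Perl, and it assumes each log line has the layout `[timestamp] client vhost path bytes_in bytes_out`; that layout, the function name and the sample lines are assumptions for illustration. Sending the totals on to Cacti is not shown.

```python
from collections import defaultdict
from datetime import datetime

BUCKET_SECONDS = 5 * 60  # the "n minutes" window, 5 as in the text


def aggregate(lines, bucket_seconds=BUCKET_SECONDS):
    """Return {(bucket_start_epoch, vhost): [bytes_in, bytes_out]}."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        try:
            stamp, rest = line.strip().lstrip("[").split("] ", 1)
            _client, vhost, _path, raw_in, raw_out = rest.split()
            when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
            bytes_in, bytes_out = int(raw_in), int(raw_out)
        except ValueError:
            continue  # skip lines that do not fit the assumed layout
        epoch = int(when.timestamp())
        bucket = epoch - epoch % bucket_seconds  # start of the window
        totals[(bucket, vhost)][0] += bytes_in
        totals[(bucket, vhost)][1] += bytes_out
    return dict(totals)


# Made-up sample lines in the assumed layout.
sample = [
    "[12/Jan/2011:14:25:04 +0000] 10.0.0.1 a.example.com / 581 669",
    "[12/Jan/2011:14:26:10 +0000] 10.0.0.2 a.example.com /x 100 200",
    "[12/Jan/2011:14:26:10 +0000] 10.0.0.2 b.example.com /y 5 6",
]
for (bucket, vhost), (bytes_in, bytes_out) in sorted(aggregate(sample).items()):
    print(bucket, vhost, bytes_in, bytes_out)
```

Keying the totals by window start and virtual host keeps one counter pair per graph point, which is the shape a polling tool typically wants.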
AWStats is one way to do this, but probably not the best.
If you decide to use AWStats with Apache, out of the box it will show you aggregated bandwidth for your entire server.
To see bandwidth on a per virtual host basis, I recommend installing vlogger.
Vlogger splits the Apache access log information into separate directories/files for each virtual host you configure it for.
For example, if your Apache log files are in /var/log/apache2, a typical vlogger installation will create a separate log location there for each of your virtual hosts (e.g. vhost1.com, vhost2.com).
Vlogger gives you the option to rotate these logs for you, provides a way to change the naming template of the access log file (e.g. add a date), and claims it handles a large number of log files better than Apache.
One downside to this is that you won't have an aggregated server view anymore (you'll need to aggregate the logs separately, or perhaps use an additional Apache setting or some other method).
I would caution against using Google Analytics (or any JavaScript-based tracking) for server bandwidth monitoring, as you are relying on the client to report via JavaScript. GA does not report visitors who have JavaScript disabled, nor any crawlers/spiders/bots.
Here is some regex to parse the log format proposed by Xerxes.
\[([0-9]+)/(\w+)/([0-9]{4})[^\]]+\]\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s(\S+)\s\S+\s(\d+)\s(\d+)
Example log:
[12/Jan/2011:14:25:04 +0000] 157.157.12.206 files.hjaltijakobsson.com / 581 669
[12/Jan/2011:14:25:04 +0000] 157.157.12.206 files.hjaltijakobsson.com / 624 747
[12/Jan/2011:14:25:04 +0000] 157.157.12.206 files.hjaltijakobsson.com /icons/blank.gif 687 186
[12/Jan/2011:14:25:04 +0000] 157.157.12.206 files.hjaltijakobsson.com /icons/compressed.gif 693 188
[12/Jan/2011:14:25:04 +0000] 157.157.12.206 files.hjaltijakobsson.com /favicon.ico 592 512
Matches:
Subpattern 1 (day of month): 12
Subpattern 2 (abbr. month): Jan
Subpattern 3 (year): 2011
Subpattern 4 (visitor host): 157.157.12.206
Subpattern 5 (virtual host): files.hjaltijakobsson.com
Subpattern 6 (incoming bytes): 581
Subpattern 7 (outgoing bytes): 669
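For anyone who wants to try the pattern, here is a small Python sketch that applies it to the first example line. It uses a lightly adjusted version of the regex (the dots in the IP address are escaped, and the virtual host is captured as a run of non-whitespace so it carries no trailing space); the function and key names are made up for the example.

```python
import re

# Adjusted pattern: escaped dots, vhost captured with \S+.
LINE_RE = re.compile(
    r"\[([0-9]+)/(\w+)/([0-9]{4})[^\]]+\]\s"
    r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s"
    r"(\S+)\s\S+\s(\d+)\s(\d+)"
)


def parse_line(line):
    """Return the seven subpatterns as a dict, or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    day, month, year, client, vhost, bytes_in, bytes_out = match.groups()
    return {
        "day": day,
        "month": month,
        "year": year,
        "client": client,
        "vhost": vhost,
        "bytes_in": int(bytes_in),
        "bytes_out": int(bytes_out),
    }


line = "[12/Jan/2011:14:25:04 +0000] 157.157.12.206 files.hjaltijakobsson.com / 581 669"
print(parse_line(line))
```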
Cheers.
A slight tweak on the accepted answer, assuming there are actually multiple vhosts on the server (and therefore multiple site.com.access_log files). This will sort and list each vhost
and for a directory of gzipped logs
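A rough Python sketch of the same idea, covering both plain and gzipped logs in one pass. It assumes the files are named `<vhost>.access_log` with an optional rotation suffix such as `.1.gz`, and that the 10th field of each line is the response size (Common/Combined Log Format); both are assumptions, as are the function names.

```python
import glob
import gzip
import os
from collections import Counter


def open_log(path):
    """Open a plain or gzipped log file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, errors="replace")


def bytes_per_vhost(log_dir):
    """Return [(vhost, total_bytes), ...] sorted by total, largest first."""
    totals = Counter()
    for path in glob.glob(os.path.join(log_dir, "*access_log*")):
        # Assumed naming: everything before ".access_log" is the vhost.
        vhost = os.path.basename(path).split(".access_log")[0]
        with open_log(path) as handle:
            for line in handle:
                fields = line.split()
                if len(fields) > 9 and fields[9].isdigit():
                    totals[vhost] += int(fields[9])
    return totals.most_common()


if __name__ == "__main__":
    # Scans the current directory; point it at your log directory instead.
    for vhost, total in bytes_per_vhost("."):
        print(f"{total / (1024 * 1024):10.2f} MB  {vhost}")
```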
Hmm, you could get evil with iptables and string matching to log the packets for later reporting. It will only work for non-SSL connections, though.
Or something protocol- and session-aware like Snort could be shoehorned into use ...
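Correct. Filtering the log is a good idea. I also wanted to get the bandwidth of my Apache server while files are being downloaded.
Log the %b (bytes sent) and %D (time taken to serve the request, in microseconds) values and divide one by the other, which gives you the bandwidth of the current transfer.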
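The arithmetic can be sketched as follows, assuming the duration field is %D, which Apache logs in microseconds. The function name is made up for the example.

```python
def throughput_bytes_per_sec(bytes_sent, duration_us):
    """Bytes per second for one request, from %b and %D (microseconds)."""
    if duration_us <= 0:
        return None  # no usable duration was logged
    return bytes_sent / (duration_us / 1_000_000)


# A 1,000,000-byte response served in half a second.
rate = throughput_bytes_per_sec(1_000_000, 500_000)
print(f"{rate / 1024:.1f} KiB/s")
```

This is a per-request figure; requests served concurrently would have to be summed to estimate the server's total bandwidth at a given moment.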