I have an apache webserver running many VirtualHosts.
Recently it has been bogging down and becoming unresponsive, and I'm wondering how I can determine which VirtualHosts are causing most of the issue. We have had occasions in the past where a bug in the code of an individual site has taken down the whole server. My goal is to be able to diagnose these instances quickly.
I am monitoring the server with munin and notice that the number of apache processes, memory usage, and load tend to be very high during the periods in question. Problem is, these statistics are for the whole webserver, not for individual VirtualHosts.
I have written a script to parse the web logs for traffic per VirtualHost, but it appears that that is not enough. I probably need to determine how many Apache processes each VirtualHost is responsible for, how long each holds a process open - or perhaps how much memory each is responsible for.
Where can I find this information? I don't mind writing a script to track this data, but I don't know exactly where to extract it from in the first place.
I appreciate that it doesn't always suit to have mod_status available and on all of the time, but it and apachetop are the best ways to diagnose these problems. However, there are many ways to skin a cat.
This trick is useful in a number of circumstances and isn't specific to Apache. It does depend on a number of factors, however, and you need to know what it's doing to know its limitations.
Let's break it down:
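The trick itself isn't quoted above, so here is a minimal sketch of the usual /proc approach it describes: look at each worker's current working directory, which under mod_php usually points into the DocumentRoot of the VirtualHost being served. The process name `apache2` is an assumption (Debian-style); use `httpd` on Red Hat-style systems.

```shell
# For each Apache worker, resolve its current working directory via
# /proc, then count how many workers sit in each directory. Workers
# busy in a mod_php script typically have their cwd inside that
# VirtualHost's tree, so the counts map workers to sites.
for pid in $(pgrep apache2); do
    ls -l "/proc/$pid/cwd" 2>/dev/null
done | awk '{print $NF}' | sort | uniq -c | sort -rn
```

Run it a few times during a slowdown; a VirtualHost that consistently dominates the count is a good suspect.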
There are two major caveats to that trick:
1) If something running in the same context as the Apache process calls chdir() outside of the VirtualHost's directory, you'd be hard pushed to find that out.
e.g. a PHP script running under mod_php (a CGI is different, as Apache forks a separate process, but I'm presuming CGIs aren't the problem or you'd be able to track them more easily).
2) If you have Apache processes that serve pages very quickly (e.g. a small static HTML page). This normally isn't a problem, but it can happen: the Apache processes you scanned with ps have already exited by the time you check /proc, which shows up as "No such file or directory" errors. I would expect some of those, but not the majority unless your content fits this particular case. On the bright side, it means those processes are serving pages very quickly.
Regarding memory-bound Apache processes, I use ps_mem.py to calculate memory usage on my webservers. If you've got large Apache processes (in terms of resident memory size) and they are exiting quickly, that is roughly the equivalent of asking a big fat guy to keep running 100m sprints. If your webserver isn't a shared one, those "No such file or directory" errors are normally good candidates for moving some content onto a smaller, lightweight webserver (e.g. nginx / lighttpd) or for heavily caching content (e.g. varnish / squid).
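If you don't have ps_mem.py to hand, plain ps gives a rough per-worker view. Note that RSS counts shared pages once per process, so the numbers overstate the true total (which is exactly what ps_mem.py corrects for); `apache2` is again an assumption about the process name.

```shell
# Show the fattest Apache workers by resident set size (in KiB).
# RSS is an upper bound per worker, since shared pages are counted
# once in every process that maps them.
ps -o pid=,rss=,comm= -C apache2 --sort=-rss | head -n 10
```

Catch one of the big workers' pids here, then look up its /proc/PID/cwd to see which VirtualHost it was working in.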
I think you want apachetop, or else mod_status (with ExtendedStatus On). I'm yet to have a performance problem in Apache that wasn't lit up by mod_status, and apachetop looks like a neat tool (though it has some annoying limitations in log layout).
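A minimal config sketch for enabling it, assuming Apache 2.4 and the conventional /server-status location (neither is stated in the question):

```
# mod_status must be loaded (a2enmod status on Debian/Ubuntu).
ExtendedStatus On
<Location "/server-status">
    SetHandler server-status
    Require local    # Apache 2.4 syntax; 2.2 uses Order/Allow directives
</Location>
```

With ExtendedStatus on, /server-status shows per-worker state, the request currently being served, and its VirtualHost, which is exactly the per-site breakdown you're after.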