If a server is experiencing high load, I use top and similar tools to troubleshoot why. However, this is only effective if I can analyze while the server is experiencing the problem.
What are some good tools for finding the root cause of high server load after the fact? For example, I was planning to set up a cron job to save 'top' output, Apache server stats, the MySQL process list, etc. every 5 minutes. That doesn't seem very elegant, though, so I'm wondering whether someone has already come up with utilities to do this.
For ongoing monitoring you could consider installing munin. It gathers information every 5 minutes and generates graphs that let you see where the bottlenecks are. I also use sar, which can run in the background gathering data to disk. This gives quite detailed information on what the bottleneck is. To see what processes were running in the past, you will need the process accounting package.
I like collectd, but I've recently started toying with pcp (Performance Co-Pilot, http://oss.sgi.com/projects/pcp/). It has some nice features for historical diagnosis.
Your non-elegant solution is actually a good one if you don't want to set up separate monitoring consoles (think SNMP traps). If you're running a RHEL/CentOS style system, make sure you've installed 'sysstat' (and turned it on) to gather ongoing stats about CPU, memory, disk I/O and the like (see the /etc/sysconfig/sysstat.* config files to tune it).
Once you have that gathering the underlying stats for you, you can use it to pinpoint when the load trend occurs (so besides just seeing high CPU: is your proc queue backed up? do you see major faults in paging? how's your swap utilization?), which you can then correlate with your 'mysqladmin proc stat' type lists and so forth. If it's a LAMP stack, grab the total number of httpd processes and their total memory, then do a quick sum/divide to get the average process size and record that as well. Enable the slow query log in MySQL to trap those bad boys and look for tables that need indexes.
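The "quick sum/divide" for the average httpd process size could look something like the sketch below. It assumes a procps-style `ps` (for `-C` and `-o rss=`) and that the processes are named `httpd` (on Debian-style systems it would be `apache2`); the function names are made up for illustration. Note that RSS counts shared pages once per process, so this overestimates the real per-process cost.

```python
import subprocess


def average_rss_kib(ps_output):
    """Return (process count, average RSS in KiB) from `ps -o rss=` output."""
    # `ps -o rss=` prints one resident set size per line, in KiB, no header.
    sizes = [int(field) for field in ps_output.split()]
    if not sizes:
        return 0, 0.0
    return len(sizes), sum(sizes) / len(sizes)


def httpd_rss(process_name="httpd"):
    """Run ps for the given command name and summarise its memory use."""
    # Assumption: procps ps, which supports selecting by command name with -C.
    result = subprocess.run(
        ["ps", "-C", process_name, "-o", "rss="],
        capture_output=True,
        text=True,
    )
    return average_rss_kib(result.stdout)
```

Logging the count and the average next to each snapshot makes it easier to tell whether a load spike came from more Apache children or from fatter ones.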
Sometimes lo-tech isn't bad tech. :) Why use a chainsaw when a knife will do?
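For what it's worth, the cron-based snapshot idea from the question can be sketched in a few lines. This is only a rough sketch: the command list, the `snapshot` function and the output directory are assumptions for illustration, and `mysqladmin` will typically need credentials (e.g. from `~/.my.cnf`) to work unattended.

```python
import subprocess
import time
from pathlib import Path

# Assumed set of commands; adjust to whatever is installed on the box.
COMMANDS = {
    "top": ["top", "-b", "-n", "1"],  # batch mode, single iteration
    "mysql": ["mysqladmin", "processlist", "status"],
}


def snapshot(commands, out_dir, stamp=None):
    """Run each command and save its output under out_dir/<timestamp>/."""
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target = Path(out_dir) / stamp
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, argv in commands.items():
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=30
            )
            text = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Record the failure instead of aborting the whole snapshot.
            text = f"failed to run {argv!r}: {exc}\n"
        path = target / f"{name}.txt"
        path.write_text(text)
        written.append(path)
    return written


# Example call (hypothetical directory):
# snapshot(COMMANDS, "/var/log/load-snapshots")
```

Wrap the example call in a small script and run it from cron with a schedule of `*/5 * * * *` to get a snapshot every 5 minutes. You will also want something to prune old directories so the disk doesn't fill up.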
Might also wanna look at collectd, as a munin alternative.
atop.
It complements top/htop, because it can collect stats over time.