Sometimes my servers will show a high load average in the "top" program (e.g. load is ~10 on a 4-core machine), but the actual CPU usage isn't particularly high.
I assume the issue is that there are many I/O-intensive jobs running. Is there any easy way to identify these jobs that are causing the load, if their "%CPU" values in top aren't that high?
iostat
can report statistics like that. It is usually included in your distro in the sysstat package. dstat might also be worth a look; it's a modern replacement.
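If you want to script that check, here is a rough Python sketch that picks the busy devices out of the text printed by `iostat -x`. The function name `busy_devices` and the 80% threshold are my own assumptions, not anything iostat defines. The column order differs between sysstat versions, so the sketch looks up the `%util` column by name.

```python
def busy_devices(iostat_text, min_util=80.0):
    """Return {device: %util} for devices at or above min_util.

    iostat_text is the captured output of `iostat -x`. If it holds
    several reports, the last value seen for each device wins.
    min_util is an arbitrary threshold chosen for illustration.
    """
    latest = {}
    util_col = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("avg-cpu"):
            # A new report starts here; wait for its Device header.
            util_col = None
            continue
        if fields[0].rstrip(":") == "Device":
            # Locate %util by name instead of hard-coding its position.
            util_col = fields.index("%util") if "%util" in fields else None
            continue
        if util_col is None or len(fields) <= util_col:
            continue
        try:
            # Some locales print a decimal comma.
            latest[fields[0]] = float(fields[util_col].replace(",", "."))
        except ValueError:
            continue
    return {dev: util for dev, util in latest.items() if util >= min_util}
```

You could feed it with something like `subprocess.run(["iostat", "-x", "5", "2"], capture_output=True, text=True).stdout`. The first report iostat prints is an average since boot, so the second one is the interesting one.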
To find what's causing high load, you can check a few things.
vmstat -w
will show you an overview (processes, swap, mem, cpu, io, system)
mpstat -P ALL
will give you per-CPU-core statistics (including %iowait)
iostat -x
look for high %util, long await, or a large average queue size
iotop
ps -ax
look for state D, which is uninterruptible sleep (usually IO); run it again to check whether those processes are still in state D
sar -b
- overall IO activity
sar -d
- individual block device IO activity
If you have IO accounting in your kernel, then you can use
iotop
to give information like that. Also, monitoring tools like collectd can record and report on the data.
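The "run ps twice and look for state D" check mentioned above can also be automated. Below is a minimal Python sketch, assuming a Linux /proc filesystem. The function names and the 2-second gap are my own choices for illustration. It reads the state letter from /proc/<pid>/stat and keeps only the processes that were in state D in both samples.

```python
import os
import time


def parse_stat(text):
    """Return (command name, state letter) from /proc/<pid>/stat contents."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ")" characters, so split on the last ")" in the line.
    left, _, right = text.rpartition(")")
    comm = left.split("(", 1)[1]
    state = right.split()[0]
    return comm, state


def d_state_pids(proc="/proc"):
    """Map pid -> command name for every process currently in state D."""
    found = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        path = os.path.join(proc, entry, "stat")
        try:
            with open(path, errors="replace") as fh:
                comm, state = parse_stat(fh.read())
        except (OSError, IndexError):
            continue  # process exited (or entry unreadable) mid-scan
        if state == "D":
            found[int(entry)] = comm
    return found


def stuck_in_d(first, second):
    """Keep only the pids that were in state D in both samples."""
    return {pid: comm for pid, comm in second.items() if pid in first}


def sample_twice(interval=2.0):
    """Take two samples `interval` seconds apart (an arbitrary default)."""
    first = d_state_pids()
    time.sleep(interval)
    return stuck_in_d(first, d_state_pids())
```

Calling `sample_twice()` returns a dict of pid to command name. A process that shows up there was blocked in both samples, which makes it a reasonable suspect for the load, though a short-lived D state can still slip between the two samples.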