I'm having load problems with my server, and even though I'm a somewhat experienced Linux admin, I'm out of ideas.
The problem is a slowly but steadily increasing load on the server without any apparent cause.
The server is an AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ with 6 GB of RAM. It is running Debian Stable with Linux gir 2.6.26-2-amd64 #1 SMP Wed Aug 19 22:33:18 UTC 2009 x86_64 GNU/Linux.
The server basically runs Lighttpd, several FastCGI PHP processes and a MySQL database. Typical webserver tasks.
The CPU is never fully utilized, and memory is mostly used for buffers and cache, which is fine. I tried restarting the various services to see if one of them would bring the load back down, but no luck.
Here are graphs showing load, CPU and iostat:
So, the question is: what could cause a slowly but ever-increasing load? And how do I find out what's responsible?
Update: I forgot to mention that when I reboot the server, the load drops to around 0.3 to 0.6 and then slowly climbs again over the following weeks.
Each zombie process adds 1.0 to the load. You might be seeing an accumulation of zombies.
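A quick way to check whether zombies are piling up (just a sketch; any ps that prints the process state in a STAT column will do):

    # List zombie (defunct) processes; the stat column starts with 'Z' for zombies
    ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'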
I found an excellent hint in an answer to a different question.
Looking for processes in state 'D' shows four PHP processes that seem to hang for quite a while corresponding to the "steps" in the load curve:
So these seem to be the problem. I now need to find out why those processes hang and how to fix it. Thanks, everyone.
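For anyone who hits the same thing, a one-liner along these lines (just a sketch; the interval is arbitrary) will surface processes stuck in state 'D':

    # Print a timestamp plus any process in uninterruptible sleep (state D), every 10 seconds
    while true; do date; ps auxf | awk '$8 ~ /^D/'; sleep 10; done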
My guess is that the server is I/O starved; maybe you should add iotop stats to the graphs.
I wonder if you can get per-application I/O activity; that could also be a factor in server load.
http://rt.wiki.kernel.org/index.php/I/Otop_utility
Another tool is dstat.
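Roughly, something like this (the flags shown are the common ones; check the versions you have installed):

    # Show only processes currently doing I/O (needs root)
    iotop -o

    # CPU, disk, and the most I/O-hungry process, refreshed every 5 seconds
    dstat --cpu --disk --top-io 5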
If it were I/O, then he would see the iowait (pink) on the cpu graphs.
This kind of problem often comes from a hard disk that is not fast enough to serve the data required by the MySQL database and the HTTP server. You should look at the iostat command.
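For instance (a sketch; -x gives extended per-device statistics, sampled every 5 seconds):

    # Watch %util and await per device; sustained high values point at a disk bottleneck
    iostat -x 5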
In general, a high server load isn't necessarily a bad thing; it means you're not sitting idle and doing less than you otherwise could. Running at 80-90% of your total capacity (with some "burst" room) is usually what's sought after.

I'd recommend checking the output of mpstat and vmstat. In particular, the first two columns of vmstat output (r and b) give you meaningful info about how "backed up" you are: how many processes are in the run queue and how many are blocked waiting on I/O. The "wa" column tells you how much time is being spent waiting for I/O completions. The run queue size and the I/O wait time are often correlated.

Also check out sar (from the sysstat package): it gives you a detailed view of what's going on over a period of time, and the metrics it records are very thorough.
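As a rough illustration of those checks (the intervals and option combinations are just examples):

    # r = run queue, b = blocked on I/O (first two columns); 'wa' under cpu is I/O wait
    vmstat 5

    # Per-CPU utilization breakdown
    mpstat -P ALL 5

    # Historical data recorded by sysstat: -q run queue/load, -u CPU, -b I/O
    sar -q
    sar -u
    sar -b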