So... I see some very weird load issues on our e-mail server. It starts spiking around 8-9am (coincidentally, that's when people start working), but it goes back down around 11am or so. CPU usage stays normal, I have plenty of free memory, and there's no swapping. Yesterday we had a really high iowait% (49.75) with a really high load (40); today we 'only' had a load of 11-12 with iowait% at 3-4 tops.
All signs point to imapd (courier-imap) as the culprit: when I stopped it, the load started dropping, and within 2-3 minutes it was back to normal. I had about 40-60 imapd processes running. We use Thunderbird, which opens 5 connections per client, so I lowered that to 1 on most workstations. It helped a bit (load went down to 5-7), and then... the whole server went back to normal around 11am.
I still have ~30 imapds running, but the load is perfectly normal (between 0.2 and 0.4). So... I don't really understand why this is happening, because by that logic it should be much higher if imapd were the cause of the issue.
It's a Linode 1080 VPS with 1 GB of RAM.
(chkrootkit / rkhunter showed nothing unusual.)
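For the record, this is more or less how I'm watching it (the sampling interval is arbitrary):

vmstat 5                      # 'wa' column is iowait%, 'b' is processes blocked on IO
ps ax | grep -c '[i]mapd'     # rough count of running courier imapd processes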
If you are using a VPS, you are sharing IO bandwidth, CPU time, and memory bandwidth with other users that are not visible to your VPS.
I'm fairly confident that another domU hosted on the same physical machine is consuming a large amount of one or more of those resources (most likely IO).
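A quick way to check for contention from outside your guest is the CPU summary from iostat (the interval is just an example):

iostat -c 5    # avg-cpu report every 5 seconds; %iowait is IO wait, %steal is time taken by other guests (if your sysstat version reports it)

A consistently non-trivial %steal, or %iowait spiking while your own processes are idle, points at a noisy neighbour rather than anything inside your VPS.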
If you use
iostat -x
you'll probably see that your service times are fluctuating wildly, which explains why your load average is spiking: processes are blocking on disk IO.
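Something like this gives a rolling view (the 5-second interval is just an example; iostat comes from the sysstat package on most distros):

iostat -x 5    # extended per-device stats every 5 seconds; watch the await, svctm and %util columns

If await and %util stay high even when your own processes are mostly idle, the contention is coming from outside your guest.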
As I understand it, load on a *nix system means "the number of processes waiting to run". This does not necessarily mean they are waiting for the CPU; they could be waiting for disk access, or for a network connection to complete.
For example, I used to manage a system where the load would occasionally rocket past 80, bringing the system to a crawl. It turned out that an external LDAP server, which the local system was using for client authentication requests, had malfunctioned.
If your CPU and iowait seem OK, I would look at any network dependencies your applications have as a possible culprit for unusually high load readings.
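To see which processes are actually being counted into the load while it's high, you can look for ones stuck in uninterruptible sleep; a rough sketch:

ps -eo state,pid,comm | grep '^D'    # processes in uninterruptible (D) sleep, i.e. blocked on IO

If a pile of imapd processes shows up in D state during the spike, they're blocked on IO, and each one adds to the load average even though the CPU is idle.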
Like the first poster indicated, it's likely IO. I actually have the same setup on my vserver and often see the same problems. The issue is that current container-based virtualization methods like vserver do not isolate IO effectively. There's a whitepaper that explains it in depth (see page 13) if you're interested: http://www.cs.princeton.edu/~mef/research/vserver/paper.pdf