I've been seeing spikes in load average on a web server I manage almost daily now. Here are the server specs:
- 6 x 2.4 GHz dedicated CPU
- 3GB RAM
It's a VPS running Debian 6; I installed Apache, PHP and MySQL via apt. I'm not sure if there's a configuration I've gotten wrong.
Today the load average peaked so high that the server failed to serve the web application (WordPress). The screenshot below is from our server monitoring system. You'll notice the high load average correlates with a high Apache busy-worker count, and then memory maxes out too.
After forcing a reboot on the server I still have a higher than usual load average, despite the CPU usage being low. The following screenshots show htop and then iotop.
The load average is now > 6; here's what the Apache server status page shows:
I'm really struggling with how to investigate this. Can anyone help me figure this one out?
Update 1
I've searched the Apache error logs and there's no mention of anything hitting the max execution time. I do, however, get a lot of the following, starting just as the server began to load up:
::1 - - [24/Feb/2014:15:03:31 +0000] "OPTIONS * HTTP/1.0" 200 152 "-" "Apache/2.2.16 (Debian) (internal dummy connection)"
::1 - - [24/Feb/2014:15:03:32 +0000] "OPTIONS * HTTP/1.0" 200 152 "-" "Apache/2.2.16 (Debian) (internal dummy connection)"
::1 - - [24/Feb/2014:15:03:33 +0000] "OPTIONS * HTTP/1.0" 200 152 "-" "Apache/2.2.16 (Debian) (internal dummy connection)"
::1 - - [24/Feb/2014:15:03:34 +0000] "OPTIONS * HTTP/1.0" 200 152 "-" "Apache/2.2.16 (Debian) (internal dummy connection)"
::1 - - [24/Feb/2014:15:03:35 +0000] "OPTIONS * HTTP/1.0" 200 152 "-" "Apache/2.2.16 (Debian) (internal dummy connection)"
::1 - - [24/Feb/2014:15:03:36 +0000] "OPTIONS * HTTP/1.0" 200 152 "-" "Apache/2.2.16 (Debian) (internal dummy connection)"
::1 - - [24/Feb/2014:15:03:37 +0000] "OPTIONS * HTTP/1.0" 200 152 "-" "Apache/2.2.16 (Debian) (internal dummy connection)"
::1 - - [24/Feb/2014:15:03:38 +0000] "OPTIONS * HTTP/1.0" 200 152 "-" "Apache/2.2.16 (Debian) (internal dummy connection)"
::1 - - [24/Feb/2014:15:03:39 +0000] "OPTIONS * HTTP/1.0" 200 152 "-" "Apache/2.2.16 (Debian) (internal dummy connection)"
::1 - - [24/Feb/2014:15:03:41 +0000] "OPTIONS * HTTP/1.0" 200 152 "-" "Apache/2.2.16 (Debian) (internal dummy connection)"
Note how each entry is about one second after the previous... perhaps this is something.
Update 2
So I had the server host move the VPS to a new hypervisor, but afterwards it still has quite a high iowait. I ran iostat 1 and this is what I received:
Does this help identify the problem?
It looks like you've got a script somewhere that is causing the load.
Start by going through your Apache error log and looking for max_execution times or timeouts. Then move on to the access logs and look for scripts being hit that may be causing the hang.
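Something along these lines will do the digging; the log paths below are the Debian defaults and may differ depending on your vhost configuration:

    # search the error log for PHP timeouts or fatal errors
    grep -iE 'max_execution|timeout|fatal' /var/log/apache2/error.log | tail -n 50

    # count the most frequently requested URLs in the access log
    awk '{print $7}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -n 20

The awk '{print $7}' assumes the default combined log format, where the seventh whitespace-separated field is the request path.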
There are a few things you can do to investigate the problem, for example running vmstat 2, which prints statistics for the key resources every 2 seconds.
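As a rough sketch, watch the swap and CPU columns for a minute or so to confirm whether swapping is the culprit:

    # print resource statistics every 2 seconds, 30 samples
    vmstat 2 30
    # columns worth watching:
    #   si / so - pages swapped in / out per second (sustained non-zero values mean the box is actively swapping)
    #   wa      - percentage of CPU time spent waiting on IO
    #   r / b   - runnable vs. blocked processes (b climbing along with the load suggests you're IO bound)

If si/so stay near zero while wa is high, the IO pressure is coming from something other than swap (MySQL, or a noisy neighbour on the host).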
One thing which jumps out at me, though, is the amount of swap in use: 841MB on a server with 3GB of RAM is very substantial. I suspect your system is swapping heavily, driving IO very high and pushing up the load. If this hypothesis is correct, the solution is to deal with the swapping.
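You can confirm those numbers, and see roughly which processes are eating the memory, with something like:

    # overall memory and swap usage in MB
    free -m

    # largest processes by resident memory (Apache children and mysqld are the usual suspects)
    ps aux --sort=-rss | head -n 15

ps shows resident memory rather than swap, but on a box this size it's usually enough to tell whether too many PHP-heavy Apache workers are piling up.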
You either need to throw more memory at the system, change how swap is handled, or both. I'd suggest starting with the latter - it's easy to do: just configure swappiness. To do this, type echo 'vm.swappiness = 10' >> /etc/sysctl.conf and then sysctl -p. This will make the CPU do a bit more work but swap less. On many VMs, disk IO is the bottleneck, so the effects are pretty immediate and remarkable.
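As a concrete sketch, those commands (run as root) look like this:

    # persist the setting across reboots
    echo 'vm.swappiness = 10' >> /etc/sysctl.conf

    # apply it immediately
    sysctl -p

    # verify the running value
    cat /proc/sys/vm/swappiness

Note that sudo echo ... >> /etc/sysctl.conf won't work as written, because the redirection happens in your unprivileged shell; use a root shell or sudo tee -a instead.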
Throwing more RAM at it will also reduce the amount of swapping and speed up the system.