I have two nearly identical dedicated servers with the same provider. They also run a nearly identical software stack: Red Hat 5 64-bit, Plesk, PHP, Apache, and MySQL. We use them for hosting custom sites we build.
The problem is that while our first server has a load average (in top) of around 0.3, the second server consistently has a load average of around 4.0 or higher. Basic functions in Plesk are delayed, and there is some latency when executing shell commands.
Anyone have ideas why it would be so high? And why it would differ from our other server so much?
Here is my current top output (sorted by %MEM) ...
Any help is much appreciated ...
top - 21:48:04 up 100 days, 4:28, 1 user, load average: 3.74, 4.20, 4.23
Tasks: 336 total, 1 running, 335 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.8%us, 0.4%sy, 0.0%ni, 91.3%id, 7.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 12290884k total, 11886452k used, 404432k free, 2920212k buffers
Swap: 2096472k total, 244k used, 2096228k free, 6560692k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22536 apache 15 0 860m 547m 6484 S 0.0 4.6 0:10.96 httpd
26467 apache 15 0 859m 546m 6408 S 0.0 4.5 0:07.67 httpd
3620 apache 15 0 859m 545m 5552 S 0.0 4.5 0:06.15 httpd
1895 apache 15 0 858m 544m 6356 S 0.0 4.5 0:08.25 httpd
16933 apache 15 0 858m 544m 5488 S 0.0 4.5 0:01.57 httpd
6431 apache 15 0 856m 542m 6076 S 10.6 4.5 0:05.32 httpd
14417 apache 15 0 856m 542m 5568 S 0.0 4.5 0:03.88 httpd
15403 apache 15 0 855m 541m 5616 S 0.0 4.5 0:03.73 httpd
19165 apache 15 0 853m 539m 6252 S 0.0 4.5 0:12.40 httpd
15898 apache 15 0 852m 539m 5376 S 0.0 4.5 0:02.68 httpd
14401 apache 15 0 851m 538m 5460 S 0.0 4.5 0:02.97 httpd
15393 apache 15 0 851m 538m 5404 S 0.0 4.5 0:03.12 httpd
15427 apache 15 0 851m 538m 5496 S 0.0 4.5 0:02.44 httpd
14412 apache 15 0 851m 538m 5324 S 0.0 4.5 0:02.15 httpd
18330 apache 15 0 851m 537m 5136 S 0.0 4.5 0:01.30 httpd
18303 apache 15 0 848m 535m 5140 S 0.0 4.5 0:00.47 httpd
21190 apache 15 0 845m 533m 3988 S 0.0 4.4 0:00.33 httpd
15923 root 18 0 822m 521m 9928 S 0.0 4.3 10:04.81 httpd
22021 apache 15 0 828m 520m 4964 S 0.0 4.3 0:00.16 httpd
22146 apache 15 0 823m 515m 3016 S 0.0 4.3 0:00.02 httpd
22345 apache 15 0 822m 514m 2408 S 0.0 4.3 0:00.00 httpd
14721 apache 15 0 733m 510m 488 S 0.0 4.3 0:00.00 httpd
5094 root 15 0 1452m 122m 15m S 1.0 1.0 852:24.24 java
4636 mysql 15 0 532m 57m 6440 S 1.0 0.5 488:05.84 mysqld
4799 popuser 15 0 166m 53m 2368 S 0.0 0.4 0:36.64 spamd
16761 popuser 15 0 159m 46m 2312 S 0.0 0.4 0:00.38 spamd
4797 root 15 0 158m 45m 2448 S 0.0 0.4 0:01.27 spamd
5074 root 34 19 255m 20m 2144 S 0.0 0.2 1:37.53 yum-updatesd
9917 named 15 0 366m 9804 1980 S 0.0 0.1 0:10.26 named
4332 sso 18 0 119m 8028 5212 S 0.0 0.1 0:00.06 sw-engine-cgi
4341 sso 18 0 119m 8028 5212 S 0.0 0.1 0:00.07 sw-engine-cgi
4350 sso 18 0 119m 8028 5212 S 0.0 0.1 0:00.09 sw-engine-cgi
4352 sso 18 0 119m 8028 5212 S 0.0 0.1 0:00.11 sw-engine-cgi
4376 ntp 15 0 23388 5020 3896 S 0.0 0.0 0:00.58 ntpd
4331 sw-cp-se 15 0 61336 4572 1480 S 0.0 0.0 5:53.22 sw-cp-serverd
4213 haldaemo 15 0 31252 4460 1684 S 0.0 0.0 0:01.52 hald
4778 postgres 18 0 117m 4164 3484 S 0.0 0.0 0:00.11 postmaster
18555 root 16 0 98.3m 3716 2852 S 0.0 0.0 0:00.01 sshd
4488 sso 18 0 119m 3044 224 S 0.0 0.0 0:00.00 sw-engine-cgi
4489 sso 18 0 119m 3044 224 S 0.0 0.0 0:00.00 sw-engine-cgi
4492 sso 18 0 119m 3044 224 S 0.0 0.0 0:00.00 sw-engine-cgi
4493 sso 18 0 119m 3044 224 S 0.0 0.0 0:00.00 sw-engine-cgi
4490 sso 18 0 119m 3040 220 S 0.0 0.0 0:00.00 sw-engine-cgi
Run something that tracks server behavior over time, for instance sar or, even better, Munin. Then see how the high load correlates with other parameters.
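If sar or Munin is not set up yet, a rough stand-in is to log the 1-minute load average next to the iowait share yourself and watch whether they move together. The sketch below assumes Python 3 and a Linux /proc filesystem; the function names are made up for illustration, and it is no replacement for a proper monitoring tool.

```python
import time


def parse_cpu_totals(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples.

    Field order is user, nice, system, idle, iowait, irq, softirq, steal.
    Only these first eight are summed, since later guest fields are
    already included in user time.
    """
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


def sample(interval=5.0):
    """Take one (load1, iowait%) reading over `interval` seconds."""
    with open("/proc/stat") as f:
        before = parse_cpu_totals(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = parse_cpu_totals(f.read())
    with open("/proc/loadavg") as f:
        load1 = float(f.read().split()[0])
    return load1, iowait_percent(before, after)


def monitor(samples=12, interval=5.0):
    """Print a timestamped line per reading; redirect to a file to keep a log."""
    for _ in range(samples):
        load1, wa = sample(interval)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print("%s load1=%.2f iowait=%.1f%%" % (stamp, load1, wa))
```

Calling `monitor()` on both servers at the same time of day gives you two logs to compare side by side.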
The problem might be related to your I/O system. Maybe there are heavy writes on the machine with the higher load? Or a degraded RAID array? Or a hard drive that is about to die? Or a RAID controller with its cache disabled? Or maybe something uses more memory and the system needs to swap?
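One quick way to test the I/O theory: on Linux, processes in uninterruptible sleep (state D, usually waiting on disk) count toward the load average even while the CPU is idle, which would fit the 91.3%id and 7.5%wa in the top output above. The sketch below lists those processes from `ps -eo stat,pid,comm` output. It assumes Python 3 with procps `ps` available, and the function names are hypothetical.

```python
import subprocess


def blocked_processes(ps_output):
    """Return (pid, command) pairs for processes whose state starts with D."""
    blocked = []
    for line in ps_output.splitlines()[1:]:  # first line is the header
        parts = line.strip().split(None, 2)
        if len(parts) == 3 and parts[0].startswith("D"):
            blocked.append((int(parts[1]), parts[2]))
    return blocked


def snapshot():
    """Run ps once and return the processes currently blocked in state D."""
    out = subprocess.check_output(["ps", "-eo", "stat,pid,comm"])
    return blocked_processes(out.decode("utf-8", "replace"))
```

If the same few commands show up in repeated `snapshot()` calls on the loaded server but not on the quiet one, that is a reasonable place to start looking.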