I own a t1.micro EC2 instance. After release upgrade of Ubuntu to saucy I got Apache 2.4.6 and I started noticing 100% CPU load like this
top - 19:37:58 up 2:55, 2 users, load average: 3.90, 2.90, 1.82
Tasks: 95 total, 4 running, 91 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.9 us, 7.2 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 88.9 st
KiB Mem: 629976 total, 588412 used, 41564 free, 39412 buffers
KiB Swap: 2097144 total, 0 used, 2097144 free, 326932 cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6688 www-data 20 0 114m 17m 12m R 34.3 2.9 3:07.74 apache2
6719 www-data 20 0 113m 10m 6052 R 33.3 1.8 3:16.99 apache2
6721 www-data 20 0 113m 9.8m 5004 R 32.0 1.6 3:04.13 apache2
I decided to install another Apache version from this ppa, so I got 2.4.9 now, but the problem is still there.
What I tried: 1. removed all modules and added only necessary ones. At the moment the loaded ones are:
lrwxrwxrwx 1 root root 36 Apr 9 19:16 access_compat.load -> ../mods-available/access_compat.load
lrwxrwxrwx 1 root root 28 Apr 9 19:17 alias.conf -> ../mods-available/alias.conf
lrwxrwxrwx 1 root root 28 Apr 9 19:17 alias.load -> ../mods-available/alias.load
lrwxrwxrwx 1 root root 33 Apr 9 19:18 authz_core.load -> ../mods-available/authz_core.load
lrwxrwxrwx 1 root root 38 Apr 9 19:18 authz_groupfile.load -> ../mods-available/authz_groupfile.load
lrwxrwxrwx 1 root root 33 Apr 9 19:18 authz_host.load -> ../mods-available/authz_host.load
lrwxrwxrwx 1 root root 33 Apr 9 19:18 authz_user.load -> ../mods-available/authz_user.load
lrwxrwxrwx 1 root root 26 Apr 9 19:18 dir.conf -> ../mods-available/dir.conf
lrwxrwxrwx 1 root root 26 Apr 9 19:18 dir.load -> ../mods-available/dir.load
lrwxrwxrwx 1 root root 27 Apr 9 19:19 mime.conf -> ../mods-available/mime.conf
lrwxrwxrwx 1 root root 27 Apr 9 19:19 mime.load -> ../mods-available/mime.load
lrwxrwxrwx 1 root root 34 Apr 9 19:11 mpm_prefork.conf -> ../mods-available/mpm_prefork.conf
lrwxrwxrwx 1 root root 34 Apr 9 19:11 mpm_prefork.load -> ../mods-available/mpm_prefork.load
lrwxrwxrwx 1 root root 27 Apr 9 14:39 php5.conf -> ../mods-available/php5.conf
lrwxrwxrwx 1 root root 27 Apr 9 14:39 php5.load -> ../mods-available/php5.load
lrwxrwxrwx 1 root root 30 Apr 9 19:19 rewrite.load -> ../mods-available/rewrite.load
lrwxrwxrwx 1 root root 29 Apr 9 19:20 status.conf -> ../mods-available/status.conf
lrwxrwxrwx 1 root root 29 Apr 9 19:20 status.load -> ../mods-available/status.load
reinstalling Apache to a newer version (see above, I'm on 2.4.9 now and still it is there)
stopping and starting the instance (hoping it would get onto another hardware) - no luck
The site I'm testing with is WP-based with W3TC plugin installed, I disabled minification of W3TC static files. I decide to do so because I noticed that it stucks sometimes on minified files in status module.
upgrading WP to the latest version (3.8.2) - nothing changed
Looking now at status module output, I'm seeing that there is no pattern as to what requests are stuck - absolute random. For example: http://pastebin.com/JxLbbzCB - see that POST /wp-admin/admin-ajax.php has been stuck for 62 seconds in W (sending reply), 5-0 request is also in W status for 250 seconds.
All these "W" workers load CPU by 100% and last really long causing EC2 instance to start CPU throttle.
UPDATE: here is strace info for one of hanged processes:
select(17, [16], [16], NULL, {1, 0}) = 1 (out [16], left {0, 999998})
send(16, "", 0, MSG_NOSIGNAL) = 0
select(17, [16], [16], NULL, {1, 0}) = 1 (out [16], left {0, 999998})
send(16, "", 0, MSG_NOSIGNAL) = 0
select(17, [16], [16], NULL, {1, 0}) = 1 (out [16], left {0, 999998})
send(16, "", 0, MSG_NOSIGNAL) = 0
select(17, [16], [16], NULL, {1, 0}) = 1 (out [16], left {0, 999998})
send(16, "", 0, MSG_NOSIGNAL) = 0
select(17, [16], [16], NULL, {1, 0}) = 1 (out [16], left {0, 999998})
send(16, "", 0, MSG_NOSIGNAL) = 0
select(17, [16], [16], NULL, {1, 0}) = 1 (out [16], left {0, 999997})
send(16, "", 0, MSG_NOSIGNAL) = 0
select(17, [16], [16], NULL, {1, 0}) = 1 (out [16], left {0, 999998})
send(16, "", 0, MSG_NOSIGNAL) = 0
select(17, [16], [16], NULL, {1, 0}) = 1 (out [16], left {0, 999998})
send(16, "", 0, MSG_NOSIGNAL) = 0
select(17, [16], [16], NULL, {1, 0}) = 1 (out [16], left {0, 999998})
and it lasts while the process is not killed
UPDATE: output of ps auxf|grep apache2 :
root@domU-12-31-39-02-26-E9:~# ps auxf|grep apache2
root 2761 0.0 0.1 4168 852 pts/1 S+ 19:52 0:00 \_ grep --color=auto apache2
root 2549 0.0 2.3 115720 15104 ? Ss 19:50 0:00 /usr/sbin/apache2 -k start
www-data 2554 0.0 0.8 115800 5660 ? S 19:50 0:00 \_ /usr/sbin/apache2 -k start
www-data 2555 0.2 3.6 117944 22872 ? S 19:50 0:00 \_ /usr/sbin/apache2 -k start
www-data 2556 0.2 3.7 119252 23360 ? S 19:50 0:00 \_ /usr/sbin/apache2 -k start
www-data 2557 9.9 1.2 115932 8068 ? R 19:50 0:14 \_ /usr/sbin/apache2 -k start
www-data 2558 9.7 1.2 115932 8068 ? R 19:50 0:14 \_ /usr/sbin/apache2 -k start
www-data 2562 0.0 0.8 115800 5660 ? S 19:50 0:00 \_ /usr/sbin/apache2 -k start
www-data 2564 0.0 0.8 115800 5656 ? S 19:50 0:00 \_ /usr/sbin/apache2 -k start
www-data 2566 20.7 1.2 115932 8044 ? R 19:50 0:28 \_ /usr/sbin/apache2 -k start
www-data 2567 35.9 1.2 115932 8072 ? R 19:50 0:49 \_ /usr/sbin/apache2 -k start
www-data 2568 10.8 1.2 115932 8080 ? R 19:50 0:14 \_ /usr/sbin/apache2 -k start
www-data 2571 0.0 0.8 115800 5644 ? S 19:51 0:00 \_ /usr/sbin/apache2 -k start
www-data 2572 0.0 0.8 115800 5644 ? S 19:51 0:00 \_ /usr/sbin/apache2 -k start
www-data 2573 0.0 0.8 115800 5644 ? S 19:51 0:00 \_ /usr/sbin/apache2 -k start
www-data 2574 0.0 0.7 115752 4900 ? S 19:51 0:00 \_ /usr/sbin/apache2 -k start
root@domU-12-31-39-02-26-E9:~#
Output of tail /var/log/apache2/error.log (error logs configured for other vhosts are empty at the moment):
[Mon Apr 21 19:50:22.201343 2014] [:notice] [pid 2552] FastCGI: process manager initialized (pid 2552)
[Mon Apr 21 19:50:22.692477 2014] [mpm_prefork:notice] [pid 2549] AH00163: Apache/2.4.9 (Ubuntu) mod_fastcgi/mod_fastcgi-SNAP-0910052141 PHP/5.5.3-1ubuntu2.3 configured -- resuming normal operations
[Mon Apr 21 19:50:22.692580 2014] [core:notice] [pid 2549] AH00094: Command line: '/usr/sbin/apache2'
free -m
total used free shared buffers cached
Mem: 615 596 18 0 217 142
-/+ buffers/cache: 236 378
Swap: 2047 22 2025
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 30G 22G 6.1G 79% /
devtmpfs 299M 12K 299M 1% /dev
none 62M 208K 62M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 308M 0 308M 0% /run/shm
none 100M 0 100M 0% /run/user
Heavy usage Apache 2.4 on a t1.mirco instance is like humpty dumpty sitting on a couple toothpicks. He's too big and heavy! So of course he's squashed the toothpicks to pieces and rolled down the hill.
You have at least two basic solutions to the root cause of this problem:
Most likely the problem is memory shortage for apache. Apache doesn't have enough memory to to complete properly and thus get into unpredictable state resulting in the CPU loop.
I recommend to reduce the memory footprint for your VM (or move to a larger VM). Find the right setting for MaxClients by experimenting. Start with minimum setting and if this fixes CPU issue, increase MaxClients and MaxSpareServers until you find the balance.
Start with this (assuming you use prefork) and see if the issue persists: