I'm running a LAMP site with moderate traffic that is prone to occasional spikes. While trying to diagnose a bottleneck that causes slowness during periods of higher traffic, I've been looking at the output of mod_status, and I'm seeing an alarming disparity between the CPU usage reported there and the CPU usage shown by all our other monitoring tools, as well as by top and mpstat.
Here's the output of mpstat, which is typical of the server under normal load (it's supposed to be overpowered for what we're doing, which makes the slowdowns all the more frustrating):
Linux 2.6.18-274.7.1.el5 (xxxxxxxxxx) 10/25/2012
03:58:14 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
03:58:14 PM all 2.02 0.01 0.16 0.41 0.01 0.07 0.00 97.32 3398.80
03:58:14 PM 0 0.14 0.01 0.08 0.01 0.00 0.00 0.00 99.76 999.98
03:58:14 PM 1 0.93 0.01 0.13 0.12 0.00 0.03 0.00 98.78 58.25
03:58:14 PM 2 0.79 0.01 0.09 0.07 0.00 0.03 0.00 99.02 57.08
03:58:14 PM 3 1.44 0.01 0.17 0.57 0.00 0.03 0.00 97.78 33.10
03:58:14 PM 4 0.75 0.01 0.07 0.06 0.00 0.03 0.00 99.08 53.83
03:58:14 PM 5 0.36 0.01 0.04 0.09 0.00 0.01 0.00 99.49 8.92
03:58:14 PM 6 0.85 0.01 0.09 0.19 0.00 0.03 0.00 98.82 48.18
03:58:14 PM 7 1.34 0.00 0.13 0.54 0.00 0.03 0.00 97.96 36.87
03:58:14 PM 8 0.15 0.01 0.03 0.01 0.00 0.00 0.00 99.80 0.14
03:58:14 PM 9 0.94 0.00 0.07 0.08 0.00 0.03 0.00 98.87 51.87
03:58:14 PM 10 1.18 0.01 0.16 0.06 0.00 0.03 0.00 98.56 53.50
03:58:14 PM 11 8.35 0.01 0.58 1.55 0.02 0.30 0.00 89.20 375.46
03:58:14 PM 12 1.08 0.01 0.09 0.06 0.00 0.03 0.00 98.72 56.58
03:58:14 PM 13 1.14 0.00 0.10 0.28 0.03 0.13 0.00 98.32 907.31
03:58:14 PM 14 4.73 0.01 0.28 1.08 0.01 0.14 0.00 93.75 198.19
03:58:14 PM 15 8.15 0.01 0.48 1.76 0.02 0.29 0.00 89.30 459.54
And here, at nearly the same time, is Apache's mod_status output:
Current Time: Thursday, 25-Oct-2012 15:57:55 EDT
Restart Time: Thursday, 25-Oct-2012 15:40:09 EDT
Parent Server Generation: 3
Server uptime: 17 minutes 46 seconds
Total accesses: 8606 - Total Traffic: 283.4 MB
CPU Usage: u322.08 s32.5 cu0 cs0 - 33.3% CPU load
8.07 requests/sec - 272.3 kB/second - 33.7 kB/request
20 requests currently being processed, 492 idle workers
I don't expect these two to line up exactly, but this seems like a pretty significant difference. I am at a loss as to how Apache could be arriving at this number, and the documentation appears to be pretty sparse.
Is there some way that Apache could be getting capped at some arbitrary level of CPU usage, with mod_status reflecting this limit? I inherited this server and was not present for the initial setup, so it's entirely possible that some arcane setting has been set to a pathological value for reasons unknown. It would explain a lot, but I don't see anything relevant in the configuration. If that's not the case, is this being calculated in some non-obvious way in which that number actually makes sense, or is one source or the other just displaying incorrect information?
Any insight would be greatly appreciated.
Should have figured this out sooner, but I think I botched the math the first time I checked. Apparently the figure is simply the recorded CPU time divided by the wall-clock time since the last server restart.
So in this case, the 322 seconds spent in userspace plus the 32.5 seconds spent in system calls gives us about 354.6 total CPU seconds. Dividing that by the 1066 seconds since the last restart (17 minutes 46 seconds) gives us that 33.3% figure.
Obviously this isn't scaled for multiple CPUs, so on this 16-CPU box we're looking at 33% out of a possible 1600%, or roughly 2% of total capacity, which seems way more reasonable and is in line with what mpstat reports.
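For anyone who wants to check the arithmetic, here is a small Python sketch of the calculation as I understand it. The function name is my own, and I'm assuming the cu/cs (child) values are simply added in as well; they are 0 here, so it makes no difference to this result.

```python
def mod_status_cpu_load(user_s, sys_s, child_user_s, child_sys_s, uptime_s):
    """CPU seconds consumed, as a percentage of wall-clock seconds since restart.

    The result is relative to ONE CPU, so it can exceed 100% on a
    multi-CPU machine.
    """
    cpu_seconds = user_s + sys_s + child_user_s + child_sys_s
    return 100.0 * cpu_seconds / uptime_s


# Values copied from the mod_status output above.
uptime_s = 17 * 60 + 46          # "17 minutes 46 seconds" = 1066 s
load = mod_status_cpu_load(322.08, 32.5, 0, 0, uptime_s)

# Spread over the 16 CPUs listed by mpstat (CPU 0 through 15).
per_core = load / 16

print(f"{load:.1f}% of one CPU, {per_core:.2f}% of 16 CPUs")
# prints "33.3% of one CPU, 2.08% of 16 CPUs"
```

The per-core number is the one to compare against the "all" row from mpstat.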
You are right: mpstat -P ALL shows load since server boot (check uptime), and mod_status shows load for just 17 minutes 46 seconds.
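To compare the two over the same window rather than since boot, one option is to sample the aggregate "cpu" line of /proc/stat twice and work out the busy share between the samples. The following is only a rough sketch: the function name and the sample numbers are made up for illustration, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
def busy_percent(before, after):
    """Percent of CPU time that was not idle/iowait between two samples.

    before, after: the aggregate 'cpu ...' line from /proc/stat,
    captured at two different times.
    """
    # Use only the first 8 counters; later fields (guest time) are
    # already counted inside 'user' on kernels that report them.
    a = [int(x) for x in before.split()[1:9]]
    b = [int(x) for x in after.split()[1:9]]
    delta = [y - x for x, y in zip(a, b)]
    total = sum(delta)
    if total == 0:
        return 0.0
    idle = delta[3] + (delta[4] if len(delta) > 4 else 0)  # idle + iowait
    return 100.0 * (total - idle) / total


# Hypothetical samples, not taken from the server in question.
sample_before = "cpu  1000 0 200 8000 100 0 0 0"
sample_after = "cpu  1300 0 250 8900 110 0 0 0"
print(f"{busy_percent(sample_before, sample_after):.1f}% busy")
```

On a live box, the two lines would come from reading /proc/stat at the start and end of the interval you care about. Running mpstat with an interval argument gives the same kind of windowed figure without any code.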