I like to fully load our compute hardware to reduce wasted CPU time, and on typical in-house hardware this is fairly easy: load the machine with as many runnable threads as there are cores, and idle time goes to zero.
Here is an example app:
public class Looper
{
    public static void main(String[] args)
    {
        while (true) { new java.util.Random().nextBytes(new byte[4096]); }
    }
}
On our in-house, 8-core hardware, I can run 8 of these and idle time (as reported by mpstat and top) goes to zero. I can even add a 9th, 10th, etc. process, and idle time stays very close to zero.
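For anyone who wants to reproduce this from a single JVM instead of starting several processes, here is a rough sketch of the same load with one busy thread per core. The class name ThreadedLooper, the burn helper, and the 60-second default are made up for illustration. Unlike Looper, it reuses one Random and one buffer per thread, so it puts less pressure on the allocator and is not an exact equivalent of running N copies of Looper.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-JVM variant of Looper: one busy thread per core.
class ThreadedLooper {
    // Runs `threads` busy workers for roughly `millis` milliseconds and
    // returns the total number of 4096-byte fills completed.
    static long burn(int threads, long millis) {
        AtomicLong iterations = new AtomicLong();
        long deadline = System.nanoTime() + millis * 1_000_000L;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                Random random = new Random();   // per-thread, no sharing
                byte[] buffer = new byte[4096];
                long local = 0;
                while (System.nanoTime() < deadline) {
                    random.nextBytes(buffer);
                    local++;
                }
                iterations.addAndGet(local);
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return iterations.get();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        long seconds = args.length > 0 ? Long.parseLong(args[0]) : 60;
        long done = burn(cores, seconds * 1000);
        System.out.println(done + " iterations on " + cores + " threads");
    }
}
```

Watching mpstat or top while this runs should show roughly the same picture as running one Looper process per core, assuming the JVM is allowed to use every core.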
On EC2 (c1.xlarge instances), however, idle time is much higher than I'd expect. At 8 processes, idle time hovers around 1%, and with 9, 10, or more processes it can increase to 2%-3% or higher. With more complicated programs (not the example above), idle time can be even higher than that.
Can anyone explain this? This is with very recent Amazon kernels, and the idle figures above do not include stolen CPU time, which I would expect to see on EC2 anyway. Is this a problem with EC2 in particular, or is it general to Xen? Are there known workarounds?
On EC2, the idle and steal values commonly appear higher than you would see on bare metal. This is normal and follows from how the virtualization works. You are probably not losing available CPU time in this case; it is most likely an artifact of how the system accounts for time. When you check CPU utilization, make sure you are using a Xen-aware version of the tool, one that understands how to identify CPU time on a Xen-based VM.
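If you want to cross-check what mpstat and top report, you can read the kernel's own counters from /proc/stat and compute the idle and steal percentages yourself. The following is only a sketch, written in Java to match the question: the class name CpuSample and its helpers are invented for illustration, and it assumes a Linux kernel whose aggregate "cpu" line lists user, nice, system, idle, iowait, irq, softirq and steal in that order. It does not make the numbers any more Xen-aware than the kernel already is; it just shows the raw accounting without a tool in between.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: samples /proc/stat twice and reports idle and steal.
class CpuSample {
    // Parses a line such as "cpu  100 0 50 800 ..." into its counters.
    static long[] parseCpuLine(String line) {
        String[] parts = line.trim().split("\\s+");
        long[] values = new long[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            values[i - 1] = Long.parseLong(parts[i]);
        }
        return values;
    }

    // Returns {idlePercent, stealPercent} for the interval between samples.
    // Only the first eight fields are summed, because guest time is already
    // included in user time and would otherwise be counted twice.
    static double[] idleAndSteal(long[] before, long[] after) {
        int n = Math.min(8, Math.min(before.length, after.length));
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += after[i] - before[i];
        }
        if (total <= 0) {
            return new double[] {0.0, 0.0};
        }
        double idle = n > 3 ? 100.0 * (after[3] - before[3]) / total : 0.0;
        double steal = n > 7 ? 100.0 * (after[7] - before[7]) / total : 0.0;
        return new double[] {idle, steal};
    }

    static long[] readCpu() throws IOException {
        // The first line of /proc/stat is the aggregate over all CPUs.
        return parseCpuLine(Files.readAllLines(Paths.get("/proc/stat")).get(0));
    }

    public static void main(String[] args) throws Exception {
        long[] before = readCpu();
        Thread.sleep(1000);
        long[] after = readCpu();
        double[] result = idleAndSteal(before, after);
        System.out.printf("idle %.1f%%  steal %.1f%%%n", result[0], result[1]);
    }
}
```

Run it while the load is active. If the steal figure here is clearly non-zero while your usual tool shows none, that tool is probably folding steal into another column.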