I was doing some simple hand benchmarking on our (live) database server during non-peak hours, and I noticed that queries returned somewhat erratic benchmark results.
I had enabled the "Balanced" power saving plan on all our servers a while ago, because I figured they were nowhere near high utilization and this way we could save some energy.
I had assumed this would have no significant, measurable impact on performance. However, if CPU power saving features are impacting typical performance -- particularly on the shared database server -- then I am not sure it's worth it!
I was a little surprised that our web tier, even at 35-40% load, is down-clocking from 2.8 GHz @ 1.25 V to 2.0 GHz @ 1.15 V.
I fully expect the down-clocking to save power, but that load level seems high enough to me that it should be kicking up to full clock speed.
Our 8-CPU database server gets a ton of traffic but has extremely low CPU utilization (just due to the nature of our SQL workload -- lots of queries, but really simple ones). It's usually sitting at 10% or less, so I expect it was down-clocking even more than in the above screenshot. Anyway, when I switched power management to "high performance", my simple SQL query benchmark improved by about 20% and became very consistent from run to run.
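For reference, the benchmark was essentially a timed loop of the same query. Here's a minimal Python sketch of that idea, assuming pyodbc and a placeholder connection string and query (not my exact harness):

```python
# Minimal sketch of a repeated-query latency benchmark.
# Assumes pyodbc is installed; the connection string and query
# below are placeholders -- substitute your own.
import time
import statistics
import pyodbc

CONN_STR = "DRIVER={SQL Server};SERVER=dbserver;DATABASE=mydb;Trusted_Connection=yes"
QUERY = "SELECT COUNT(*) FROM Posts"  # stand-in for a 'simple' query
RUNS = 50

conn = pyodbc.connect(CONN_STR)
cursor = conn.cursor()

timings = []
for _ in range(RUNS):
    start = time.perf_counter()
    cursor.execute(QUERY)
    cursor.fetchall()
    timings.append(time.perf_counter() - start)

print(f"min    {min(timings) * 1000:8.2f} ms")
print(f"median {statistics.median(timings) * 1000:8.2f} ms")
print(f"max    {max(timings) * 1000:8.2f} ms")
print(f"stdev  {statistics.stdev(timings) * 1000:8.2f} ms")  # run-to-run variance
```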
I guess I was thinking that power management on lightly loaded servers was win-win -- no performance loss, and significant power savings because the CPU is commonly the #1 or #2 consumer of power in most servers. That does not appear to be the case; you will give up some performance with CPU power management enabled, unless your server is always under so much load that the power management has effectively turned itself off. This result surprised me.
Does anyone have any other experience or recommendations to share on CPU power management for servers? Is it something you turn on or off on your servers? Have you measured how much power you are saving? Have you benchmarked with it on and off?
I'm not sure about servers, but the current thinking in embedded devices is not to bother with intermediate steps between low-power and flat-out, because the extra time involved will eat your power savings. Basically, they run at low power until they see any real amount of CPU load, at which point they flip over to fastest-possible so they can finish the job and get back to idling at low power (the "race to idle" approach).
I have always turned off any type of power management on servers. I am curious what others have experienced, but I always assumed that if the server is under-clocking, there will be some delay to 'step up' the CPU to 100%, and in a data-center setting any delay like this is unacceptable.
The data you provided seems to support this assumption. So, while I have not done any specific testing, it would seem that you should not use any power-saving technology within Windows or the BIOS. I even turn off the 'shut off network adapter' and PCI card power-saving settings, to be ultra-conservative.
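If you want to script that rather than click through the GUI on every box, something like this works -- a sketch wrapping the built-in powercfg utility (the GUID below is the stock High Performance scheme that ships with Windows):

```python
# Sketch: switch a Windows server to the High Performance power plan
# using the built-in powercfg utility. Run as administrator.
import subprocess

# Well-known GUID of the stock High Performance scheme.
HIGH_PERFORMANCE = "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c"

# Show the schemes present on this box (the active one is starred).
subprocess.run(["powercfg", "/list"], check=True)

# Activate High Performance.
subprocess.run(["powercfg", "/setactive", HIGH_PERFORMANCE], check=True)
```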
How Much Power Will This Actually Save You:
If you do decide that this feature might put the stability of your servers at risk (I have no experience with this), then you might look elsewhere for the energy savings.
I would try to find out just how much energy this might save for the number of servers you have (although perhaps you already did this). Since the graph you posted in your answer is in percentages, the savings for your company might amount to very little actual power. If you don't have many servers, it might not be much at all, and motion-activated lights or something similar in your office might save more energy (even though that is not as marketable).
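To put rough numbers on it, the arithmetic is simple -- every figure below is made up for illustration, so plug in your own server count, measured wall draw, and electricity rate:

```python
# Back-of-the-envelope savings estimate. All inputs are hypothetical;
# measure your own servers' draw with power management on and off.
servers = 10
watts_saved_per_server = 30        # e.g. measured at the wall
hours_per_year = 24 * 365
price_per_kwh = 0.12               # USD, varies by region

kwh_saved = servers * watts_saved_per_server * hours_per_year / 1000
print(f"{kwh_saved:,.0f} kWh/year saved -> ${kwh_saved * price_per_kwh:,.2f}/year")
# 10 servers x 30 W -> 2,628 kWh/year, about $315/year at $0.12/kWh
```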
I remember reading a few years back about one of the major American car companies (I forget which) being pressured to reduce the exhaust emissions of its cars. Instead, the company showed that capping emissions at some of its factories would be much cheaper for it while yielding far greater emissions savings.
Don't Forget Disks:
Also, you might want to check that these power-saving features don't spin down the disk(s) when they are not in use. Maybe for a little while all the SQL query results would be served from RAM, the disk would sit idle and go to sleep (not sure if it works like that, though)? If that can happen, there would be a big performance penalty while everything spins up again.
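If you want to rule that out, you can pin the disk timeout from the command line -- a sketch using powercfg's /change switch, where a value of 0 means "never spin down":

```python
# Sketch: make sure the OS never spins down disks while on AC power.
# Uses the built-in powercfg utility; run as administrator.
import subprocess

# Set the "turn off hard disk after" timeout to 0 minutes (= never) on AC power.
subprocess.run(["powercfg", "/change", "disk-timeout-ac", "0"], check=True)
```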
Preface: I'm making some leaps/generalizations about Intel Xeons and their power-saving performance with SpeedStep. In reading about the Intel Xeon "Yorkfield" 45nm CPUs, Enhanced Intel SpeedStep Technology (EIST) and Enhanced Halt State (C1E) seem to be the real culprits here. I would agree with your statement: turning on such power management features should conserve energy, and when the CPUs need it under load, the system should return to normal voltage and clock speed settings. It appears, though, that EIST and C1E have some side effects that aren't intuitively implied when using either option in the BIOS. After crawling through numerous overclocking websites, it appears that these two settings cause quite a bit of frustration.
From http://www.overclock.net/intel-cpus/376099-speedstep-guide-why-does-my-processor.html:
While "high performance" is probably the best setting for a database server, I'm fairly certain EIST and/or C1E caused the CPUs to underperform even though they should have returned to normal settings when the load increased substantially. The big caveat to me is: what counts as a substantial load? According to the overclock.net thread, EIST uses the Windows "power scheme" settings to decide how to manipulate your CPU settings, but there's no indication of what percentage of load, or for how long, triggers turning the CPUs back up to normal voltage.
Again, I'm by no means an expert on the subject matter for Intel CPUs but I would wager that adjusting these two settings might get you the power savings you want and the performance you should get, but sticking with the "maximum performance" setting is just as effective without the need to reboot.
The fast answer is: Of course power saving will affect performance.
The longer answer is no fun. Basically, try a setting, test performance, and decide what you can live with.
Applications and systems are so complicated that there is no cut-and-dried answer here, other than "yes, reaction time and other system speeds will be affected." Whether that matters compared to the time spent waiting on the hard drive or the network -- well, you get the idea. Test in reality.
I always try to virtualize as many servers as I can, but where I have to run a server on bare metal, it's usually because I need or want totally consistent performance. So for these business-critical machines I NEVER switch on anything power-saving related whatsoever, for exactly the reasons you're experiencing.
*bang goes my green credentials*
A few things:
Check in the BIOS to make sure that power management is under OS control. It's possible that it's set to be managed by the firmware, and is therefore using dumb, suboptimal processor power management.
Check to see if there are any power management-related hotfixes that you might be missing. There were quite a few notable ones back in the day when Vista/Server 2008 came out.
Check the detailed configuration for Balanced. It's possible that another power saving feature is causing the reduced performance. In theory, the performance hit from EIST should be negligible, though then again, an SQL database has a unique footprint, and it's conceivable that it gets disproportionately affected by processor power management.
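One way to audit those detailed Balanced settings without clicking through the Control Panel is to dump the processor sub-group of the scheme -- a sketch using powercfg's built-in aliases:

```python
# Sketch: dump the processor power management sub-settings of the
# Balanced scheme, using powercfg's built-in aliases
# (SCHEME_BALANCED / SUB_PROCESSOR). Shows min/max processor state, etc.
import subprocess

subprocess.run(
    ["powercfg", "/query", "SCHEME_BALANCED", "SUB_PROCESSOR"],
    check=True,
)
```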
Some information from Microsoft (Word Doc format, unfortunately)
Improve Energy Efficiency and Manage Power Consumption with Windows Server 2008 R2
These particular hardware-level CPU power saving features are the same under any OS of course, it's just a question of whether or not you turn them on.
The power savings graph of no CPU power management, versus CPU power management:
We're clear that (and this graph shows that) at high utilization levels, CPU power management effectively turns itself off. What I'm not clear on, however, is whether at low utilization levels there is an impact on overall server performance, e.g. turnaround time on simple-ish SQL Server queries.
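One way to watch for that directly is to sample the reported clock speed alongside utilization while the server sits at its usual ~10% load. Here's a minimal sketch using the third-party psutil package (note that cpu_freq() reports whatever the OS exposes, which on some platforms is smoothed or static):

```python
# Sketch: sample CPU frequency and utilization once a second, so you
# can see whether the clock stays down at low load. Requires the
# third-party psutil package (pip install psutil).
import psutil

for _ in range(30):
    freq = psutil.cpu_freq()              # current/min/max in MHz, as the OS reports it
    load = psutil.cpu_percent(interval=1) # blocks for 1 second, returns overall %
    print(f"{load:5.1f}% load  {freq.current:7.1f} MHz (max {freq.max:.0f} MHz)")
```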
You should never, ever resort to using the Windows power settings or the BIOS SpeedStep feature that comes on Intel processors (there's also an AMD equivalent, Cool'n'Quiet). These can cause issues; I've seen cases where, with SpeedStep enabled, the CPU clock kept bouncing up and down erratically even though CPU usage was consistent.
If you want to be greener and save power, use low-power processors, designated with an 'L' before the model number, such as the L54xx and L55xx series from Intel.
EDIT: I'm sorry if I gave the impression that this feature will always fail, I've just been burned by it, and in a mission critical system I can't have this sort of stuff happen, so I just try to stay away from it.
When you're talking about performance on a server, there are a few different ways of looking at it. There's the apparent response time (similar to network latency) and the throughput (similar to network bandwidth).
Some versions of Windows Server ship with the Balanced power settings enabled by default; as Jeff pointed out, Windows 2008 R2 is one of them. Very few CPUs these days are single-core, so this explanation applies to almost every Windows server you will run into, with the exception of single-core VMs (more on those later).
When the Balanced power plan is active, the CPU attempts to throttle back how much power it's using. The way it does this is by disabling half of the CPU cores, in a process known as "parking". Only half of the cores are available at a time, so the system uses less power during periods of low traffic. This isn't a problem in and of itself.
What IS a problem is the fact that when cores are unparked, you've doubled the CPU cycles available to the system and suddenly unbalanced its load, taking it from (for example) 70% utilization down to 35% utilization. The system looks at that and, after the burst of traffic is processed, thinks "Hey, I should dial this back a bit to conserve power". And so it does.
Here's the bad part. In order to prevent an uneven distribution of heat and power within the CPU cores, the system tends to park the cores that haven't been parked recently. And for that to function properly, the CPU needs to flush everything those cores hold in cache (L1, L2 & L3) out to some other location (most likely main memory).
As a hypothetical example, let's say you have an 8-core CPU with cores C1-C8. When a burst of traffic arrives, all of them become active for some period of time, and then the system parks half of them again: C1-C4 run while C5-C8 are parked, then after the next burst C5-C8 run while C1-C4 are parked, and so on, alternating back and forth.
But in doing so, there's a good amount of overhead associated with flushing all of that data out of the L1-L3 caches, so that weird errors don't happen to programs whose state was flushed out of the CPU pipeline.
There's likely an official name for it, but I like to explain it as CPU thrashing. Basically the processors are spending more time doing busy work moving data around internally than they are fielding work requests.
If you have any kind of application that needs low latency for its requests, you need to disable the Balanced power settings. If you're not sure whether this is a problem, do the following: open Resource Monitor (resmon.exe) and watch the per-core graphs on the CPU tab while the server handles its normal traffic.
If any cores are getting parked, you'll notice that half of them are parked at any given time, then they'll all fire up, and then the other half get parked. It alternates back and forth. That's the system's CPUs thrashing.
Virtual Machines: This problem is even worse when you're running a virtual machine because there's the additional overhead of the hypervisor. Generally speaking, in order for a VM to run, the hardware needs to have a slot in time available for each of the cores at each timeslice.
If you have a 16-core piece of hardware, you can run VMs using more than 16 total cores, but in each timeslice only up to 16 virtual CPUs can be scheduled, and the hypervisor must fit all of a VM's cores into that timeslice; they can't be spread out over multiple timeslices. (A timeslice is essentially a set of X CPU cycles. It might be 1,000 or it might be 100k cycles.)
Ex: 16-core hardware with 8 VMs: 6 with 4 virtual CPUs (4C) and 2 with 8 virtual CPUs (8C).
Timeslice 1: 4 x 4C
Timeslice 2: 2 x 8C
Timeslice 3: 2 x 4C + 1 x 8C
Timeslice 4: 1 x 8C + 2 x 4C
What the hypervisor cannot do is split half of the allotment for a timeslice to the first 4 CPU's of an 8 vCPU VM and then on the next timeslice, give the rest to the other 4 vCPU's of that VM. It's all or nothing within a timeslice.
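If it helps to see why the wide VMs end up waiting, here's a toy model of that all-or-nothing scheduling constraint -- purely illustrative, with made-up VM names and sizes, not how any real hypervisor is implemented:

```python
# Toy model of all-or-nothing VM scheduling: each timeslice has 16
# physical cores, and a VM only runs in a slice if ALL of its vCPUs
# fit at once. VM names and sizes are hypothetical.
PHYSICAL_CORES = 16
vms = [("vm1", 4), ("vm2", 4), ("vm3", 4), ("vm4", 4),
       ("vm5", 4), ("vm6", 4), ("vm7", 8), ("vm8", 8)]

timeslice = 0
pending = list(vms)
while pending:
    timeslice += 1
    free = PHYSICAL_CORES
    scheduled = []
    still_waiting = []
    for name, vcpus in pending:
        if vcpus <= free:        # all of the VM's vCPUs fit in this slice
            free -= vcpus
            scheduled.append(f"{name}({vcpus}C)")
        else:                    # otherwise the whole VM waits for a later slice
            still_waiting.append((name, vcpus))
    print(f"Timeslice {timeslice}: " + " + ".join(scheduled))
    pending = still_waiting
```

Running it shows the 8C VMs sitting out any slice already packed with 4C VMs, which is the latency cost of the all-or-nothing rule.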
If you're using Microsoft's Hyper-V, the power control settings could be enabled in the host OS, which means they propagate down to the guest systems, impacting them as well.
Once you see how this works, it's easy to see how using Balanced Power Control settings causes performance problems and sluggish servers. One of the underlying issues is that the incoming request needs to wait for the CPU parking/unparking process to complete before the server is going to be able to respond to the incoming request, whether that's a database query, a web server request or anything else.
Sometimes, the system will park or unpark CPU's in the middle of a request. In these cases, the request will start into the CPU pipeline, get dumped out of it, and then a different CPU core will pick up the process from there. If it's a hefty enough request, this might happen several times throughout the course of the request, changing what should have been a 5 second database query to a 15 second database query.
The biggest thing you're going to see from using Balanced Power is that the systems are going to feel slower to respond to just about every request you make.