In this talk at Berkeley, Jeff Dean's slides contain the following note, about a third of the way through the presentation:
Non-intuitive: remove capacity under load to improve latency (?!)
My intuition is that less load on a box should reduce latency, but this note suggests the opposite is true, at least for Google. Why would this be the case?
Imagine two assembly lines:
One has two machines that each do half the job. Each machine takes about ten minutes. So a product comes off the line every ten minutes and the average item takes twenty minutes to go through the line.
Another has twenty machines that each do one-twentieth of the job. Each machine takes about two minutes. So a product comes off the line every two minutes and the average item takes forty minutes to go through the line.
The second assembly line has five times the capacity. But it has double the latency.
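If it helps to see the arithmetic spelled out, here is a small sketch using the stage counts and per-stage times from the analogy above (the numbers are illustrative, not Google's):

```python
# Compare the two assembly lines: throughput (capacity) vs. end-to-end latency.

def line_stats(stages, minutes_per_stage):
    throughput = 1 / minutes_per_stage    # items coming off the line per minute
    latency = stages * minutes_per_stage  # minutes for one item to pass through
    return throughput, latency

short_tp, short_lat = line_stats(stages=2, minutes_per_stage=10)
long_tp, long_lat = line_stats(stages=20, minutes_per_stage=2)

print(f"short line: {short_tp:.2f} items/min, {short_lat} min latency")
print(f"long line:  {long_tp:.2f} items/min, {long_lat} min latency")
print(f"capacity ratio: {long_tp / short_tp:.0f}x, latency ratio: {long_lat / short_lat:.0f}x")
```

Running it confirms the trade-off: the long line finishes five times as many items per minute, but each individual item spends twice as long inside it.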
Now suppose each machine is briefly stalled (say, for a few seconds) about a tenth of the time. The second assembly line will see more latency spikes, because each item now has twenty chances to hit a stalled machine instead of two.
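A rough way to quantify that, assuming each machine stalls independently about a tenth of the time (the same illustrative figure as above): the fraction of items that make it through without hitting a single stalled machine falls off quickly as the number of stages grows.

```python
# Probability that an item passes the whole line without hitting any stalled
# machine, assuming independent stalls at each stage.

def clean_pass_probability(stages, stall_fraction=0.1):
    return (1 - stall_fraction) ** stages

print(f"2-stage line:  {clean_pass_probability(2):.0%} of items avoid every stall")
print(f"20-stage line: {clean_pass_probability(20):.0%} of items avoid every stall")
```

With two stages, about 81% of items sail through untouched; with twenty, only about 12% do. That is the tail-latency problem in miniature: the more components a request has to touch, the more likely it is to catch at least one of them on a bad moment.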