We support an enterprise application running on Windows Server 2008 R2. One of our customers has chosen to install it on VMware, and what I'm finding is that the VMs are noticeably slow compared to physical hardware. Our product development team has advised that many VMs appear to run particularly slowly on I/O benchmarks, which impacts performance in production.
I've tried the ATTO disk I/O benchmark and found that for smaller I/O blocks (1-32 KB) the VM I'm looking at is 25x slower than hardware, and for larger I/O blocks (1-8 MB) it's 10x slower.
Is this a fair benchmark? If not, any suggestions for a fair test?
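For reference, here's the sort of crude sequential-write test I can run myself on both the VM and bare metal to compare (a rough Python sketch of my own, not ATTO's methodology; the file name and sizes are arbitrary choices):

```python
import os
import time

# Write 256 MiB sequentially at each block size and report throughput.
TOTAL = 256 * 1024 * 1024
for block in (4 * 1024, 64 * 1024, 1024 * 1024, 8 * 1024 * 1024):
    buf = os.urandom(block)
    count = TOTAL // block
    start = time.perf_counter()
    with open("bench.tmp", "wb", buffering=0) as f:  # unbuffered writes
        for _ in range(count):
            f.write(buf)
        os.fsync(f.fileno())  # force data out to disk before timing stops
    elapsed = time.perf_counter() - start
    os.remove("bench.tmp")
    print(f"{block // 1024:>5} KiB blocks: {TOTAL / elapsed / 2**20:.1f} MiB/s")
```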
You should expect to lose around 5-10% of CPU, memory, network and disk I/O performance on an uncontended ESXi system; it's the price we pay for all the wonderful things virtualisation brings. That said, what you're seeing is far below these expectations, so something is wrong.
Firstly, do you know if the entire infrastructure is using compatible equipment? VMware have a compatibility checker HERE for you to check against; we very often see people on Server Fault using non-HCL-approved kit and running into problems. Beyond that, a few things to check:

- Is the storage actually fit for purpose, or are they just using cheap consumer-grade SATA disks?
- Have they installed the VMware Tools? (there's a quick in-guest check sketched at the end of this answer)
- Are they doing something within the VM that might slow down I/O, such as in-VM RAIDing?
- Is the host contended and struggling to keep up with the overall demand for resources?
- If they're using shared storage, is it being overworked?
As you can see there's a lot to check, but you're right that something is wrong here; those numbers are way off.
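If you want a quick way to confirm the tools are at least installed and running inside the Windows guest, something like this works (a minimal sketch; "VMTools" is the usual service name, but verify it on your build):

```python
import subprocess

# Query the Windows service manager for the VMware Tools service.
# "VMTools" is assumed to be the service name; check yours if this fails.
result = subprocess.run(["sc", "query", "VMTools"],
                        capture_output=True, text=True)
print(result.stdout or result.stderr)  # look for STATE: RUNNING
```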
It's a fair benchmark of the current state of your client's virtual infrastructure; specifically, of the limitations of their shared storage. However, those results do not necessarily apply to all virtualized solutions...
It's quite possible to build VMware configurations whose I/O performance matches or exceeds that of bare-metal hardware. It sounds as though you don't have control over the client's setup, though. Do you have any details on what type of storage is in place? Perhaps the server specifications and networking details?
Neither benchmarks nor most applications, nor even the I/O stack itself, take into account the timing jitter that's common on VMs. That leads to huge inefficiencies under some kinds of load, particularly I/O-bound ones.
The problem is that some specific parts of the I/O stack (e.g. TCP fairness rate limiting) have to measure and compare very short spans of time; but on a VM, the application only gets CPU and I/O time in slices, which makes such measurements far too inaccurate. The stack interprets the noise as erratic I/O and throttles itself much lower than necessary.
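You can see the slicing effect directly by sampling a high-resolution timer in a tight loop: on bare metal the gaps between samples stay tight, while on a busy VM you tend to see occasional huge spikes where the vCPU was descheduled mid-measurement. A minimal Python sketch of my own (not taken from any benchmark):

```python
import time
import statistics

# Sample back-to-back timer readings and look at the gaps between them.
gaps = []
prev = time.perf_counter()
for _ in range(1_000_000):
    now = time.perf_counter()
    gaps.append(now - prev)
    prev = now

# On a VM, the max gap is often orders of magnitude above the median.
print(f"median gap: {statistics.median(gaps) * 1e6:.2f} us")
print(f"max gap:    {max(gaps) * 1e6:.2f} us")
```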
Modern I/O stacks are slowly getting better at this, especially on longer transfers (because the stack can then apply some statistical smoothing), but short transfers are still badly affected.
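To illustrate why longer transfers smooth things out: if you time each small write individually, the worst-case figure is dominated by the occasional descheduling spike, while a single timing over the whole batch averages those spikes away. A rough sketch under assumed sizes (4 KiB blocks, ~16 MiB total; the file name is arbitrary):

```python
import os
import time

BLOCK = 4 * 1024   # 4 KiB, like the small end of the ATTO range
COUNT = 4096       # ~16 MiB in total
buf = os.urandom(BLOCK)

per_op = []
start_all = time.perf_counter()
with open("jitter_test.bin", "wb", buffering=0) as f:
    for _ in range(COUNT):
        t0 = time.perf_counter()
        f.write(buf)
        per_op.append(time.perf_counter() - t0)  # per-operation timing
elapsed_all = time.perf_counter() - start_all    # one timing over the batch
os.remove("jitter_test.bin")

print(f"worst single write:   {max(per_op) * 1e3:.3f} ms")
print(f"batch mean per write: {elapsed_all / COUNT * 1e3:.3f} ms")
```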
That might explain the abysmal numbers you're getting (1/10 to 1/25 of bare metal), but it's very hard to say whether the real application would see the same result. If the real load is genuinely similar and uses the same stack as the benchmark, then yes, you could get that kind of performance. But if the application does significant processing at the same time as its I/O, the poor I/O performance could be amortized by the much better CPU partitioning most VMs provide.