I've been looking at some performance issues on a clustered virtual machine in our organisation. In fact, the problem seems to affect most of the virtual machines I have looked at. Both host and VM run Windows Server 2008 R2 with SP1.
I believe - from what I have read in various articles and advice I have been given - that I/O latency is the most important metric to be looking at. I've looked at this metric in three different places:
- LUN latency on the storage appliance
- LogicalDisk Avg. Disk sec/Write and Avg. Disk sec/Read on the Hyper-V host
- The same as above, but on the virtual machines themselves
This is an effort to narrow down where any latency is being introduced. Here is what I found:
What I'm seeing is what I would consider acceptable latency (3-15ms) on the LUNs, and up to 20ms (still acceptable) on the Hyper-V host. When I look at the same counters on a VM, however, I see regular spikes of up to 300ms lasting up to 10 seconds at a time, and an average of about 20-30ms.
This particular VM is a SQL Server machine, but the same applies to non-SQL servers too. The relevant exclusions have been added to our AV solution to avoid on-access scanning of DB files. Also, our VHDs are fixed size rather than dynamically expanding.
So for my question:
What are the likely causes of this latency, and/or what other metrics could I be using within the VM (or even on the Host) to narrow this down?
Measuring time within a VM can be problematic, as the virtual processors don't execute continuously. If you want to get a clear view of what's actually happening, use Performance Monitor in the management OS. Look for Hyper-V Virtual Storage Device. You can correlate that with data from Resource Monitor, too, to see what's contending for access to the disks.
In general, the response time of a particular VHD will have everything to do with what else is happening on the volume hosting that VHD.
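If you export the counter samples from the host and from the guest, it helps to reduce each series to the same summary (average, plus how long and how high each spike runs) so the two views can be compared like for like. Here is a minimal Python sketch of that idea. The function name, the 50ms threshold, the one-second sampling interval and the sample values are all assumptions made for illustration; none of them come from the measurements in the question.

```python
from statistics import mean


def find_latency_spikes(samples_ms, threshold_ms=50.0, interval_s=1.0):
    """Return (start_index, duration_s, peak_ms) for each run of
    consecutive samples above threshold_ms.

    samples_ms is a list of latency samples in milliseconds taken at a
    fixed interval (interval_s seconds apart). Threshold and interval
    are illustrative defaults, not recommendations.
    """
    spikes = []
    start = None
    for i, value in enumerate(samples_ms):
        if value > threshold_ms:
            if start is None:
                start = i  # a new spike begins here
        elif start is not None:
            run = samples_ms[start:i]
            spikes.append((start, len(run) * interval_s, max(run)))
            start = None
    if start is not None:  # series ended while still in a spike
        run = samples_ms[start:]
        spikes.append((start, len(run) * interval_s, max(run)))
    return spikes


# Hypothetical one-second samples in milliseconds, not real data.
samples = [12, 18, 25, 300, 280, 150, 22, 15, 90, 20]
spikes = find_latency_spikes(samples)
average = mean(samples)
print("spikes:", spikes)
print("average ms:", round(average, 1))
```

Running the same function over the host-side and guest-side series for the same time window shows whether the spikes exist in both places or only inside the VM.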
Your 'disk latency' on the VM could actually be CPU latency on the host, since the host has to spend CPU cycles servicing I/O requests.
Is the host heavily loaded overall, or is it just running a lot of VMs? I'm not sure what the Hyper-V equivalent is, but the VMware metric is CPU ready time: basically, how long the VM spends waiting for the host to schedule it.
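One rough way to test the CPU theory is to log host CPU utilisation and guest disk latency over the same interval and see whether they move together. The sketch below computes a plain Pearson correlation in Python. The `pearson` helper and both sample series are hypothetical and only illustrate the calculation; a high value suggests the two are related, but it does not prove that CPU contention is the cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples taken at the same timestamps, not real data.
host_cpu_percent = [35, 40, 95, 97, 96, 45, 38]
guest_latency_ms = [20, 25, 300, 280, 150, 30, 22]

r = pearson(host_cpu_percent, guest_latency_ms)
print("correlation:", round(r, 2))
```

If the latency spikes line up with host CPU peaks, that points at scheduling on the host rather than at the storage path.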