I hope you can help; this is more of a sanity check to verify my line of thinking at the minute. We've got a VMware cluster networked to a high-performance SAN. Everything performs nicely: running fio write tests I can get roughly 60k IOPS. I was setting up a machine in this environment with a generous 16GB of RAM and 10 vCPUs. All good so far.
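For reference, the write test was roughly along these lines (the parameters shown are a typical random-write job rather than the exact one I ran, and the filename is a placeholder path on a SAN-backed datastore):

    fio --name=writetest --filename=/mnt/san/fiotest --size=4G \
        --rw=randwrite --bs=4k --ioengine=libaio --iodepth=32 \
        --direct=1 --numjobs=4 --runtime=60 --time_based --group_reporting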
I then attempted to rsync a file of around 48GB from a remote source to this machine. As the transfer reached a speed of about 20 MB/s, I quickly noticed the recipient VM slowing down and load increasing rapidly, to the point where the machine became completely unstable and unusable. Trying to trace the root of the issue, I logged into the ESXi host for this VM and ran esxtop. What I saw was hugely unexpected:
For the VM in question, there were 31 writes/s with a write latency of 1496.4 ms!
However, looking at the actual disks themselves, they don't appear to be under much stress?
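For what it's worth, esxtop can break that latency down per layer; this is how I've been reading it (keys and column meanings are from the standard disk views, from memory):

    esxtop
    # press 'v' for per-VM virtual disk stats, 'd' for the adapter view, 'u' for the device view
    # DAVG/cmd = latency at the device/array
    # KAVG/cmd = time spent queued inside the VMkernel
    # GAVG/cmd = DAVG + KAVG = latency as seen by the guest
    # A huge GAVG with a small DAVG would point at queuing in the kernel rather than the SAN itself.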
So I guess the million-dollar question: why do you think this might be happening? Secondly, is there any way I can further diagnose the issue? And third, this is abnormal, right?!
The latency can be caused by the virtual machine's disk provisioning. Double-check both the networking and the disk provisioning. I'd recommend thick provisioned eager zeroed disks, to avoid the READ->MODIFY->WRITE behaviour on first write that you get with both thin and lazy zeroed provisioning (the defaults). A plain READ->WRITE path can decrease the latency.
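If the disk does turn out to be thin or lazy zeroed, it can be converted in place with vmkfstools on the host; the paths below are placeholders, the VM should be powered off, and it's worth double-checking the flags against your ESXi version:

    # thin -> eager zeroed thick
    vmkfstools --inflatedisk /vmfs/volumes/datastore1/myvm/myvm.vmdk
    # lazy zeroed thick -> eager zeroed thick
    vmkfstools --eagerzero /vmfs/volumes/datastore1/myvm/myvm.vmdk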
Can you provide more information about the networking? Also, check the network latency.
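A quick way to check, assuming storage traffic goes over a vmkernel interface (the interface name, SAN IP and source hostname are placeholders):

    # from the ESXi host: latency to the SAN over the storage vmkernel port
    vmkping -I vmk1 192.168.10.10
    # from inside the VM: latency to the rsync source
    ping -c 20 source.example.com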