I seem to have a strange issue with my VMware cluster, where I get inconsistent SCP transfer rates. I have Cluster1 and Cluster2, physically located in different regions, and I need to transfer large vmdk files from Cluster1 to Cluster2. Here is what I get:
- SCP from VMware host, directly to VMware host, no compression: 4MB/s
- SCP from VMware host, directly to VMware host, with compression: 0.5MB/s
- SCP from VMware host in Cluster1, to one of the virtual machines running in Cluster1: 30MB/s
- SCP from virtual machine in Cluster1 to VMware host in Cluster2, with compression: 15MB/s
- SCP from virtual machine in Cluster1 to VMware host in Cluster2, no compression: 5MB/s
- SCP from virtual machine in Cluster1 to virtual machine in Cluster2, with compression: 20MB/s
- SCP from virtual machine in Cluster1 to virtual machine in Cluster2, no compression: 7MB/s
Testing network bandwidth with iperf consistently shows 200-300Mbps between the locations. The network connections in both clusters, as well as the internet uplinks, are gigabit.
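For reference, here is a quick sketch of the unit conversion (just arithmetic on the iperf figures quoted above, nothing else assumed), mapping the measured Mbps to the MB/s scale used for the SCP numbers:

```python
def mbps_to_mb_per_s(mbps: float) -> float:
    """Convert megabits per second to mebibytes per second."""
    return mbps * 1_000_000 / 8 / (1024 * 1024)

# The 200-300Mbps range measured with iperf between the locations:
for rate in (200, 300):
    print(f"{rate} Mbps ~= {mbps_to_mb_per_s(rate):.1f} MB/s")
```

So the link itself should sustain roughly 24-36MB/s, well above every SCP figure measured above.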
What would cause VMware to limit CPU usage for compression, and what would limit its transfer rates outside of the cluster?
PS: inside the cluster I am going through the public IP, and the guest VM is on a different host, so theoretically VMware shouldn't know that the transfer is local.
EDIT: Cluster1 is on 4.1, Cluster2 is on 5.0. I tried FastSCP and got the same result as direct SCP with compression: about 0.5MB/s.
EDIT 2: Increased the system resource allocation on the VMware hosts to the levels the VMs are getting and beyond. The only change is that compressed SCP from host to host is now up from 0.5MB/s to 4MB/s, matching the uncompressed transfer.
It's still puzzling why host-to-host transfer would be slower, though.
EDIT 3: After adding more resources, I was able to achieve 4~10MB/s transfer speeds across the data centers. Even though that is lower than the 20~40MB/s the network should be capable of, I'll just have to live with it. Still, if anyone has any other ideas, I'm eager to try them :)
This is why people use third-party products like FastSCP.
I'm assuming you're on ESXi version 4 (you did not specify); this was improved in ESXi version 5. The copies were intentionally throttled as part of console resource management.
One way I have found to get around the speed limitation is to SSH into the VMware host and wget the file. It also makes more sense if you are pulling an ISO or OVA off the web.
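As a rough illustration of the same pull-instead-of-push idea (written in Python rather than wget, and with a hypothetical URL and datastore path, since neither is given here):

```python
import shutil
import urllib.request

SOURCE_URL = "http://example.com/images/appliance.ova"  # hypothetical source
DEST_PATH = "/vmfs/volumes/datastore1/appliance.ova"    # hypothetical datastore path

# Pull the file directly onto the destination over HTTP, bypassing SCP.
with urllib.request.urlopen(SOURCE_URL) as response, \
        open(DEST_PATH, "wb") as dest:
    # Stream in 1MB chunks so a multi-GB file never sits in memory.
    shutil.copyfileobj(response, dest, length=1024 * 1024)
```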
There isn't any built-in limitation on VMware's part. The reason behind slow SCP from/to ESXi over a WAN is a small receive buffer, a limitation that is also present in other SCP/SSH implementations.
Since SCP/SSH works over TCP, every time the remote buffer fills, the receiver sends an acknowledgement packet back to the sender, and the sender will not transmit any new data until that acknowledgement arrives. As latency increases, this behaviour, combined with such a small buffer, drastically reduces the effective bandwidth: the throughput ceiling is roughly the buffer size divided by the round-trip time.
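A minimal sketch of that ceiling follows; the window sizes and round-trip times below are illustrative assumptions, not measurements from the clusters in question:

```python
def max_throughput_mb_s(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound when the sender stalls until a full window is acknowledged:
    throughput ~= window size / round-trip time."""
    return window_bytes / (rtt_ms / 1000.0) / (1024 * 1024)

for window_kb in (64, 256, 1024):    # assumed SSH/TCP window sizes
    for rtt_ms in (1, 20, 80):       # assumed round-trip times (ms)
        ceiling = max_throughput_mb_s(window_kb * 1024, rtt_ms)
        print(f"window={window_kb:4d}KB  rtt={rtt_ms:3d}ms  "
              f"max ~= {ceiling:7.2f} MB/s")
```

With a 64KB window and an assumed 80ms WAN round trip, the ceiling is already below 1MB/s, the same order as the slow host-to-host figures above, while the same window over a 1ms LAN allows tens of MB/s.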
We have covered this subject in more detail, along with some possible workarounds, in this post:
VMWare ESXi SCP/SSH Throughput Limitations