I needed to transfer a 20 GB KVM vdisk file, storing the root filesystem of a CentOS 6.5 VM, from one lab server to another. The large file size, and the fact that I had once compressed such a vdisk file to a few hundred megabytes, made me instinctively enable compression with scp, but I was surprised to see a rather low transfer speed. Then I tried bzip2 in combination with ssh and cat and was startled. Here is a summary of the methods and their average throughput.
scp -C vm1-root.img [email protected]:/mnt/vdisks/, 11 MB/s.
bzip2 -c vm1-root.img | ssh -l root 192.168.161.62 "bzip2 -d -c > /mnt/vdisks/vm1-root.img", 5 MB/s. This even lower result prompted me to search the Net.
scp -c arcfour -C vm1-root.img [email protected]:/mnt/vdisks/, 13 MB/s. The use of -c arcfour was suggested in one answer on serverfault. It hardly helped.
Finally, I disabled compression: scp vm1-root.img [email protected]:/mnt/vdisks/, 23 MB/s.
Shouldn't compression have been faster?
After receiving the ssh(1) man page tip from @sven, I tried a couple of alternative file transfer methods not involving compression, both with better results.
cat vm1-root.img | ssh -l root 192.168.161.62 "cat > /mnt/vdisks/vm1-root.img", 26 MB/s.
nc -l 5678 > /mnt/vdisks/vm1-root.img on the receiver and nc 192.168.161.62 5678 < vm1-root.img on the transmitter, 40 MB/s. The port 5678 is an arbitrary one that was available.
Using nc turned out to be the fastest copying method!
In the past, scp -C has worked very well whenever I thought it would, for example when transferring syslogs (/var/log/messages*) a few GB in size. An uncompressed transfer rate of a few hundred KB/s would increase to 1-2 MB/s. This example does fall under the slow-connection case pointed out in the man page.
I have a case where a newly created vdisk image for a 20 GB partition has a compressed size of just 200 MB. At a transfer rate of about 25 MB/s, copying the compressed file would take just 8 seconds instead of over 13 minutes! Clearly, scp without compression is inefficient in this case and scp -C is even worse.
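The 8 seconds versus 13 minutes estimate can be checked with a little shell arithmetic. This is only a rough sketch: transfer_seconds is a made-up helper name, sizes are taken in decimal megabytes (20 GB = 20000 MB, an assumption), and the time spent compressing and decompressing is ignored.

```shell
#!/bin/sh
# Hypothetical helper: whole seconds needed to move SIZE_MB at RATE_MBPS.
transfer_seconds() {
    size_mb=$1
    rate_mbps=$2
    echo $(( size_mb / rate_mbps ))
}

rate=25                                        # MB/s, roughly what plain scp achieved
full=$(transfer_seconds 20000 "$rate")         # the raw 20 GB image
small=$(transfer_seconds 200 "$rate")          # the 200 MB compressed image

echo "uncompressed: ${full} s (about $(( full / 60 )) min)"
echo "compressed:   ${small} s"
```

The uncompressed figure comes out at 800 seconds, a little over 13 minutes, which matches the numbers in the paragraph above.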
I guess the main lesson learned here is that scp -C should be thought of only as a convenience. If a file can be significantly compressed, then it is better to first compress it on the source, transfer the compressed form, and finally decompress it on the destination. Tools that compress and decompress quickly (e.g. pbzip2) will be of greater help.
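The compress, transfer, decompress sequence described here might look roughly like the sketch below. It is not a tested production script: send_compressed is a hypothetical function name, the host and directory arguments are placeholders, and gzip stands in for pbzip2 only because it is installed almost everywhere.

```shell
#!/bin/sh
# Hypothetical helper: compress SRC locally, copy the compressed file to
# HOST:DEST_DIR, decompress it there, then remove the local temporary copy.
# Assumes DEST_DIR contains no single quotes and that gzip exists on both ends.
send_compressed() {
    src=$1
    host=$2
    dest_dir=$3
    name=$(basename "$src")

    gzip -c "$src" > "$src.gz" &&               # step 1: compress on the source
    scp "$src.gz" "$host:$dest_dir/" &&         # step 2: ship the small file
    ssh "$host" "gunzip '$dest_dir/$name.gz'" && # step 3: decompress remotely
    rm -f "$src.gz"                             # step 4: clean up locally
}
```

A call such as send_compressed vm1-root.img root@192.168.161.62 /mnt/vdisks would follow the same path as the commands in the question. Swapping gzip and gunzip for pbzip2 and pbzip2 -d should be possible if pbzip2 is available on both machines.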
Quoting man ssh (which is the base used by scp):
The problem is that compressing the data takes more time than just sending it over the network.
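One way to check whether the compressor, rather than the network, is the bottleneck is to time it locally and compare the result with the link speed. The sketch below is only an approximation: throughput is a made-up helper name, it relies on GNU date for nanosecond timestamps, and it reports whole decimal MB/s of input consumed.

```shell
#!/bin/sh
# Hypothetical helper: feed FILE through a filter command and print how many
# MB/s of input the command consumed. Requires GNU date (for %N).
throughput() {
    file=$1
    shift
    bytes=$(wc -c < "$file")
    start=$(date +%s%N)                        # nanoseconds since the epoch
    "$@" < "$file" > /dev/null
    end=$(date +%s%N)
    elapsed_ms=$(( (end - start) / 1000000 ))
    [ "$elapsed_ms" -gt 0 ] || elapsed_ms=1    # avoid dividing by zero on tiny files
    echo $(( bytes / 1000 / elapsed_ms ))      # bytes / 1000 / ms equals MB/s
}
```

Running throughput vm1-root.img gzip -c, throughput vm1-root.img bzip2 -c and throughput vm1-root.img cat gives three numbers to set against the measured network rate. If a compressor consumes its input more slowly than the link can carry it, and the data does not shrink enough to make up the difference, the compressed transfer will be the slower one.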
Also, besides skipping compression, nc gets the best rate because it doesn't encrypt either. And lossless compression relies on finding redundant sections of the data: when done at the network level, it can look at a maximum of [buffer-size] bytes at a time, whereas when done on the entire file first, it has [file-size] bytes within which to hunt and crunch duplicate byte sequences.
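The window-size argument can be illustrated with a toy experiment: compress the same redundant data once as a whole, and once as many small chunks compressed independently. This is only a sketch of the general idea, not a model of how ssh actually buffers its data, and the block and chunk sizes are arbitrary choices.

```shell
#!/bin/sh
# Toy comparison: one compressed stream vs. many independently compressed chunks.
work=$(mktemp -d)

# Build 64 copies of one random 1 KiB block: redundant, but only across blocks.
head -c 1024 /dev/urandom > "$work/block"
i=0
while [ "$i" -lt 64 ]; do
    cat "$work/block"
    i=$(( i + 1 ))
done > "$work/data"

# Case 1: a single compressor sees the whole file and can spot the repeats.
whole=$(gzip -c "$work/data" | wc -c)

# Case 2: each 512-byte chunk is compressed alone and never sees a repeat.
split -b 512 "$work/data" "$work/chunk."
chunked=0
for f in "$work"/chunk.*; do
    n=$(gzip -c "$f" | wc -c)
    chunked=$(( chunked + n ))
done

echo "whole file: $whole bytes, chunked: $chunked bytes"
rm -rf "$work"
```

The chunked total should come out far larger than the whole-file result, since a chunk smaller than the repeating block contains nothing for the compressor to match against.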
Also, for moving disk images you should use a filesystem-aware tool like ntfsclone/partclone, because even compression can't beat just plain skipping the unallocated blocks: your transfer rate is effectively infinite if you don't have to transfer any data. Also, don't forget to delete the swap and hibernation files on a Windows partition, or you're copying junk that Windows will just throw away and recreate anyway.