I'm trying to transfer about 100k files totaling 90 GB. Right now I'm using the rsync daemon, but it's slow at around 3.4 MB/s, and I need to do this a number of times. I'm wondering what options I have that would max out a 100 Mbit connection over the internet and be very reliable.
Have you considered Sneakernet? With large data sets overnight shipping is often going to be faster and cheaper than transferring via the Internet.
How? Or TL;DR
The fastest method I've found is a combination of `tar`, `mbuffer` and `ssh`. E.g.:
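The example command was stripped from this post in formatting; here is a minimal sketch of that pipeline, where the hostname, paths, and `mbuffer` sizes are placeholders of my choosing (`mbuffer` must be installed on both machines):

```bash
# Pack the tree into a stream, buffer it, push it over ssh,
# buffer again on the far side, and unpack.
# Tune -s (block size) and -m (total buffer) for your disks and RAM.
tar cf - -C /path/to/source . \
  | mbuffer -s 128k -m 1G \
  | ssh user@desthost 'mbuffer -s 128k -m 1G | tar xf - -C /path/to/dest'
```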
Using this I've achieved sustained local network transfers of over 950 Mbit/s on 1 Gbit links. Replace the paths in each tar command with ones appropriate for what you're transferring.
Why? mbuffer!
The biggest bottleneck in transferring large files over a network is, by far, disk I/O. The answer to that is `mbuffer` or `buffer`. They are largely similar, but `mbuffer` has some advantages. The default buffer size is 2 MB for `mbuffer` and 1 MB for `buffer`. Larger buffers are more likely to never be empty. Choosing a block size which is the lowest common multiple of the native block sizes on the source and destination filesystems will give the best performance.
Buffering is the thing that makes all the difference! Use it if you have it! If you don't have it, get it! Using `(m)?buffer` plus anything is better than anything by itself. It is almost literally a panacea for slow network file transfers.
If you're transferring multiple files, use `tar` to "lump" them together into a single data stream. If it's a single file you can use `cat` or I/O redirection. The overhead of `tar` vs. `cat` is statistically insignificant, so I always use `tar` (or `zfs send` where I can) unless it's already a tarball. Neither of these is guaranteed to give you metadata (and in particular `cat` will not). If you want metadata, I'll leave that as an exercise for you.
Finally, using `ssh` as a transport mechanism is both secure and carries very little overhead. Again, the overhead of `ssh` vs. `nc` is statistically insignificant.
You mention "rsync," so I assume you are using Linux:
Why don't you create a tar or tar.gz file? Transferring one big file over the network is faster than transferring many small ones. You could even compress it if you wish...
Tar with no compression:
On the source server:
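The original commands were lost from this post; the blocks below are my reconstruction, with the archive name and paths as placeholders:

```bash
# Pack everything into a single uncompressed archive.
tar -cf /tmp/files.tar -C /path/to/source .
```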
Then on the receiving end:
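Something like (same placeholder names):

```bash
# After copying files.tar across, unpack it; -p restores the stored permissions.
tar -xpf /tmp/files.tar -C /path/to/destination
```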
Tar with compression:
On the source server:
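For example (placeholder paths again):

```bash
# As above, but gzip the archive as it is written (-z).
tar -czf /tmp/files.tar.gz -C /path/to/source .
```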
Then on the receiving end:
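And correspondingly:

```bash
# Extract the gzip-compressed archive at the destination (-z handles gzip).
tar -xzpf /tmp/files.tar.gz -C /path/to/destination
```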
You would simply use rsync to do the actual transfer of the (tar|tar.gz) files.
You could try the `tar` and `ssh` trick described here, and it should be rewritable so that the receiving end unpacks the stream directly instead of writing an intermediate archive; both forms are sketched below.
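The original commands were stripped out, so this is my best reconstruction of the two forms, with hostname and paths as placeholders:

```bash
# Trick as described: tar+gzip the tree and write it to a file on the remote host.
tar czf - /path/to/files | ssh user@desthost 'cat > /backup/files.tar.gz'

# Rewritten form: unpack the stream directly on the remote end, no intermediate file.
tar czf - /path/to/files | ssh user@desthost 'tar xzf - -C /backup'
```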
You'd lose the `--partial` features of `rsync` in the process, though. If the files don't change very frequently, living with a slow initial `rsync` could be highly worthwhile, as it will go much faster in the future.
You can use the various compression options of rsync. The compression ratio for binary files is very low, so you can skip those files using `--skip-compress`, e.g. ISO images, tarballs that are already archived and compressed, etc.
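As a sketch (the suffix list and paths below are just examples I've picked; `--skip-compress` needs rsync 3.0 or newer):

```bash
# -z compresses in transit; --skip-compress tells rsync not to bother
# recompressing these already-compressed suffixes.
rsync -avPz --skip-compress=iso/gz/bz2/xz/zip/7z/jpg/mp4 \
  /path/to/source/ user@desthost:/path/to/dest/
```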
I'm a big fan of SFTP. I use SFTP to transfer media from my main computer to my server, and I get good speeds over LAN.
SFTP is reliable; I'd give it a shot, as it's easy to set up and could be faster in some cases.
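For what it's worth, a minimal sketch of a non-interactive recursive upload over SFTP (host and paths are placeholders of mine; batch mode assumes key-based auth since it can't prompt for a password):

```bash
# put -r uploads a directory tree recursively (OpenSSH 5.4+);
# -b - reads the batch command from stdin.
echo 'put -r /local/media /srv/media' | sftp -b - user@server
```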