I am migrating my server from one data center in the USA to another in the UK. My host said I should be able to achieve 11 megabytes per second.
The operating system is Windows Server 2008 at both ends.
My average file size is around 100 MB and the data is split across five 2 TB drives.
What would be the recommended way to transfer these files?
- FTP
- SMB
- Rsync / Robocopy
- Other?
I'm not too bothered about security, as these are public files anyway; I just want a solution that can sustain the full 11 MB/s to minimize the total transfer time.
Ship hard drives across the ocean instead.
At 11 Mbps with full utilization, you're looking at just shy of 90 days to transfer 10 TB.
11 Mbps = 1.375 MBps = 116.015 GB/day.
10240 GB / 116.015 GB/day = ~88.3 days.
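For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope version of the same calculation (the 11 Mbit/s and 10 TB figures are the ones used above):

```bash
# Days needed to move 10 TB at a sustained 11 Mbit/s.
awk 'BEGIN {
    gb_per_day = 11 / 8 * 86400 / 1024      # ~116 GB/day
    printf "%.1f days\n", 10 * 1024 / gb_per_day
}'
```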
I'd say rsync. At 11 MB/s you're looking at 10-14 days, and even if the transfer gets interrupted, rsync will easily pick up where it stopped last time.
At 11 Mbps, though, I'd ship the hard disks as suggested above :)
Rsync, of course.
At least you can continue at any time after a break, and without any pain.
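A minimal sketch of what that looks like, assuming an rsync build is available on both Windows boxes (e.g. cwRsync or Cygwin); the host name and paths are placeholders:

```bash
# Resumable copy: re-running the same command after an interruption picks up
# where it left off instead of starting over.
rsync -avP usa-server.example.com:/data/ /data/
# -a preserve permissions and timestamps, -v verbose,
# -P keep partially transferred files and show progress
```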
Never underestimate the bandwidth of a station wagon full of tapes
-- Trad.
In your case, disks or tapes sent by courier, but the principle still applies. If you're not concerned about latency, this will be vastly cheaper than the network bandwidth needed to transfer 10 TB of data in any reasonable length of time.
You should use rsync. It can compress the data in transit and avoid re-sending data that already exists at the destination. It can also resume partial transfers, which is very important for any large transfer.
It may well not end up sending anything close to 10 TB over the wire; if the data is logs, text and the like, it could easily be under 1 TB, perhaps far below that.
There are tools that do a better job of compression than rsync and will likely find more matches; lrzip is one example. On the other hand, some types of data don't compress well and don't contain literal duplicates, video and other media for example. For those, FTP and rsync end up doing much the same amount of work.
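As a rough sketch of what pre-compressing with lrzip could look like (the file names are placeholders, and lrzip has to be installed at both ends):

```bash
# lrzip works on a single file, so pack the tree first.
tar -cf logs.tar /data/logs
lrzip logs.tar            # writes logs.tar.lrz (long-range matching + LZMA by default)
# ...transfer logs.tar.lrz by whatever means...
lrunzip logs.tar.lrz      # unpack on the receiving side
```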
I know this is already accepted, but have you considered taking your disks to a data center/provider/host where you can get more bandwidth? It will probably cost you some money, but copying 10240 GB to backup disks and shipping them off will also cost both time and money (twice the money, in fact).
Also, you can be sure your disks won't break in transit.
11 Mbps? That is quite a limitation. If you really have no way to increase the bandwidth, then in your situation simply shipping a physical drive will be far faster.
In my painful experience hard drives tend to break in the mail; USB flash drives hold up much better for frequent transfers, though in your case it would take quite a few of them :) So send two copies of your data on multiple drives.
Considering the amount of data you have, you could also send the drives from a RAID 5 or RAID 6 array, provided you have the same hardware/software on the other side to plug them into. In that case, remember to note the order of the drives and their serial numbers so they don't get mixed up when the array is reassembled.
While I have to agree with the "ship it using hard drives" answer in this case, here is a copy solution I use when I have to transfer large amounts of files for the first time:
While `rsync` is good for keeping two data stores in sync, it introduces quite a bit of unnecessary overhead for the initial transfer. The fastest way I have found is `tar` piped over `netcat`. On the receiving side you run `netcat` in listen mode and pipe the incoming data into an extracting `tar`. The benefit is that `tar` starts sending immediately and `netcat` ships it as a plain TCP stream with no extra higher-level protocol overhead, so this should be about as fast as it gets. The drawback is that it is not simply possible to restart an interrupted transfer at the last position.
It is also easy to compress the data for the transfer by using the right `tar` options or by adding a compression tool to the pipe. Note that `netcat` sends the data unencrypted; where that is not an option, an encrypted `ssh` connection can be used instead (`tar <options> | ssh <target> 'tar -x <options>'`).
Once all the data has been transferred, `rsync` can be used to make sure any files that were updated in the meantime are synchronized. Also, IIRC, `tar` doesn't recreate sockets, which would otherwise get lost, but those aren't really used for data-center data anyway.
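To make the pipeline concrete, here is a minimal sketch, assuming `tar` and `netcat` builds are available on both Windows boxes (e.g. via Cygwin); the host name, port and paths are placeholders, and the exact listen flags differ between netcat variants:

```bash
# Receiving side (UK): listen on a TCP port and extract the incoming stream.
nc -l -p 7000 | tar -x -C /data

# Sending side (USA): pack the tree and pipe it straight into netcat.
tar -c -C /data . | nc uk-server.example.com 7000
```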
Again, my first suggestion is to ship the drives.
Second suggestion is to use rsync to an rsync daemon (rsyncd), not over SSH. I've tried many things, and it is usually the fastest. Remember to turn on compression. Also, look at increasing or decreasing the rsync buffer size to find the optimal transfer rate. Increasing your MTU may also help, but only if the routers en route don't fragment your packets; there are ways to determine whether they do.
Unfortunately there is no setting that's always the best. You'll have to experiment to find out what works best in your situation.
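As a rough illustration of the rsync-to-rsyncd approach described above (the host name and the `pubfiles` module are placeholders, and a matching module has to be defined in `rsyncd.conf` on the receiving server):

```bash
# Plain rsync straight to an rsync daemon (no SSH), with compression turned on.
rsync -av --compress --partial --stats /data/ rsync://uk-server.example.com/pubfiles/
```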
You mentioned the servers are running Windows Server 2008. Would Microsoft DFS be suitable? There is some magic at the lower layers that tries to get as much bandwidth out of the connection as possible, and it also does compression and de-duplication (IIRC).
Mind you, hard drives, DVDs or Blu-rays would be faster... My calculation is 11 days at the full 11 MB/s...