I'm attempting to copy a 75 gigabyte tgz (a MySQL LVM snapshot) from a Linux server in our LA data center to another Linux server in our NY data center over a 10MB link.
I am getting about 20-30KB/s with rsync or scp, and the ETA fluctuates between 200 and 300 hours.
At the moment it is a relatively quiet link as the second data center is not yet active and I have gotten excellent speeds from small file transfers.
I've followed different tcp tuning guides I've found via google to no avail (maybe I'm reading the wrong guides, got a good one?).
I've seen the tar+netcat tunnel tip, but my understanding is that it is only good for LOTS of small files and doesn't update you when the file has finished transferring.
Before I resort to shipping a hard drive, does anyone have any good input?
UPDATE: Well... it may be the link after all :( See my tests below...
Transfers from NY to LA:
Getting a blank file.
[nathan@laobnas test]$ dd if=/dev/zero of=FROM_LA_TEST bs=1k count=4700000
4700000+0 records in
4700000+0 records out
4812800000 bytes (4.8 GB) copied, 29.412 seconds, 164 MB/s
[nathan@laobnas test]$ scp -C obnas:/obbkup/test/FROM_NY_TEST .
FROM_NY_TEST 3% 146MB 9.4MB/s 07:52 ETA
Getting the snapshot tarball.
[nathan@obnas db_backup]$ ls -la db_dump.08120922.tar.gz
-rw-r--r-- 1 root root 30428904033 Aug 12 22:42 db_dump.08120922.tar.gz
[nathan@laobnas test]$ scp -C obnas:/obbkup/db_backup/db_dump.08120922.tar.gz .
db_dump.08120922.tar.gz 0% 56MB 574.3KB/s 14:20:40 ET
Transfers from LA to NY:
Getting a blank file.
[nathan@obnas test]$ dd if=/dev/zero of=FROM_NY_TEST bs=1k count=4700000
4700000+0 records in
4700000+0 records out
4812800000 bytes (4.8 GB) copied, 29.2501 seconds, 165 MB/s
[nathan@obnas test]$ scp -C laobnas:/obbkup/test/FROM_LA_TEST .
FROM_LA_TEST 0% 6008KB 497.1KB/s 2:37:22 ETA
Getting the snapshot tarball.
[nathan@laobnas db_backup]$ ls -la db_dump_08120901.tar.gz
-rw-r--r-- 1 root root 31090827509 Aug 12 21:21 db_dump_08120901.tar.gz
[nathan@obnas test]$ scp -C laobnas:/obbkup/db_backup/db_dump_08120901.tar.gz .
db_dump_08120901.tar.gz 0% 324KB 26.8KB/s 314:11:38 ETA
I guess I'll take it up with the folks who run our facilities; the link is labeled as an MPLS/Ethernet 10MB link. (shrug)
Sneakernet Anyone?
Assuming this is a one-time copy, I don't suppose it's possible to just copy the file to a CD (or other media) and overnight it to the destination, is there?
That might actually be your fastest option, as a file transfer of that size, over that connection, might not copy correctly... in which case you get to start all over again.
rsync
My second choice/attempt would be rsync as it detects failed transfers, partial transfers, etc. and can pick up from where it left off.
The --progress flag will give you some feedback instead of just sitting there and leaving you to second guess yourself. :-)
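Something along these lines would do it (a sketch reusing the host and path names from the question; adjust to taste):
# --partial keeps a half-finished file so a re-run can pick it back up; --progress shows transfer status
rsync -av --partial --progress db_dump.08120922.tar.gz nathan@obnas:/obbkup/db_backup/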
Vuze (bittorrent)
Third choice would probably be to try and use Vuze as a torrent server and then have your remote location use a standard BitTorrent client to download it. I know of others who have done this, but you know... by the time they got it all set up and running, etc... I could have overnighted the data...
Depends on your situation I guess.
Good luck!
UPDATE:
You know, I got thinking about your problem a little more. Why does the file have to be a single huge tarball? Tar is perfectly capable of splitting large files into smaller ones (to span media, for example), so why not split that huge tarball into more manageable pieces and then transfer the pieces over instead?
I've done that in the past, with a 60GB tbz2 file. I don't have the script anymore, but it should be easy to rewrite it.
First, split your file into pieces of ~2GB:
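Something like this GNU split invocation (the file and prefix names are just placeholders):
# cut the tarball into ~2GB chunks named db_dump.tar.gz.part-aa, -ab, ...
split --bytes=2000000000 db_dump.tar.gz db_dump.tar.gz.part-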
For each piece, compute an MD5 hash (this is to check integrity) and store it somewhere, then start to copy the pieces and their MD5s to the remote site with the tool of your choice (me: netcat-tar-pipe in a screen session).
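For the MD5 part, something like this works (chunk names match the split sketch above):
# one checksum line per chunk, saved alongside the chunks
md5sum db_dump.tar.gz.part-* > db_dump.tar.gz.parts.md5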
After a while, check the pieces against their MD5s, then:
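Something along these lines, using the chunk names from the split sketch above:
# verify every chunk, then stitch them back together in order
md5sum -c db_dump.tar.gz.parts.md5
cat db_dump.tar.gz.part-* > db_dump.tar.gz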
If you have also computed an MD5 of the original file, check it too. If it matches, you can untar your file; everything should be OK.
(If I find the time, I'll rewrite the script)
Normally I'm a big advocate of rsync, but when transferring a single file for the first time, it doesn't seem to make much sense. If, however, you were re-transferring the file with only slight differences, rsync would be the clear winner. If you choose to use rsync anyway, I highly recommend running one end in --daemon mode to eliminate the performance-killing ssh tunnel. The man page describes this mode quite thoroughly.
My recommendation? FTP or HTTP with servers and clients that support resuming interrupted downloads. Both protocols are fast and lightweight, avoiding the ssh-tunnel penalty. Apache + wget would be screaming fast.
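For instance, something like this on the NY side pulls the file and resumes after any interruption (host and path reuse the names from the question and assume the file is exposed by Apache):
# -c tells wget to continue a partially downloaded file instead of starting over
wget -c http://laobnas/obbkup/db_backup/db_dump.08120922.tar.gz
And if you stick with rsync, pulling from a plain rsync daemon skips the ssh tunnel (assuming a db_backup module is configured in rsyncd.conf on the LA box):
rsync -av --partial --progress rsync://laobnas/db_backup/db_dump.08120922.tar.gz .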
The netcat pipe trick would also work fine. Tar is not necessary when transferring a single large file. And the reason it doesn't notify you when it's done is because you didn't tell it to. Add a -q0 flag to the server side and it will behave exactly as you'd expect.
The downside to the netcat approach is that it won't allow you to resume if your transfer dies 74GB in...
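A minimal sketch of that (the port number and file names are placeholders, and option syntax varies between netcat flavors; this assumes traditional netcat):
# on the box holding the file (the "server" side): -q0 makes nc exit once the file has been sent
nc -q0 -l -p 2342 < db_dump.08120922.tar.gz
# on the receiving box: connect and write the stream to disk
nc laobnas 2342 > db_dump.08120922.tar.gz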
Give netcat (sometimes called nc) a shot. The following works on a directory, but it should be easy enough to tweak for copying just one file.
On the destination box:
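Something along these lines (the port number and target directory are placeholders):
# listen on an arbitrary port and unpack whatever arrives
nc -l -p 2342 | tar -C /target/dir -xzf -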
On the source box:
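Something like this, where the source directory, host name and port are placeholders and should match the listener above:
# tar up the source and stream it straight into netcat
tar -cz /source/dir | nc Target_Box 2342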
You can try removing the 'z' option in both tar commands for a bit more speed, seeing as the file is already compressed.
Default SCP and rsync (which runs over SSH) are very slow for large files. I guess I would look into using a protocol with lower overhead. Have you tried using a simpler encryption cipher, or none at all? Try looking into the --rsh option for rsync to change the transfer method.
Why not FTP or HTTP?
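If you do stay with rsync, a sketch of the --rsh idea (the cipher name and the host/path are assumptions; check what your ssh build actually supports):
# use a cheaper cipher and no ssh-level compression to cut CPU overhead
rsync -av --partial --progress --rsh="ssh -c aes128-ctr -o Compression=no" db_dump.08120922.tar.gz nathan@obnas:/obbkup/db_backup/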
Although it adds a bit of overhead to the situation, BitTorrent is actually a really nice solution for transferring large files. BitTorrent has a lot of nice features, like natively chunking a file and checksumming each chunk, which can be re-transmitted if corrupt.
A program like Azureus [now known as Vuze] contains all the pieces you will need to create, serve & download torrents in one app. Bear in mind Azureus isn't the leanest of solutions available for BitTorrent, and I think it requires its GUI too - there are a lot of command-line-driven torrent tools for Linux though.
Well, personally, 20-30Kb/s seems pretty low for a 10Mb (assuming 10Mb and not 10MB) link.
If I were you, I would do one of two things (assuming physical access is not available) -
Either way, I advise you to split the large file into smaller chunks of around 500MB, just in case of corruption in transit.
When you have the smaller chunks, use either rsync again or, as I personally prefer, a private secure FTP session, and then CRC the files upon completion.
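A rough sketch of the split-and-verify step (file and chunk names are placeholders):
# 500MB chunks, plus CRCs to compare on the far end after the transfer
split -b 500M db_dump.tar.gz chunk-
cksum chunk-* > chunks.crc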
A few questions might help the discussion: Just how critical is the data to be transferred? Is this for disaster recovery, hot backup, offline storage, or what? Are you intending to back up the database while it is up or down? What about setting up a database at the remote site and keeping them in sync using either clustering or updating via changelogs (I'm not totally versed in the capabilities of a MySQL database system)? This might help reduce the amount of data needing to be transferred through the link.
bbcp will chunk the file for you and copy with multiple streams.
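For example (an untested sketch; the stream count and window size are just guesses, and the host/path reuse the names from the question - check bbcp's own docs for the exact flags):
# -s = number of parallel streams, -w = TCP window size, -P = progress report interval in seconds
bbcp -P 2 -w 8m -s 16 db_dump.08120922.tar.gz nathan@obnas:/obbkup/db_backup/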
Late answer for googlers:
When transferring large datasets, rsync can be used to compare the source and destination, then write a batch file to local removable media using the --only-write-batch flag. You then ship the local media to the remote location, plug it in, and run rsync again, using --read-batch to incorporate the changes into the remote dataset.
If the source files change during physical transport, or if the transport media fills up, you can just keep repeating the --only-write-batch | ship | --read-batch cycle until the destination is all caught up.
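A sketch of that cycle, reusing the host/path names from the question (the batch file path on the removable media is a placeholder):
# in LA: compare against NY over the link, but write the data to a batch file on a USB drive instead of sending it
rsync -a --only-write-batch=/mnt/usb/db_batch /obbkup/db_backup/ obnas:/obbkup/db_backup/
# in NY, after the drive arrives: apply the batch to the local copy
rsync -a --read-batch=/mnt/usb/db_batch /obbkup/db_backup/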
(Ref: I was one of the authors of this feature in rsync -- for more background and use cases, see this discussion of the prototype implementation: https://lists.samba.org/archive/rsync/2005-March/011964.html)