Is there a command, such as rsync, which can synchronise huge, sparse files from one Linux server to another?
It is very important that the destination file remains sparse. Its apparent length may exceed the capacity of the drive that contains it, even though the space it actually allocates does not. Only changed blocks should be sent across the wire.
I have tried rsync, but got no joy. https://groups.google.com/forum/#!topic/mailing.unix.rsync/lPOScZgFE9M
If I write a programme to do this, am I just reinventing the wheel? http://www.finalcog.com/synchronise-block-devices
Thanks,
Chris.
Run rsync in two passes: first to create new files in sparse mode, followed by a second run to update all existing files (including the previously created sparse ones) in place.
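A sketch of the two passes, with placeholder paths (older rsync releases reportedly refuse to combine --sparse with --inplace, hence two separate runs):

    # Pass 1: create files that are missing on the destination, writing them
    # sparsely; files that already exist there are skipped.
    rsync --ignore-existing --sparse /data/huge.img remote.example.com:/data/

    # Pass 2: update existing files in place, rewriting only changed blocks.
    rsync --inplace /data/huge.img remote.example.com:/data/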
Rsync only transfers the changes within each file, and with --inplace it should rewrite only the blocks that changed without recreating the file (see the features page on the rsync site).
Using --inplace should work for you. The invocation sketched below shows progress, compresses the transfer (at the default compression level), transfers the contents of the local storage directory recursively (that first trailing slash matters), makes the changes to the files in place, and uses ssh for the transport.
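A sketch of such an invocation, with placeholder host and paths:

    # -z compresses at the default level, -r recurses into the directory,
    # --inplace updates destination files in place, -e ssh sets the transport.
    rsync --progress -z -r --inplace -e ssh /local/storage/ user@remote.example.com:/remote/storage/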
I often use the -a flag as well, which does a few more things; it's equivalent to -rlptgoD. I'll leave the exact behavior for you to look up in the man page.
To sync huge files or block devices with low to moderate differences you can either do a plain copy or use bdsync; rsync is absolutely not fit for this particular case*.
bdsync worked for me and seems mature enough; its history of bugs is encouraging (small issues, prompt resolution). In my tests its speed was close to the theoretical maximum you could get** (that is, you can sync in about the time you need to read the file). Finally, it's open source and costs nothing.

bdsync reads the files on both hosts and exchanges checksums to compare them and detect differences, all at the same time. It then creates a compressed patch file on the source host. You move that file to the destination host and run bdsync a second time to patch the destination file.

When using it over a rather fast link (e.g. 100Mbit Ethernet) and for files with small differences (as is most often the case with VM disks), it reduces the sync time to the time needed to read the file. Over a slow link you need a bit more time, because you have to copy the compressed changes from one host to the other (it seems you can save time using a nice trick, but I haven't tested it). For files with many changes, the time to write the patch file to disk should also be taken into account (and you need enough free space on both hosts to hold it).
Here's how I typically use bdsync. These commands are run on $LOCAL_HOST to "copy" $LOCAL_FILE to $REMOTE_FILE on $REMOTE_HOST. I use pigz (a faster gzip) to compress the changes, ssh to run bdsync on the remote host, and rsync/ssh to copy the changes across. Note that the script checks whether the patch has been applied successfully and prints "Update successful" only when it has; you may wish to do something more clever in case of failure.
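A sketch along those lines; host and file names are placeholders, and it assumes bdsync's --server and --patch modes plus pigz/unpigz on both hosts:

    LOCAL_FILE=/data/huge.img
    REMOTE_FILE=/data/huge.img
    REMOTE_HOST=remote.example.com
    PATCH=/tmp/huge.img.bdsync.gz

    # Compare the two files over ssh and write a compressed patch locally.
    bdsync "ssh $REMOTE_HOST bdsync --server" "$LOCAL_FILE" "$REMOTE_FILE" \
        | pigz > "$PATCH"

    # Ship the patch to the remote host.
    rsync "$PATCH" "$REMOTE_HOST:$PATCH"

    # Apply it remotely; print "Update successful" only if patching succeeds.
    ssh "$REMOTE_HOST" "unpigz < $PATCH | bdsync --patch=$REMOTE_FILE" \
        && echo "Update successful"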
*: rsync is hugely inefficient with huge files. Even with --inplace it will first read the whole file on the destination host, only AFTERWARDS begin reading the file on the source host, and finally transfer the differences (just run dstat or similar while rsync is running and observe). The result is that even for files with small differences, it takes about double the time needed to read the file in order to sync it.
**: Under the assumption that you have no other way to tell which parts of the files have changed. LVM snapshots use bitmaps to record the changed blocks, so they can be dramatically faster (the lvmsync README has more info).
Take a look at the Zumastor Linux Storage Project; it implements "snapshot" backup using binary "rsync" via the ddsnap tool. From the man page:
ddsnap provides block device replication given a block level snapshot facility capable of holding multiple simultaneous snapshots efficiently. ddsnap can generate a list of snapshot chunks that differ between two snapshots, then send that difference over the wire. On a downstream server, write the updated data to a snapshotted block device.
I ended up writing software to do this:
http://www.virtsync.com
This is commercial software costing $49 per physical server.
I can now replicate a 50GB sparse file (which has 3GB of content) in under 3 minutes across residential broadband.
lvmsync does this.
Here's the usage pattern: create an LVM snapshot on the source and transfer the logical volume once; after that you can transfer incremental updates of the changes made since snapshot creation as often as you like.
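A sketch of that workflow; the volume group, LV names and host are placeholders, and the lvmsync invocation follows its README:

    # Snapshot the source LV; changes to the origin are tracked from now on.
    lvcreate --snapshot -n vm0-snap -L 10G /dev/vg0/vm0

    # Initial full copy of the frozen snapshot to the destination device.
    dd if=/dev/vg0/vm0-snap bs=1M | ssh dest.example.com "dd of=/dev/vg0/vm0 bs=1M"

    # Later: send only the blocks changed since the snapshot was taken.
    lvmsync /dev/vg0/vm0-snap dest.example.com:/dev/vg0/vm0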
Could replicating the whole file system be a solution? DRBD? http://www.drbd.org/
Maybe a bit strange here, but I found out recently that NFS handles this fine. You export a directory on one machine, mount it on the other, and just copy the files with basic utilities like cp (some old/ancient utilities can have problems with sparse files). I found rsync especially inefficient at transferring sparse files.

I'm not aware of a dedicated utility, only of the system calls that can handle it, so if you write such a utility, it might be rather helpful.
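A minimal sketch of that setup, with placeholder hosts and paths (cp --sparse=always is a GNU coreutils option):

    # On the server: export the directory.
    echo "/srv/images client.example.com(rw,no_subtree_check)" >> /etc/exports
    exportfs -ra

    # On the client: mount the export and copy, keeping the copy sparse.
    mount -t nfs server.example.com:/srv/images /mnt/images
    cp --sparse=always /mnt/images/huge.img /data/huge.img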
What you actually can do is use qemu-img convert to copy the files, but it will only work if the destination filesystem supports sparse files.
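For example (file names are placeholders; a raw-to-raw copy is shown, and the output stays sparse because qemu-img skips runs of zeros when writing):

    qemu-img convert -O raw /data/huge-sparse.img /mnt/dest/huge-sparse.img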