I have a number of Xen virtual machines running on a number of Linux servers. These VMs store their disk images in Linux LVM volumes with device names along the lines of /dev/xenVG/SERVER001OS and so on. I'd like to take regular backups of those disk images so I can restore the VMs in case we need to (the LVM devices are already mirrored with DRBD between two physical machines each; I'm just being extra paranoid here).
How do I go about this? Obviously the first step is to snapshot the LVM device, but how do I then transfer data to a backup server in the most efficient manner possible? I could simply copy the whole device, something along the lines of:
dd if=/dev/xenVG/SERVER001OS | ssh administrator@backupserver "dd of=/mnt/largeDisk/SERVER001OS.img"
...but that would take a lot of bandwidth. Is there an rsync-like tool for syncing the contents of whole disk blocks between remote servers? Something like:
rsync /dev/xenVG/SERVER001OS backupServer:/mnt/largeDisk/SERVER001OS.img
If I understand rsync's man page correctly, the above command won't actually work (will it?), but it shows what I'm aiming for. As I understand it, rsync's --devices option copies the device files themselves, not their contents. Making a local copy of the VM image before syncing it with the remote server isn't an option, as there isn't enough disk space.
Is there a handy utility that can sync between block devices and a backup file on a remote server? I can write one if I have to, but an existing solution would be better. Have I missed an rsync option that does this for me?
Although there are 'write-device' and 'copy-device' patches for rsync, they only work well on small images (1-2GB). rsync will spend ages searching for matching blocks on larger images, and it's almost useless for devices or files of 40GB or larger.
We use the following to perform a checksum comparison per 1MB block and then simply copy the content if it doesn't match. We use this to back up servers on a virtual host in the USA to a backup system in the UK, over the public internet. It causes very little CPU activity, and the performance hit from the snapshot is incurred only after hours:
Create snapshot:
Initial seeding:
Incremental nightly backup (only sends changed blocks):
Remove snapshot:
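As a rough local illustration of the per-1MB comparison idea (a sketch under stated assumptions, not the answerer's script), the in-place variant can be written as a bash function. The function names are hypothetical; it assumes GNU coreutils (dd, md5sum, stat), util-linux blockdev when the source is a block device, and that source and destination are the same size.

```shell
# Sketch only; all function names here are hypothetical.
# Assumes GNU coreutils, util-linux blockdev for block devices,
# and that SRC and DST are the same size.

# size_of PATH: size in bytes of a regular file or a block device.
size_of() {
  if [ -b "$1" ]; then blockdev --getsize64 "$1"; else stat -c %s -- "$1"; fi
}

# block_md5 PATH INDEX BS: md5 of block number INDEX (BS bytes) of PATH.
block_md5() {
  dd if="$1" bs="$3" skip="$2" count=1 2>/dev/null | md5sum | cut -d' ' -f1
}

# sync_changed_blocks SRC DST [BS]: rewrite DST in place, copying only the
# blocks whose md5 differs; prints how many blocks were copied.
sync_changed_blocks() {
  local src=$1 dst=$2 bs=${3:-1048576}
  local size i copied=0
  size=$(size_of "$src")
  for (( i = 0; i * bs < size; i++ )); do
    if [ "$(block_md5 "$src" "$i" "$bs")" != "$(block_md5 "$dst" "$i" "$bs")" ]; then
      # conv=notrunc overwrites just this block and keeps the rest of DST
      dd if="$src" of="$dst" bs="$bs" skip="$i" seek="$i" count=1 conv=notrunc 2>/dev/null
      copied=$(( copied + 1 ))
    fi
  done
  echo "$copied"
}
```

For example, sync_changed_blocks /dev/xenVG/SERVER001OS-snap /mnt/largeDisk/SERVER001OS.img (the snapshot name is hypothetical). Over a WAN link the destination checksums would have to be computed on the backup server itself, for instance by running the same dd | md5sum loop there via ssh, so that only checksums and changed blocks cross the wire; this sketch keeps both sides local so it can be run as-is.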
Standard rsync is missing this feature, but there is a patch for it in the rsync-patches tarball (copy-devices.diff), which can be downloaded from http://rsync.samba.org/ftp/rsync/. After applying it and recompiling, you can rsync devices with the --copy-devices option.
People interested in doing this specifically with LVM snapshots might like my lvmsync tool, which reads the list of changed blocks in a snapshot and sends just those changes.
Take a look at the Zumastor Linux Storage Project: it implements "snapshot" backup using binary "rsync" via the ddsnap tool.
From the man-page:
ddsnap provides block device replication given a block level snapshot facility capable of holding multiple simultaneous snapshots efficiently. ddsnap can generate a list of snapshot chunks that differ between two snapshots, then send that difference over the wire. On a downstream server, write the updated data to a snapshotted block device.
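ddsnap gets its list of changed chunks from snapshot metadata. Without such metadata, the same list can be approximated by reading two snapshot images and comparing them byte by byte. The sketch below does that with cmp; the function name is hypothetical, and it assumes GNU cmp and awk and two equal-sized image files.

```shell
# changed_chunks OLD NEW [CHUNK_BYTES] (hypothetical name): print the index of
# every chunk in which the two images differ, one per line, ascending.
changed_chunks() {
  local chunk=${3:-1048576}
  # cmp -l prints "<1-based byte offset> <old byte, octal> <new byte, octal>"
  # for each differing byte; awk maps offsets to chunk indices, uniq collapses runs.
  cmp -l "$1" "$2" 2>/dev/null | awk -v c="$chunk" '{ print int(($1 - 1) / c) }' | uniq
}
```

This reads both images in full and emits one cmp line per differing byte before awk collapses them, so it is only a stand-in for what ddsnap obtains cheaply from its snapshot store.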
There's a python script called blocksync which is a simple way to synchronize two block devices over a network via ssh, only transferring the changes.
I've recently hacked on it to clean it up and change it to use the same fast-checksum algorithm as rsync (Adler-32).
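Adler-32, as defined in RFC 1950, keeps two running sums modulo 65521: A starts at 1 and adds each byte, B adds the current A after each byte, and the result is B * 65536 + A. The pure-bash sketch below only shows that arithmetic (blocksync itself is a Python script, and this loop is far too slow for real disk images); the function name is hypothetical.

```shell
# adler32 FILE (hypothetical name): print the Adler-32 checksum of FILE
# as an unsigned decimal number.
adler32() {
  local a=1 b=0 byte
  # od -An -v -tu1 dumps every byte of FILE as an unsigned decimal value
  for byte in $(od -An -v -tu1 -- "$1"); do
    a=$(( (a + byte) % 65521 ))
    b=$(( (b + a) % 65521 ))
  done
  echo $(( (b << 16) | a ))
}
```

For example, printf 'Wikipedia' | adler32 /dev/stdin prints 300286872 (0x11E60398), and an empty file gives 1.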
Just beware that the performance of a system with LVM snapshots degrades in proportion to the number of snapshots.
For example: MySQL performance with LVM snapshots.
If you're trying to minimize the amount of empty space you'd send across the wire with a plain dd, could you not just pipe it through gzip before piping it to ssh? E.g.
dd if=/dev/xenVG/SERVER001OS | gzip | ssh administrator@backupserver "dd of=/mnt/largeDisk/SERVER001OS.img.gz"
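A local sketch of that pipeline, with the ssh hop replaced by a file so it can be run anywhere, plus a check that the image can be restored. The helper names are hypothetical. Keep in mind that gzip only shrinks space that actually contains zeros (free filesystem blocks that once held data are not zeroed), and the whole device is still read and sent on every run.

```shell
# compress_image SRC DEST.gz (hypothetical name): the dd | gzip half of the
# pipeline, writing to a local file instead of piping into ssh.
compress_image() {
  dd if="$1" bs=1M 2>/dev/null | gzip -c > "$2"
}

# restore_matches DEST.gz SRC (hypothetical name): succeed only if
# decompressing DEST.gz yields exactly the bytes of SRC.
restore_matches() {
  gunzip -c "$1" | cmp -s - "$2"
}
```

On the real systems a restore would run the same pipeline in reverse, e.g. ssh administrator@backupserver "cat /mnt/largeDisk/SERVER001OS.img.gz" | gunzip | dd of=/dev/xenVG/SERVER001OS bs=1M.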
This is an old question, but nobody has mentioned two very useful tools for efficiently synchronizing two block devices:
bdsync, which uses a diff-transfer-and-patch approach;
blocksync (here you can find my improved version), which uses an in-place rewrite approach.
I strongly suggest experimenting with both tools and selecting whichever better suits your intended usage.
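To make the diff-transfer-and-patch idea concrete, here is a sketch of it (not bdsync's actual code or patch format): the side holding the old copy produces per-block checksums, the side holding the new data writes out only the blocks whose checksums differ, and that patch is applied to the old copy afterwards. Function names are hypothetical; it assumes bash 4+, GNU coreutils and regular image files (for a block device, the size would come from blockdev --getsize64 instead of stat).

```shell
# block_sums FILE [BS] (hypothetical): one md5 per BS-byte block of FILE,
# meant to run where the old copy lives.
block_sums() {
  local f=$1 bs=${2:-1048576} size i
  size=$(stat -c %s -- "$f")
  for (( i = 0; i * bs < size; i++ )); do
    dd if="$f" bs="$bs" skip="$i" count=1 2>/dev/null | md5sum | cut -d' ' -f1
  done
}

# make_patch NEW SUMS PATCHDIR [BS] (hypothetical): save each block of NEW whose
# md5 differs from the matching line of SUMS as PATCHDIR/<block index>.
make_patch() {
  local new=$1 sums=$2 dir=$3 bs=${4:-1048576} size i sum
  local -a old
  mapfile -t old < "$sums"
  mkdir -p -- "$dir"
  size=$(stat -c %s -- "$new")
  for (( i = 0; i * bs < size; i++ )); do
    sum=$(dd if="$new" bs="$bs" skip="$i" count=1 2>/dev/null | md5sum | cut -d' ' -f1)
    if [ "$sum" != "${old[i]:-}" ]; then
      dd if="$new" of="$dir/$i" bs="$bs" skip="$i" count=1 2>/dev/null
    fi
  done
}

# apply_patch PATCHDIR TARGET [BS] (hypothetical): write each saved block back
# at its own offset without truncating TARGET.
apply_patch() {
  local dir=$1 target=$2 bs=${3:-1048576} f
  for f in "$dir"/*; do
    [ -e "$f" ] || continue   # no changed blocks: the glob matched nothing
    dd if="$f" of="$target" bs="$bs" seek="$(basename -- "$f")" count=1 conv=notrunc 2>/dev/null
  done
}
```

Between the two machines, only the checksum list and the patch directory would travel, the latter for instance with tar -C patch -cf - . | ssh administrator@backupserver 'mkdir -p /tmp/patch && tar -C /tmp/patch -xf -'. The in-place approach instead writes changed blocks straight into the target as they are found, which needs no room for a patch but leaves the target half-updated if the transfer is interrupted.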
After searching for several years, I recently created a tool for synchronising LVM snapshots between servers. It is designed to use minimal IO and to allow the systems to keep running while the synchronisation is happening.
It is similar to ZFS send/receive in that it synchronises the differences between LVM snapshots, and it uses thin provisioning so that the performance impact is minimal.
I would like feedback, so please have a look.
In addition to David Herselman's answer, the following script will sync to a local device:
As far as I know both scripts were first posted at lists.samba.org.