We currently run a bunch of physical machines. For backups, we've been using dirvish, which is essentially a wrapper around rsync, and doing them incrementally. We're now pushing a new machine into production that is going to run a whole bunch of VMs. Ideally, I'd like to back up the virtual machine images rather than doing file-level backups from inside the VMs. Is there a way to do this incrementally, given that each image will be one giant file that needs a new backup whenever anything in it changes? How do other people back up VMs? Do they just treat them as physical machines?
If it matters, we're using Xen to actually do the virtualisation.
Thanks
You could continue doing what you're currently doing with a few minor changes to your rsync backup scripts.
rsync can run inside a VM and back up to a remote host via ssh, just as it can from a physical machine. For example, I back up /etc, /usr/local, /home, parts of /var and a few other directories from all my machines to /var/backups/hosts/$HOSTNAME on my backup server (which, in turn, gets backed up with rsync to another machine and also to tape). Database servers also run scripts that dump their databases to text before the rsync runs.
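A minimal sketch of the per-host rsync call described above, run from inside each VM. The backup server name `backupserver`, the function names and the exact flag choice are assumptions for illustration. The directory list only covers the ones named explicitly, because "parts of /var" is site-specific.

```python
import socket
import subprocess

# Hypothetical backup server; substitute your own.
BACKUP_HOST = "backupserver"
# Directories named in the answer; add your own site-specific parts of /var.
DIRS = ["/etc", "/usr/local", "/home"]


def build_rsync_command(hostname, dirs=DIRS, backup_host=BACKUP_HOST):
    """Return the argv for one rsync run of this host's directories."""
    dest = f"{backup_host}:/var/backups/hosts/{hostname}/"
    return [
        "rsync",
        "-a",         # archive mode: recurse, keep perms, times, symlinks, owners
        "-R",         # --relative: keep full source paths, e.g. <dest>/etc/...
        "--delete",   # mirror deletions so the copy matches the source
        "-e", "ssh",  # transport over ssh
        *dirs,
        dest,
    ]


def run_backup(dry_run=True):
    """Run the backup for the local host; defaults to a dry run."""
    cmd = build_rsync_command(socket.gethostname())
    if dry_run:
        cmd.insert(1, "--dry-run")
    return subprocess.run(cmd, check=True)
```

Because the command is the same on a physical machine and in a VM, the existing scripts only need the host list extended.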
To restore, just create a new VM (it's handy to keep a few minimal-install images of various sizes that you can clone), and rsync the backed-up files back in.
BTW, I usually don't bother backing up /bin, /sbin or /usr because I run Debian on almost all machines. Backing up programs that are already packaged in my local Debian mirror would waste both disk space and time. Instead, I back up the list of installed packages with dpkg --get-selections "*" > $hostname.sel and restore them with cat $hostname.sel | dpkg --set-selections ; apt-get dselect-upgrade.
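The saved selections file is plain text: one package per line, followed by whitespace and a state such as `install` or `deinstall`. This rough sketch parses such a file. It then lists the packages a freshly cloned minimal image still lacks, which is roughly what `apt-get dselect-upgrade` will pull in. The function names are hypothetical helpers, not part of dpkg.

```python
def parse_selections(text):
    """Map package name -> selection state from `dpkg --get-selections` output."""
    selections = {}
    for line in text.splitlines():
        fields = line.split()  # name and state are separated by tabs
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        package, state = fields
        selections[package] = state
    return selections


def missing_packages(saved, current):
    """Packages marked 'install' in the saved list but not on the current system."""
    wanted = {p for p, s in saved.items() if s == "install"}
    installed = {p for p, s in current.items() if s == "install"}
    return sorted(wanted - installed)
```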
This is how I currently clone physical machines. I'm in the process of converting several machines to virtual ones (running under KVM), and so far I haven't found any reason to make more than minor changes to the procedure.
One of these days I'll switch to rdiff-backup instead of plain rsync, so I can keep versioned backups online as well.
Finally, you could also search the http://libvirt.org/ web site or google for "+libvirt +rsync". Someone may have come up with an efficient method of rsyncing VM images directly.
I'm not sure you can do incremental backups easily. The VM's disk keeps changing while it runs, so you risk capturing data in a transient (and thus corrupted) state. I deal with this by shutting down the VM before making the copy.
Otherwise, I think you would need software that takes a snapshot of some form and copies from that.
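One common form of snapshot is an LVM snapshot of the logical volume backing the VM. The sketch below assumes the guest disk lives on an LV; the names `vg0`, `vm1` and `/backup/vm1.img` and the 5G snapshot size are hypothetical. It creates the snapshot, copies it with `dd`, and always removes it afterwards. A snapshot of a running guest is only crash-consistent, like an image taken after pulling the power cord. It avoids a half-updated copy, but it is not a replacement for a clean shutdown.

```python
import subprocess


def snapshot_backup_plan(vg, lv, dest, snap_size="5G"):
    """Return the (create, copy, remove) argv lists for one snapshot backup."""
    snap = f"{lv}-snap"
    snap_dev = f"/dev/{vg}/{snap}"
    create = ["lvcreate", "--snapshot", "--size", snap_size,
              "--name", snap, f"/dev/{vg}/{lv}"]
    # Copy the frozen block device, not the live one.
    copy = ["dd", f"if={snap_dev}", f"of={dest}", "bs=4M"]
    remove = ["lvremove", "-f", snap_dev]
    return create, copy, remove


def run_snapshot_backup(vg, lv, dest, snap_size="5G"):
    create, copy, remove = snapshot_backup_plan(vg, lv, dest, snap_size)
    subprocess.run(create, check=True)
    try:
        subprocess.run(copy, check=True)
    finally:
        # Drop the snapshot even if the copy fails: while it exists, every
        # write to the origin volume also costs a copy into the snapshot.
        subprocess.run(remove, check=True)
```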
Alternatively, you could use DRBD to replicate the block device holding the VM images between two systems in real time, so the images are copied automatically to a failover system.
I can't help with Xen-specific VM images, but I back up my VMware images using rsync. You must suspend or shut down the VM before copying. Using an LVM snapshot can work instead, but I find that you need a huge snapshot volume to hold the changes made while the copy runs.
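The snapshot has to absorb every origin block that gets overwritten while the copy runs. That gives a back-of-the-envelope upper bound: copy time = image size / copy throughput, and snapshot space ≈ guest write rate × copy time, plus some headroom. The figures in the usage note are made-up examples, and the 1.5 headroom factor is an arbitrary assumption.

```python
def snapshot_size_estimate(image_bytes, copy_rate, write_rate, headroom=1.5):
    """Rough upper bound on LVM snapshot space needed for one full copy.

    image_bytes: size of the volume being copied
    copy_rate:   backup throughput in bytes/second
    write_rate:  guest write rate to the origin in bytes/second
    headroom:    safety factor (assumed, tune to taste)
    """
    copy_seconds = image_bytes / copy_rate
    changed = write_rate * copy_seconds * headroom
    # Each origin chunk is copied into the snapshot at most once, so the
    # data held never exceeds the origin size (ignoring metadata overhead).
    return min(changed, image_bytes)
```

For example, a 40 GB image copied at 50 MB/s takes 800 s. A guest writing 5 MB/s during that window needs about 4 GB of snapshot space, or about 6 GB with the 1.5 headroom. Slow copies of busy VMs are what make the snapshot volume "huge".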
I use the following rsync command to back up to a CIFS-mounted remote server:
The incremental copies weren't worth the CPU needed to calculate the diffs. Some of the VM images don't change between backups, but most of them do :(
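To see where that cost comes from, here is a simplified block-level incremental scheme. It is a swapped-in technique, not rsync's rolling-checksum algorithm. It hashes fixed-size blocks of the image, compares them with the hashes of the previous backup, and rewrites only the blocks that differ. Comparing at fixed offsets suits disk images because guests overwrite blocks in place. However, every run still has to read and hash the whole image, which is the CPU and I/O cost described above. The 4 MiB block size and the function names are assumptions.

```python
import hashlib
import os

BLOCK_SIZE = 4 * 1024 * 1024  # assumed 4 MiB blocks


def block_hashes(path, block_size=BLOCK_SIZE):
    """SHA-256 digest of each fixed-size block of the file, in order."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            hashes.append(hashlib.sha256(block).hexdigest())
    return hashes


def changed_blocks(old_hashes, new_hashes):
    """Indices of blocks that are new or differ from the previous backup."""
    return [i for i, h in enumerate(new_hashes)
            if i >= len(old_hashes) or old_hashes[i] != h]


def update_copy(src_path, dst_path, indices, block_size=BLOCK_SIZE):
    """Rewrite only the given blocks of an existing backup copy in place."""
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        for i in indices:
            src.seek(i * block_size)
            dst.seek(i * block_size)
            dst.write(src.read(block_size))
        # Match the source length in case the image shrank.
        dst.truncate(os.path.getsize(src_path))
```

Storing the hash list from each run avoids re-reading the old backup. It still does nothing about reading the live image, so images that rarely change gain the most.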