How viable would periodic LVM snapshots of Xen domUs be as a backup strategy? Pros, cons, any gotchas?
To me it seems like the perfect solution for a fast, brainless restore. Any investigation could take place on the broken logical volume, with the domU successfully running without interruption.
EDIT:
Here's where I'm at now, when doing full system backups.
- LVM snapshot of the domU disk
- a new logical volume whose size equals the snapshot size
- dd if=/dev/snapshot of=/dev/new_lv
- disposing of snapshot with lvremove
- optional verification with kpartx/mount/ls
Now I need to automate this.
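The steps listed above could be scripted roughly as follows. This is a minimal sketch, not a finished tool: the function name backup_domu, the "-snap" and "-backup" name suffixes, and the DRY_RUN switch are all assumptions made for illustration. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the manual procedure; every VG/LV name and size is a placeholder.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# backup_domu VG LV SNAP_SIZE
backup_domu() {
    local vg="$1" lv="$2" snap_size="$3"
    local snap="${lv}-snap" copy="${lv}-backup"
    local bytes="ORIGIN_BYTES"   # stand-in shown in dry-run output

    # 1. LVM snapshot of the domU disk.
    run lvcreate --snapshot --size "$snap_size" --name "$snap" "/dev/$vg/$lv"

    # 2. New LV as large as the snapshot's virtual size (same as the origin).
    if [ "$DRY_RUN" != "1" ]; then
        bytes="$(blockdev --getsize64 "/dev/$vg/$snap")"
    fi
    run lvcreate --size "${bytes}b" --name "$copy" "$vg"

    # 3. Copy the frozen image into the new LV.
    run dd "if=/dev/$vg/$snap" "of=/dev/$vg/$copy" bs=4M

    # 4. Dispose of the snapshot.
    run lvremove --force "/dev/$vg/$snap"
}

# Example (dry run): backup_domu vg0 domu1-disk 2G
```

The sketch leaves out the optional kpartx/mount/ls verification, and it does not clean up the snapshot if dd fails or handle a backup LV that already exists from a previous run. Both would need handling before running it unattended.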
LVM snapshots are meant to capture the filesystem in a frozen state. They are not meant to be a backup in and of themselves. They are, however, useful for obtaining backup images that are consistent because the frozen image cannot and will not change during the backup process. So while you won't use them directly to make long-term backups, they will be of great value in any backup process that you decide to use.
There are a few steps to implement a snapshot. The first is that a new logical volume has to be allocated. The purpose of this volume is to provide an area where deltas (changes) to the filesystem are recorded. This allows the original volume to continue on without disrupting any existing read/write access. The downside to this is that the snapshot area is of a finite size, which means that on a system with busy writes, it can fill up rather quickly. For volumes that have significant write activity, you will want to increase the size of your snapshot to allow enough space for all changes to be recorded. If your snapshot overflows (fills up), the snapshot will halt and be marked as unusable. Should this happen, you will want to release your snapshot so you can get the original volume back online. Once the release is complete, you'll be able to remount the volume as read/write and make the filesystem on it available.
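Because an overflowing snapshot becomes unusable, it can help to watch its fill level while a backup is running. The following is a small sketch under assumptions: the function names and the 80 percent threshold are made up for illustration, and it relies on the snap_percent reporting field of lvs (newer LVM versions also report this as data_percent).

```shell
#!/usr/bin/env bash
set -euo pipefail

# snapshot_usage SNAP_PATH: print the snapshot's fill level as reported by lvs.
snapshot_usage() {
    lvs --noheadings -o snap_percent "$1" | tr -d ' '
}

# snapshot_over_threshold PERCENT LIMIT: succeed when PERCENT >= LIMIT.
snapshot_over_threshold() {
    local used="${1%%[.,]*}"   # keep the integer part: "85.30" -> "85"
    local limit="$2"
    [ "${used:-0}" -ge "$limit" ]
}

# Example (hypothetical snapshot path):
#   if snapshot_over_threshold "$(snapshot_usage /dev/vg0/domu1-disk-snap)" 80; then
#       lvextend --size +1G /dev/vg0/domu1-disk-snap
#   fi
```

Growing the snapshot with lvextend, as in the commented example, is one possible reaction. Aborting the backup and retrying with a larger snapshot is another.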
The second thing that happens is that LVM now "swaps" the true purposes of the volumes in question. You would think that the newly allocated snapshot would be the place to look for any changes to the filesystem, after all, it's where all the writes are going to, right? No, it's the other way around. Filesystems are mounted to LVM volume names, so swapping out the name from underneath the rest of the system would be a no-no (because the snapshot uses a different name). So the solution here is simple: When you access the original volume name, it will continue to refer to the live (read/write) version of the volume you did the snapshot of. The snapshot volume you create will refer to the frozen (read-only) version of the volume you intend to back up. A little confusing at first, but it will make sense.
All of this happens in less than 2 seconds. The rest of the system doesn't even notice. Unless, of course, you don't release the snapshot before it overflows...
At some point you will want to release your snapshot to reclaim the space it occupies. Once the release is complete, the snapshot's space is returned to the volume group, and the original volume remains.
I do not recommend pursuing this as a long-term backup strategy. You are still hosting data on the same physical drive that can fail, and recovery of your filesystem from a drive that has failed is no backup at all.
So, in a nutshell:
LVM snapshots are great for backing up your server without taking it offline. As stated, LVM snapshots are almost instant copies. You create them using the lvcreate command just as you would to create the LV itself, only you give it the --snapshot option and the original LV instead of the VG. For instance, with placeholder names: lvcreate --snapshot --size <size> --name <snapshot-name> /dev/<vg>/<original-lv>
This will create a snapshot of the given LV with the specified snapshot name. You can then mount this snapshot LV and perform your backup from it without worrying about files being actively used. This is particularly helpful if you are attempting to back up an active database server.
After you are done backing up from the snapshot, you will want to remove it with lvremove to avoid the additional I/O overhead and other performance issues that others have mentioned.
While LVM snapshots can be invaluable for producing a reliable backup of systems such as databases, which you would normally want to shut down before backing up to avoid file contention, they are not ideal for long-term operation as a quick restore.
Not a good idea, IMO.
The snapshots are implemented in a copy-on-write fashion, so you turn every write into a read and two writes (the block you are updating is first read from the main volume and stored in the snapshot volume before your new data is put in its place). You will therefore see some performance degradation if a lot of writing is common on the VMs.
Also, IIRC, if the snapshot volume gets full it is simply dropped unceremoniously. This is not good for backup purposes! So if you do try this as a backup method, be sure to make the snapshot volume big enough to handle all the changes that will happen during the useful life of the snapshot. Of course if you are aware of and monitor the size issue and the performance issue is not a problem to you, then what you suggest might make a useful addition to other backup processes you have in place.
LVM snapshots are very useful as part of a backup process (taking a snapshot, backing up the snapshot to elsewhere to ensure the backup is consistent without having to disable updates to the "real" volume, dropping the snapshot afterwards), amongst other things, but are not intended as a backup facility on their own.
You will need to ensure that the data on disk is in a consistent state before the snapshot is made. For example, MySQL may have data cached in memory that needs to be forced to disk, either by dumping the database or by shutting it down. See your application's manual for details.
Beneath the smart-looking stuff, LVM is actually 'just' a device mapper trick. Creating a snapshot with lvcreate is not much more than a wrapper around some dmsetup calls. The wrapper creates a new device (the snapshot volume) from one old volume (the original LV) and a new one (the copy-on-write volume). Together with that, the original LV is renamed to -real (as the output of dmsetup ls --tree shows). This -real LV is mapped to both the snapshot volume and the original volume, so it can be used in both places. The copy-on-write volume functions as an overlay to the -real LV. The -snap LV shows you the combination of the copy-on-write volume and the -real volume. This indeed creates some performance overhead.
When removing the snapshot, again some renaming and mapping happens. Afterwards, the situation will again look like it did before the snapshot was taken.
As for how far this is a good method of backing up stuff: it can be, if you take into account that it will (1) not help for the virtual machine's RAM, (2) create a performance penalty, and (3) require you to store images of the snapshot elsewhere.
VMware VCB works with snapshots as well, btw, albeit not LVM ones.
Even if snapshots had no performance impact, you have to understand: snapshots are no more of a backup than a copy to another folder on the same disk.
If the disk breaks, your data and your backup are lost. Even if you assign the snapshot area to another PV in the VG, it only contains the data modified since the snapshot.
Backing up means a copy at least to a completely separate drive as a minimum requirement.
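Getting the image onto a separate machine can be done by streaming the snapshot over the network. The sketch below is one way to do it under assumptions: the function name ship_image is made up, and the host and paths in the example are hypothetical. Everything after the first argument is the command that receives the compressed stream on its standard input.

```shell
#!/usr/bin/env bash
set -euo pipefail   # pipefail: a failing dd or gzip fails the whole pipeline

# ship_image SOURCE RECEIVER_COMMAND...
# Read SOURCE, compress it, and pipe it into the receiver command.
ship_image() {
    local src="$1"
    shift
    # bs=4194304 is 4 MiB, written numerically so it works with any dd.
    dd "if=$src" bs=4194304 2>/dev/null | gzip -c | "$@"
}

# Example (hypothetical host and paths):
#   ship_image /dev/vg0/domu1-disk-snap \
#       ssh backup@backuphost 'cat > /srv/backups/domu1-disk.img.gz'
```

Passing the receiver as a command keeps the transport open: ssh to another host as in the example, or a plain cat redirected to a file on a different drive.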
i use such a setup for snapshots of vmware server machines and mysql databases. works fine so far. there were a couple of restores - all without problems. one thing to consider - while running with a snapshot, lvm takes a significant performance hit for i/o operations. look here. ignore the fact they talk about mysql, i/o ops are i/o ops... no matter what kind of data sits on lvm.
I use LVM snapshots only to copy the domU LV to another one in a separate VG, where each domain has three backup "nodes" at its disposal.
After that, the snapshot is destroyed, and the backup LVs remain until the next round. If I have a restore to make, I just have to choose a source LV from the backup VG and copy it to the domain LV.
Once in a while, a backup Lv is dumped into an image file on a separate server.
All this is automated via script, with a backup every two days and a dump every week.
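The script itself is not shown here, so the following is only a guess at how such a schedule could be computed: a backup every second day, rotating through three slots, and a dump every seventh day. All function names are hypothetical, and the day counter (days since the Unix epoch) is just one simple choice.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Days since the Unix epoch, used as a simple calendar counter.
today_number() { echo $(( $(date +%s) / 86400 )); }

# A backup runs on every second day.
is_backup_day() { [ $(( $1 % 2 )) -eq 0 ]; }

# Each backup goes to the next of three slots: 0, 1, 2, 0, ...
backup_slot() { echo $(( ($1 / 2) % 3 )); }

# A dump to the separate server runs on every seventh day.
is_dump_day() { [ $(( $1 % 7 )) -eq 0 ]; }

# Example:
#   day="$(today_number)"
#   if is_backup_day "$day"; then echo "back up into slot $(backup_slot "$day")"; fi
#   if is_dump_day "$day"; then echo "dump one backup LV to the remote server"; fi
```

With three slots and a two-day interval, the oldest available backup is between four and six days old, which is the window in which a problem has to be noticed before every slot contains it.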
I even had a "panic" mode in mind, where the domain LV would be restored but run from a snapshot and reset every 2 hours, to keep the site online in case of serious hacks until a proper defence could be organized.
What became of the 'panic mode' line of defense idea?