I'm trying to figure out how LVM snapshots work so I can implement them on my fileserver, but I'm having difficulty finding anything on Google that explains how they work rather than how to use them for a basic backup system.
From what I've read I think it works something like this:
- You have an LVM with a primary partition and lots and lots of unallocated free space not in the partition
- Then you take a snapshot and mount it on a new Logical Volume. Snapshots are only supposed to hold changes, so this first snapshot would be a whole copy, correct?
- Then, the next day you take another snapshot (this one's partition size doesn't have to be so big) and mount it.
- Somehow the LVM keeps track of the snapshots, and doesn't store unchanged bits on the primary volume.
- Then you decide that you have enough snapshots and get rid of the first one. I have no idea how this works or how that would affect the next snapshot.
Can someone correct me where I'm wrong? At best I'm guessing; I can't find anything on Google.
vgdisplay

    obu1:/home/jail/home/qps/backup/D# vgdisplay
      --- Volume group ---
      VG Name               fileserverLVM
      System ID
      Format                lvm2
      Metadata Areas        1
      Metadata Sequence No  3
      VG Access             read/write
      VG Status             resizable
      MAX LV                0
      Cur LV                2
      Open LV               2
      Max PV                0
      Cur PV                1
      Act PV                1
      VG Size               931.51 GB
      PE Size               4.00 MB
      Total PE              238467
      Alloc PE / Size       238336 / 931.00 GB
      Free  PE / Size       131 / 524.00 MB
      VG UUID               qSGaG1-SQYO-D2bm-ohDf-d4eG-oGCY-4jOegU
Why not have a look at the snapshots section of the LVM-HOWTO?
LVM snapshots are your basic "copy on write" snapshot solution. The snapshot is really nothing more than asking the LVM to give you a "pointer" to the current state of the filesystem and to write changes made after the snapshot to a designated area.
LVM snapshots "live" inside the volume group hosting the volume subject to the snapshot-- not another volume. Your statement "...lots and lots of unallocated free space not in the partition" makes it sound like you think the snapshots "live" outside the volume group subject to snapshot, and that's not accurate. Your volume group lives in a hard disk partition, and the volume subject to snapshot and any snapshots you've taken live in that volume group.
The normal way that LVM snapshots are used is not for long-term storage, but rather to get a consistent "picture" of the filesystem such that a backup can be taken. Once the backup is done, the snapshot is discarded.
When you create an LVM snapshot you designate an amount of space to hold any changes made while the snapshot is active. If more changes are made than you've designated space for, the snapshot becomes unusable and must be discarded. You don't want to leave snapshots lying around because (a) they'll fill up and become unusable, and (b) the system's performance is impacted while a snapshot is active-- things get slower.
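That create-backup-discard cycle looks roughly like this in practice. This is only a sketch: the VG name is taken from the vgdisplay output above, but the origin LV name (`data`), snapshot size, mount point, and backup destination are all assumptions.

```shell
# Create a snapshot of LV "data" in VG "fileserverLVM". The -L 5G is the
# space reserved for copied-on-write blocks: if more than 5 GB of the
# origin changes while the snapshot exists, the snapshot becomes invalid.
lvcreate --snapshot --size 5G --name snap /dev/fileserverLVM/data

# Mount the frozen view read-only and take the backup from it.
mkdir -p /mnt/snap
mount -o ro /dev/fileserverLVM/snap /mnt/snap
tar -czf /backup/data-$(date +%F).tar.gz -C /mnt/snap .

# Discard the snapshot as soon as the backup is done, so it can't fill
# up and so the write-performance penalty goes away.
umount /mnt/snap
lvremove -f /dev/fileserverLVM/snap
```

The snapshot space only has to cover the churn that happens *during* the backup window, which is why it can be much smaller than the origin volume.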
Edit:
What Microsoft Volume Shadow Copy Services and LVM snapshots do aren't too tremendously different. Microsoft's solution is a bit more comprehensive (as is typically the case with Microsoft-- for better or for worse their tools and products often seek to solve pretty large problems versus focusing on one thing).
VSS is a more comprehensive solution that unifies support for hardware devices that support snapshots and software-based snapshots into a single API. Further, VSS has APIs to allow applications to be made quiescent through the snapshot APIs, whereas LVM snapshots are just concerned with snapshots-- any quiescing of applications is your problem (putting databases into "backup" states, etc).
LVM snapshots are an example of a copy-on-write snapshot solution, as Evan said. How it works is a bit different from what Evan implied, but not by a whole lot.
When you have an LVM volume with no snapshots, writes to the volume happen as you'd expect. A block is changed, and that's it.
As soon as you create a snapshot, LVM creates a pool of blocks. This pool also contains a full copy of the LVM metadata of the volume. When writes happen to the main volume such as updating an inode, the block being overwritten is copied to this new pool and the new block is written to the main volume. This is the 'copy-on-write'. Because of this, the more data that gets changed between when a snapshot was taken and the current state of the main volume, the more space will get consumed by that snapshot pool.
When you mount the snapshot, the metadata written when the snapshot was taken allows the mapping of snapshot pool blocks over changed blocks in the volume (or higher-level snapshot). This way, when an access comes for a specific block, LVM knows which block to access. As far as the filesystem on that volume is concerned, there are no snapshots.
James pointed out one of the faults of this system. When you have multiple snapshots of the same volume, every time you write to a block in the main volume you potentially trigger writes in every single snapshot. This is because each snapshot maintains its own pool of changed blocks. Also, for long snapshot trees, accessing a snapshot can cause quite a bit of computation on the server to figure out which exact block needs to be served for an access.
When you dispose of a snapshot, LVM just drops the snapshot pool and updates the snapshot tree as needed. If the dropped snapshot is part of a snapshot tree, some blocks will be copied to the lower-level snapshot. If it is the lowest snapshot (or the only one), the pool just gets dropped and the operation is very fast.
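You can watch how full each snapshot's pool is getting, and drop one, with the standard LVM tools. A sketch, assuming the VG and snapshot names used above:

```shell
# Data% for a snapshot LV is the fraction of its reserved pool already
# consumed by copied blocks; when it reaches 100% the snapshot is
# invalidated and can only be removed.
lvs -o lv_name,origin,lv_size,data_percent fileserverLVM

# Dropping a snapshot is a single, fast operation:
lvremove fileserverLVM/snap
```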
Some file-systems do offer in-filesystem snapshots, ZFS and BTRFS are but two of the better known ones. They work similarly, though the filesystem itself manages the changed/unchanged mapping. This is arguably a better way of doing it since you can fsck an entire snapshot family for consistency, which is something you can't do with straight up LVM.
LVM snapshots are inefficient; the more snapshots there are, the slower the system will go.
I can only speak for XFS, as it's what we use; xfs_freeze can be used to halt new access to the filesystem and create a stable image on disk.
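That freeze-then-snapshot sequence looks something like this. A sketch only: recent LVM versions freeze the filesystem themselves when creating a snapshot, but doing it explicitly shows the idea, and the mount point and LV names are assumptions.

```shell
# Block new writes and flush XFS so the on-disk image is consistent...
xfs_freeze -f /srv/data

# ...take the snapshot while the filesystem is quiesced...
lvcreate --snapshot --size 2G --name datasnap /dev/fileserverLVM/data

# ...and thaw immediately; the freeze only needs to span the lvcreate.
xfs_freeze -u /srv/data
```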
Copy-on-write is used so the disk space is used efficiently.
You have to create the filesystem in a logical volume whose volume group has spare space in it for the snapshots.
This is an example from the FAQ
You don't specify whether you are using Linux or HP-UX. In HP-UX, you create a logical volume and mount it as a snapshot of another logical volume. In Linux, you create a logical volume as a snapshot volume.
Removing a snapshot in HP-UX is done by unmounting the volume; in Linux it is done by using lvremove to remove the logical volume.
In any case, the changes are the only thing stored on your snapshot. The longer the snapshot remains available, the more changes it accumulates - and there is the chance it could fill up if not properly sized or released.
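On Linux, a snapshot at risk of filling can be grown while active, and LVM can even grow it automatically. A sketch; the LV name and the threshold values are assumptions:

```shell
# Grow the snapshot's change pool by another 2 GB before it hits 100%:
lvextend --size +2G /dev/fileserverLVM/snap

# Or let dmeventd autogrow it. In /etc/lvm/lvm.conf:
#   snapshot_autoextend_threshold = 70   # extend once 70% full
#   snapshot_autoextend_percent   = 20   # grow by 20% each time
```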
The speed of disk access on a snapshot volume is slower than it would be to a normal volume; you must take that into account.
The @Evan Anderson and @sysadmin1138 answers, while very instructive and spot-on for their time (2009), are now somewhat outdated due to the existence of two distinct LVM snapshot methods:
the first one (let's call it classical LVM) is the one described in the answers above. It basically sets apart a specific disk portion to which to-be-overwritten data are copied, meaning that multiple snapshots destroy performance (i.e., if a single snapshot slows down the system by 3-5x, two snapshots slow it by 6-10x, three snapshots by 12-15x, and so on). This, in turn, makes them incapable of supporting a rolling-snapshot policy. Moreover, their metadata storage (plain text) was not optimized for speed. In fact, their main use was for backups: a single snapshot is taken and, after the backup, deleted;
the new one (called Thin LVM, or lvmthin) is an entirely different beast. It heavily depends on optimized binary metadata (a btree) to track space chunks quickly and efficiently. Taking a snapshot does not take any disk space (i.e., the snapshot size does not need to be declared and no space is set apart), except for some additional metadata space. Overwriting an already-allocated chunk can again result in a read-modify-write, but this can be entirely avoided for large writes (where "large" means larger than the thin pool's data chunks). More importantly, multiple snapshots do not copy any more data than a single snapshot, because only metadata are altered to point the various snapshots at the same data chunks. On the darker side, one should note that thin snapshots can "fill" the entire thin pool, causing all writes to stall.
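The thin-provisioned workflow looks quite different from the classical one. A sketch, assuming the VG name from the question; the pool and volume names and sizes are made up:

```shell
# Create a thin pool inside the VG, then a thin volume backed by it.
# The volume's virtual size may even exceed the pool's physical size.
lvcreate --type thin-pool --size 100G --name pool0 fileserverLVM
lvcreate --thin --virtualsize 500G --name data fileserverLVM/pool0

# A thin snapshot needs no --size: it shares the origin's chunks and
# consumes pool space only as the two sides diverge.
lvcreate --snapshot --name data_snap fileserverLVM/data

# Thin snapshots are created with the "skip activation" flag set;
# -K activates one anyway so it can be mounted.
lvchange -ay -K fileserverLVM/data_snap
```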
Which LVM volume type should you use? For the root filesystem, I generally use classical LVM volumes: they are rock solid and easier to recover. Moreover, a root partition often does not contain much valuable data by itself (and so normal backup procedures suffice). On the other hand, for data volumes I typically want rolling snapshots extending some days/weeks into the past, so I use Thin LVM (or a ZFS pool, but that is another story...). For some additional context, you can read here