With btrfs hitting production in Oracle EL this month (together with a working fsck and scrubbing from Linux 3.2), I'm thinking of redesigning my current backup solution to utilise it. Note that I'm considering this for small amounts of data, less than 10 TB, that is fairly static (less than 1% changed daily). In short, an SMB/SOHO backup solution.
What the backup should do:
- take an LVM snapshot of ext[234]/XFS/JFS on the production server
- rsync/transfer the changed data to btrfs on the backup server
- snapshot the btrfs filesystem
- drop old snapshots when free space is running low
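The four steps above can be sketched as a shell script. The volume group, mount points and paths below are placeholders, and every command is echoed rather than executed by default, so this is a dry-run sketch rather than a finished tool:

```shell
#!/bin/sh
# Dry-run sketch of the LVM-snapshot -> rsync -> btrfs-snapshot cycle.
# Leave DRY_RUN set to echo commands; set DRY_RUN= to actually run them.
DRY_RUN=${DRY_RUN-1}
run() { if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi; }

VG=vg0                      # placeholder volume group
LV=data                     # placeholder logical volume
SRC_MNT=/mnt/lvm-snap       # where the LVM snapshot gets mounted
DEST=/backup/current        # btrfs subvolume holding the latest copy
SNAPDIR=/backup/snapshots   # where dated btrfs snapshots live

# 1. Take a consistent LVM snapshot of the production filesystem.
run lvcreate --snapshot --size 1G --name "${LV}-snap" "/dev/${VG}/${LV}"
run mount -o ro "/dev/${VG}/${LV}-snap" "$SRC_MNT"

# 2. Transfer only changed data to the btrfs backup server.
run rsync -aHAX --delete "$SRC_MNT/" "$DEST/"

# 3. Snapshot the btrfs subvolume, read-only, with a date stamp.
run btrfs subvolume snapshot -r "$DEST" "$SNAPDIR/$(date +%Y-%m-%d)"

# 4. Clean up the LVM snapshot.
run umount "$SRC_MNT"
run lvremove -f "/dev/${VG}/${LV}-snap"
```

In a real deployment the rsync step would run over ssh from the backup server, and the snapshot size (`--size 1G`) would need to cover writes that happen during the transfer.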
Pros:
- All files easily available, no decompression or loop mounting needed
- Past snapshots also easily available...
- ... so I can share them as read-only Samba shares (with shadow copy support)
- Snapshots take a minimal amount of space thanks to copy-on-write (a snapshot without changes takes literally a few KiB on disk)
- High backup consistency: checksums on files, scrubbing of all data and built-in redundancy
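The read-only Samba sharing with shadow copy support mentioned above can be done with Samba's shadow_copy2 VFS module. A minimal sketch, assuming the btrfs snapshots live under /backup/snapshots and are named with a plain date stamp (the share name and paths are placeholders):

```ini
[backup]
    path = /backup/current
    read only = yes
    vfs objects = shadow_copy2
    shadow:snapdir = /backup/snapshots
    shadow:format = %Y-%m-%d
    shadow:sort = desc
```

With this, Windows clients see the snapshots in the "Previous Versions" tab.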
Questions:
- Is there some backup solution (in the form of Bacula, BackupPC, etc.) that is, or can easily be made, aware of a copy-on-write file system?
- Or will I need to use a home-grown rsync solution?
- What do people with ZFS boxes dedicated to backup do to back up their Linux machines?
I've done some extensive searching in the last week for something similar. I have found no solutions to do all 4 steps. There are numerous blogs from home users who try the 'rsync to btrfs'-type of backups, and all of the major Btrfs wikis cover how to perform Btrfs snapshots.
There are also quite a few people who are attempting different ways of rotating Btrfs snapshots. However, you are the first person I've seen who wants to rotate snapshots based on disk space. I am playing with btrfs-snap myself which creates a set of hourly, weekly and monthly snapshots, and it's nice and simple.
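Rotating based on disk space rather than on a schedule can be sketched as a small loop: keep deleting the oldest snapshot until usage falls below a threshold. The paths and threshold below are hypothetical; the btrfs and GNU df commands are real but are kept in small overridable functions so the loop itself can be exercised safely:

```shell
#!/bin/sh
# Prune the oldest btrfs snapshots until usage falls below a threshold.
SNAPDIR=${SNAPDIR-/backup/snapshots}   # hypothetical snapshot directory
THRESHOLD=${THRESHOLD-90}              # prune while usage % is above this

# Percent of space used on the backup filesystem (GNU df).
used_percent() { df --output=pcent "$SNAPDIR" | tr -dc '0-9'; }

# Delete the oldest snapshot; date-stamped names sort chronologically.
delete_oldest() {
    oldest=$(ls "$SNAPDIR" | sort | head -n 1)
    [ -n "$oldest" ] || return 1       # nothing left to delete
    btrfs subvolume delete "$SNAPDIR/$oldest"
}

prune() {
    while [ "$(used_percent)" -gt "$THRESHOLD" ]; do
        delete_oldest || break
    done
}
# Invoke from cron with: prune
```

One caveat: plain df understates reclaimable space on btrfs because extents are shared between snapshots, so deleting one snapshot may free less than expected and the loop may delete several before usage drops.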
The Dirvish project seems to meet many of your requirements. Some developers are attempting to integrate Dirvish with Btrfs. However, the Dirvish project seems a bit stalled.
At this point in time, you are ahead of the curve.
According to Avi Miller (in his talk at LinuxConf.AU), btrfs send/receive is being worked on. It should be faster than rsync since it doesn't need to traverse directories to find changes in files. I don't know if there's an expected release date yet, though.
There is, however, a utility built into btrfs-progs that lists every file that has changed between snapshots: btrfs subvolume find-new
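find-new prints one record per changed extent and repeats a path once per extent, so getting a unique file list takes a little post-processing. A sketch, assuming the usual btrfs-progs output format where the path is the last field (which breaks on paths containing spaces):

```shell
#!/bin/sh
# List files changed in a subvolume since a given generation (transid).
# The generation to compare against can be read from the last snapshot:
#   btrfs subvolume find-new /backup/snapshots/2024-01-01 9999999
# whose final line is "transid marker was <gen>".
changed_files() {
    # usage: changed_files <subvolume> <last-generation>
    # Keep only "inode ..." records, take the path (last field), dedupe.
    btrfs subvolume find-new "$1" "$2" \
        | awk '$1 == "inode" {print $NF}' \
        | sort -u
}
# Example: changed_files /backup/current 1234
```

This gives a cheap change list for incremental transfers without walking the whole tree.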
I am working on an OS backup system similar to BackupPC, and I have thought about this. What has stopped me from actually implementing it is that you cannot hardlink between subvolumes, and you can only create snapshots of subvolumes, which means one subvolume per backup client. Thus the file-level deduplication feature cannot coexist with this approach, and file-level deduplication usually saves a lot of space. Do you want to back up only one server?
If btrfs had block-level deduplication this problem could probably be avoided, but that is usually insufferably slow as well...
Such an approach would of course entail tight integration with one filesystem (btrfs), so it should be an optional feature.
I'm asking because I'm thinking about adding such a copy-on-write feature, but I don't know if I should, given the drawbacks listed above.
Edit: UrBackup now supports backups as described in the question with Linux kernels >= 3.6 (with cross-volume reflink support). See how to set it up.
The btrfs wiki page "Use Cases" lists some tools: SnapBtr, Snapper, btrfs-time-machine, UrBackup.
There's also a proposal for a built-in tool called autosnap. However, as of October 2013, the wiki states that "The autosnap functionality is currently not included in upstream version of btrfs."
I had similar frustrations, so I ended up creating a few scripts which I'm calling snazzer. Together they offer snapshotting, pruning, measurement and transport via ssh (but as of today can send/receive to/from local filesystems as well). Measurements are just reports of sha512sum and PGP signatures of snapshot paths. It's not quite ready for release but I would love to hear feedback if anybody has time to review it at this early stage.
CLI-only at this point, but I've taken some time to make it easy to use on systems with many btrfs subvolumes; typically I have separate subvolumes for /var/cache, /home, etc., which may need to be excluded from snapshotting or given more or less aggressive pruning schedules.

I'm afraid the pruning algorithm makes decisions purely on the presence of the set of snapshots and their dates; nothing keeps pruning until a disk-usage constraint is met. Which do you delete first? Reduce the number of hourlies first, or dailies? Perhaps drop the oldest, e.g. yearlies? Different deployments will have different priorities, and I can't know whether this is the only backup tier (in which case you shouldn't drop the oldest backups, in case of legal/insurance obligations) or just an intermediate one (in which case you probably have those yearlies archived somewhere safe elsewhere).
I'll be adding ZFS support and/or interoperability at some point. It's written mostly in POSIX-ish shell and Perl due to a strong desire for "zero" dependencies at the moment; I hope to maintain a cleaner Python alternative implementation in parallel eventually.