I back up several virtual disks (around 4 TB in total), with several weeks of retention.
I use 4 x 4 TB disks in the computer dedicated to the primary backup. The filesystem is ZFS RAIDZ2, so 8 TB usable.
A secondary backup on 4 x 2 TB disks (4 TB usable) sits in a separate building and stores last Sunday's backup.
I manage retention with snapshots: after each backup, a snapshot is created on the primary backup filesystem, and snapshots older than 90 days are deleted. The amount of data modified over 90 days is less than 4 TB, so everything fits (in fact I keep the last 30 days + 9 previous weeks + 10 previous months, but that is not the point).
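In case it helps, the pruning boils down to plain zfs snapshot / zfs destroy calls; here is a minimal sketch of the simple 90-day variant (tank/backup is a placeholder dataset name, and the real script also handles the weekly/monthly tiers):

    # After each backup run, create a dated snapshot (dataset name is a placeholder)
    zfs snapshot tank/backup@$(date +%Y-%m-%d)

    # Prune snapshots older than 90 days (simplified: the real script also
    # keeps the 30-day / 9-week / 10-month tiers)
    cutoff=$(date -d "90 days ago" +%s)
    zfs list -H -p -t snapshot -o name,creation tank/backup |
    while read -r name creation; do
        [ "$creation" -lt "$cutoff" ] && zfs destroy "$name"
    done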
On the secondary backup I currently keep only one copy. I plan to implement retention there too.
I first thought of upgrading to 4 x 4 TB disks (for lack of space I can't go to 6 x 2 TB) and doing snapshots as on the primary backup.
Instead of upgrading hardware, what if I use ZFS compression + snapshots on the secondary backup?
Compression should free up, say, 600 GB; snapshots would then give a retention of several days.
The backed-up virtual disk images are updated with rsync, so only small parts of them change, and I expect only those small parts to end up "retained" in the snapshots. But I can't find any source confirming that it works the way I think.
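To make the idea concrete, this is roughly what I have in mind on the secondary machine (pool/dataset names and paths are placeholders, and I am assuming rsync is run with --inplace so that only the changed blocks of each image are rewritten rather than whole new files):

    # Enable compression on the secondary backup dataset (placeholder name)
    zfs set compression=lz4 tank2/backup

    # Refresh the big virtual disk images; --inplace/--no-whole-file make rsync
    # overwrite only the changed blocks, so the next snapshot only has to keep
    # the old versions of those blocks
    rsync -a --inplace --no-whole-file /primary/backup/vdisks/ /tank2/backup/vdisks/

    # Freeze the state that was just written
    zfs snapshot tank2/backup@$(date +%Y-%m-%d)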
Question: using ZFS on Linux with compression, will very big files with scattered modifications be snapshotted efficiently?
You should be using ZFS compression (with compression=lz4) by default these days; there's no good reason not to, unless you know that your data is not compressible.
Snapshots on compressed ZFS filesystems are still efficient, and they work with replication and/or rsync.
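For example (dataset name is a placeholder; the property only affects blocks written after it is set, so existing data stays uncompressed until rewritten):

    # Enable LZ4 compression; only data written from now on is compressed
    zfs set compression=lz4 tank/backup

    # Check the achieved compression ratio later on
    zfs get compression,compressratio tank/backup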
We have also been using ZFS with compression and snapshots for backing up big files for several years.
The size of the snapshots is consistent with the amount of data updated by rsync. I don't know exactly how compression is implemented in ZFS, but it does not significantly degrade the efficiency of snapshots.
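You can check this yourself by comparing the per-snapshot space usage with what rsync reports as transferred, for example (dataset name is a placeholder):

    # USED is the space unique to each snapshot, i.e. roughly the blocks
    # that rsync rewrote between two consecutive backup runs
    zfs list -t snapshot -o name,creation,used,refer -s creation tank/backup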