When making a backup with rsnapshot, "[...] we first replicate the previous backup into a parallel directory structure, creating all the directories and making hard links to all the files." This is all good.
I assume this implies that the initial backup will be retained forever? The "newer" backups will only point (through hardlinks) to the older backup, so I'm assuming that the actual file which any given hardlink points to needs to be retained forever to not break things?
Is this assumption right?
No, this is not correct. If you have multiple hard links to a file, it doesn't matter which one originally created the file; the data is only deleted when the last link to it is deleted (see the difference between a hard link, as used by rsnapshot, and a symbolic link). In the case of rsnapshot, this means that every backup directory is self-contained: you can delete all the other backup directories (including the initial one) and still have a full set of data.
Depending on how you configure rsnapshot, it will eventually delete the original backup set.
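
The claim that no link is the "original" can be checked directly. The following is a minimal Python sketch, not anything rsnapshot itself runs; the directory and file names are only illustrative. It creates a file, adds a second hard link, deletes the first name, and reads the data through the remaining one.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Illustrative names: hourly.1 plays the "initial" backup, hourly.0 a later one.
    first = os.path.join(root, "hourly.1", "myfile.jpg")
    second = os.path.join(root, "hourly.0", "myfile.jpg")
    os.makedirs(os.path.dirname(first))
    os.makedirs(os.path.dirname(second))

    with open(first, "wb") as f:
        f.write(b"image data")

    os.link(first, second)                     # second name for the same data
    links_before = os.stat(second).st_nlink    # both names count as links

    os.remove(first)                           # delete the "original" name
    links_after = os.stat(second).st_nlink     # one link left, data still there

    with open(second, "rb") as f:
        surviving = f.read()

print(links_before, links_after, surviving)
```

The data stays readable through the second name, which is why removing the oldest backup directory does not damage the newer ones.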
TL;DR: no.
It depends on how you define an "initial backup".
You first create a backup (`hourly.0`), which has all the files from today.
On the next iteration, it "copies" the files (`cp -al`, which only copies the links to the data) to the `hourly.1` folder.
If all the files are the same as before, rsync won't write anything, so you have one block of data for a file (let's use `myfile.jpg`) and two links (`hourly.0/myfile.jpg` and `hourly.1/myfile.jpg`) pointing to the same data on the drive.
On the next iteration with no changes, you still have the same data, just with another pointer (`hourly.2/myfile.jpg`) pointing to it. If you have set it up to keep 3 backups, it will then delete `hourly.2`, move `hourly.1` to `hourly.2`, move `hourly.0` to `hourly.1`, "copy" (create hard links) from `hourly.1` to create `hourly.0`, and then run rsync again.
If the file changes, rsync will "remove" the file `hourly.0/myfile.jpg` (actually just the link; the data stays on the drive, since there are still two links pointing to it). Rsync will then create a new file (link + data) with the new `myfile.jpg`.
So now you have one block of data with one link for the new file, and one block with two links to it for the old version of the file.
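
The file-change case described above can be reproduced with a small Python sketch. It is a simplified stand-in for what rsync does (remove the name, then write new data), not rsync's actual code path, and the `link_count` helper is hypothetical.

```python
import os
import tempfile

def link_count(path):
    """Number of directory entries (hard links) pointing at this file's data."""
    return os.stat(path).st_nlink

with tempfile.TemporaryDirectory() as root:
    snaps = [os.path.join(root, f"hourly.{i}") for i in range(3)]
    for d in snaps:
        os.mkdir(d)
    files = [os.path.join(d, "myfile.jpg") for d in snaps]

    # Unchanged file: one block of data, three names.
    with open(files[2], "wb") as f:
        f.write(b"old version")
    os.link(files[2], files[1])
    os.link(files[2], files[0])
    unchanged_links = link_count(files[0])

    # The file changes: drop only the hourly.0 name, then write the new data.
    os.remove(files[0])
    with open(files[0], "wb") as f:
        f.write(b"new version")

    new_links = link_count(files[0])   # links to the new data
    old_links = link_count(files[1])   # links to the old data
    same_inode = os.stat(files[0]).st_ino == os.stat(files[1]).st_ino
    with open(files[2], "rb") as f:
        old_content = f.read()

print(unchanged_links, new_links, old_links, same_inode, old_content)
```

The new file gets its own inode with a single link, while the old data keeps its two remaining links in `hourly.1` and `hourly.2`.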
On the next iteration, it deletes `hourly.2` (one link fewer for the data of the old file) and "copies" (hard links) the new file (a new link for the new file). There are now two pointers to the new file's data and one to the old version's data.
On the next iteration, it deletes the last link to the old version (data with no links pointing to it is considered free by the filesystem and will get overwritten when needed), and there are three links to the new file's data.
If there is a link pointing to data (no matter from which directory), this data stays on the drive. Only once you delete all the links can the data get overwritten.