I'm trying to set up a space-efficient rotating backup scheme with rsnapshot / rsync from a server to a Hetzner storagebox. I'm having a hard time understanding how hard links on the destination are affecting the disk usage being reported. In short: even though hard links seem in place on the backup destination, they don't seem to be taken into account in disk usage, but instead counted as full files.
Since the destination folder of rsnapshot must be on the local file system, I have set up a workflow consisting of 2 parts:
- create a local snapshot with rsnapshot, in a local folder on the source server
- rsync that local snapshot over SSH to the destination (the command is sketched just below)
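For reference, the rsync step looks roughly like this (user, host, port, and paths are placeholders for my actual values; the important part is `-H` so the hard links rsnapshot created are preserved):

```
# Sketch of the rsync step; user, host, port, and paths are placeholders.
# -a        archive mode (permissions, timestamps, ...)
# -H        preserve hard links between files within the transferred tree
# --delete  remove files on the destination that no longer exist locally
rsync -aH --delete \
    -e "ssh -p 23" \
    /backup/rsnapshot/ \
    user@user.your-storagebox.de:rsync-backup/
```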
That seems to work well and fast, but I have one concern: disk usage reported on the destination (with `du -sh`) seems to add up the size of all snapshots, even though they seem to have been copied properly using hard links. Note: since Hetzner storageboxes don't allow interactive SSH login, I'm inspecting this backup destination as a volume mounted with CIFS.
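The mount is set up roughly like this (share name, mount point, and credentials file reflect my setup and are only placeholders here):

```
# Roughly how the storagebox is mounted locally for inspection;
# share name, mount point, and credentials file are placeholders.
mount -t cifs //user.your-storagebox.de/backup /mnt/user.your-storagebox.de \
    -o credentials=/root/.storagebox-credentials,iocharset=utf8
```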
For example, after 3 rounds of this rsnapshot + rsync combo, the destination folder contains `daily.0`, `daily.1`, and `daily.2` folders. When checking random files in those snapshot folders for hard links, I do get the expected results:
```
find /mnt/user.your-storagebox.de/rsync-backup/ -name "output.file" -print0 | xargs -0 ls -li
```

returns 3 files with identical inodes and a link count of 3 (as expected):

```
351317 -rw-rw---- 3 root root 8650 Dec 15 11:25 /mnt/user.your-storagebox.de/rsync-backup/daily.0/home/user/output.file
351317 -rw-rw---- 3 root root 8650 Dec 15 11:25 /mnt/user.your-storagebox.de/rsync-backup/daily.1/home/user/output.file
351317 -rw-rw---- 3 root root 8650 Dec 15 11:25 /mnt/user.your-storagebox.de/rsync-backup/daily.2/home/user/output.file
```
```
find /mnt/user.your-storagebox.de/rsync-backup/ -samefile /mnt/user.your-storagebox.de/rsync-backup/daily.0/home/user/output.file
```

returns 3 files (as expected):

```
/mnt/user.your-storagebox.de/rsync-backup/daily.0/var/tomcat/vhosts/output.file
/mnt/user.your-storagebox.de/rsync-backup/daily.2/var/tomcat/vhosts/output.file
/mnt/user.your-storagebox.de/rsync-backup/daily.1/var/tomcat/vhosts/output.file
```
I guess this indicates that these snapshots have been properly copied to the destination as hard links.
Yet... when checking their disk usage on the destination with `du -sh /mnt/user.your-storagebox.de/rsync-backup`, a value of 12G is returned. This is unexpected, since the original source folder is only about 4G. Apparently, disk usage is calculated cumulatively, despite the hard links?
OTOH, when inspecting the destination folder via `rsnapshot du`, I'm getting output that does seem to take hard links into account:

```
4.3G    /mnt/user.your-storagebox.de/rsync-backup/daily.0/
41K     /mnt/user.your-storagebox.de/rsync-backup/daily.1/
41K     /mnt/user.your-storagebox.de/rsync-backup/daily.2/
4.3G    total
```
This is confusing: either the snapshots are being copied with hard links and should take up minimal space (which seems to be the case when inspecting the inodes), or they are not, and are taking up much more space than expected (as suggested by the `du -sh` output).
My main concern is: is the disk usage reported on this mounted volume correct or not? Are there any caveats w.r.t. the use of `du -sh` on mounted volumes I should be aware of?
My version of `du` (Debian, `du (GNU coreutils) 8.30`) handles files with hard links and counts the multiple instances only once. It would appear that yours does not. You can verify this fairly easily, though.

Prepare the scenario
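A minimal sketch, assuming a throwaway directory and an arbitrary 10 MiB test file (names and sizes don't matter):

```
# Throwaway working directory and a 10 MiB seed file (names are arbitrary).
mkdir /tmp/du-test
cd /tmp/du-test
dd if=/dev/zero of=seed bs=1M count=10
```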
Trial #1. Two files copied but not hardlinked
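For example (`etc.1` and `etc.2` here are two independent copies of the seed file):

```
# Two independent copies: separate inodes, data stored twice.
cp seed etc.1
cp seed etc.2
ls -li etc.1 etc.2   # different inode numbers, link count 1 on each
du -ch etc.1 etc.2   # the total is roughly twice the seed file's size
```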
Trial #2. Two files hardlinked together
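For example (same file names, but `etc.2` is now a hard link to `etc.1` instead of a second copy):

```
# One file plus a hard link to it: same inode, data stored once.
rm -f etc.1 etc.2
cp seed etc.1
ln etc.1 etc.2       # hard link, not a copy
ls -li etc.1 etc.2   # identical inode numbers, link count 2
du -ch etc.1 etc.2   # a hardlink-aware du counts the data only once in the total
```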
If your instance of `du` cannot handle multiple instances of the same hardlinked file, trial #2 will return the total of both `etc.1` and `etc.2`, just as trial #1 did. Using this information you can determine whether your version of `du` is being misleading, or whether the files really are using up more disk space than you would expect. (Given your other metrics I'm fairly sure it's the former.)