I'm trying to use rsync and "--link-dest=" to create incremental copies of backups on a server (Debian Wheezy, LVM, RAID 1), with the goal of using hard links to save space.
Unlike what may be the "normal" use case, I want to back up every day from a Windows client to a folder on the server called "1" (this part works, though I don't use rsync here to do the backup), and then rsync off of "1" to create 30 days' worth of incremental changes. So "1" changes with each day's backup from the client, but the copies made from it would contain older file versions, 30 days' worth.
From a post at http://blog.interlinked.org/tutorials/rsync_time_machine.html which outlines how to use rsync to simulate what Apple's Time Machine does, I have the following code (the "15/16" part of the target path represents the day/time of the backup):
date=`date "+%Y-%m-%dT%H:%M:%S"`
UserNameVar=client8
# copy "1" into a dated directory, hard-linking unchanged files against the previous run via "current"
rsync -aP --log-file=/home/User1/Desktop/rsync.log --link-dest=/home/$UserNameVar/share/Backups/1/current /home/$UserNameVar/share/Backups/1 /home/$UserNameVar/share/Backups/15/16/back-$date
# repoint the "current" symlink at the backup that was just made
rm -f /home/$UserNameVar/share/Backups/1/current
ln -s back-$date /home/$UserNameVar/share/Backups/1/current
The code runs, the backup occurs, the link between the last backup and "current" is created, and the subsequent backups are very fast, but as best I can tell, the backups consume the same space as the original.
Is the approach flawed, or is something in my code wrong? Or do I need a different way to calculate the actual free space?
Thanks
There are a couple of ways to detect whether --link-dest is working as you expect. One way would be to use the find command to look for files that have a hard-link count greater than 1, something like:
find . -type f -links +1
The du command will also typically count a single file only once, even if there are many hard links to it, so if you run du on a folder above your two backups, you should see one directory consuming the majority of the storage.
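For example, assuming the directory layout from the question (the exact paths are only illustrative), something like this would show whether the hard links are actually being created:
# count files in the dated backups that have more than one hard link
find /home/client8/share/Backups/15/16/back-* -type f -links +1 | wc -l
# per-directory usage; du charges hard-linked data only to the first copy it encounters
du -sh /home/client8/share/Backups/1 /home/client8/share/Backups/15/16/*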
If you are not seeing either of these indications, then your files are not being linked. This can happen because rsync is not treating the files as identical: for some reason the files themselves, or some attribute of them, are different.
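One way to see why rsync treats the files as different is a dry run with --itemize-changes; the following is just a sketch using the paths from the question, with back-test as a throwaway target name:
# -n = dry run, -i = itemize changes; the change flags in the output show which
# attribute (checksum, size, time, perms, owner, group) differs for each file
rsync -ani --link-dest=/home/client8/share/Backups/1/current \
    /home/client8/share/Backups/1 /home/client8/share/Backups/15/16/back-test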
BTW, I am a big fan of using dirvish instead of trying to roll your own script. It is basically a tool that runs rsync in --link-dest mode for you.
Have you looked at rdiff-backup?
It creates rotating backups that are kept for a certain number of days, and uses rsync as the transport method. It basically does everything you are trying to script, automatically and with no extra effort. It creates diffs for each backup, so if nothing has changed, no extra disk space is used.
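A minimal invocation looks roughly like this (the repository path is a placeholder; --remove-older-than handles the 30-day rotation):
# back up the share into a repository that keeps reverse diffs of older versions
rdiff-backup /home/client8/share/Backups/1 /home/client8/share/rdiff-repo
# prune increments older than 30 days
rdiff-backup --remove-older-than 30D /home/client8/share/rdiff-repo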
I use it extensively for server backups in combination with backupninja.
I've had good luck with Backup.rsync - it was even able to back up a host with a flaky network driver where tar was failing. It stores some duplicate files, and doesn't compress them, but it's fast.
It keeps an arbitrary number of backups, and will resume a previously interrupted backup nicely.
It's really just a wrapper around rsync --link-dest with some mv'ing.
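For reference, that pattern is roughly the following; this is a sketch of the general technique, not Backup.rsync itself, and the directory names are made up:
SRC=/home/client8/share/Backups/1
DEST=/backups/host
STAMP=$(date +%Y-%m-%dT%H:%M:%S)
# sync into a temporary directory, hard-linking unchanged files against the last finished backup
rsync -a --link-dest="$DEST/latest" "$SRC/" "$DEST/incomplete-$STAMP/"
# only after the run finishes, move the snapshot into place and repoint "latest"
mv "$DEST/incomplete-$STAMP" "$DEST/back-$STAMP"
rm -f "$DEST/latest"
ln -s "back-$STAMP" "$DEST/latest"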