I'm hosting a page and have SSH access to the webspace.
The site allows modification by its users. To be able to revert it to an older state, I thought about using rsync to create an incremental backup every 30 minutes, with cron launching the following script.
#!/bin/bash
# Binaries
RSYNC=$(which rsync)
LN=$(which ln)
MKDIR=$(which mkdir)
#TODO: Is this enough to make the script distro independent?
# Other Variables
source="<username>@<provider>:<workspace path>"
target="<local backup path>"
# Date ...
year=$(date +%Y)
month=$(date +%m)
day=$(date +%d)
# ... and time
hour=$(date +%H)
minute=$(date +%M)
# Prepare directories
$MKDIR -p "$target/$year/$month/$day/${hour}_${minute}/"
# TODO: Why is this necessary? Without this line the actual backup fails,
# saying "directory does not exist...".
# Actual backup
$RSYNC -av --delete "$source" "$target/$year/$month/$day/${hour}_${minute}/" --link-dest="$target/latest/"
$LN -nsf "$target/$year/$month/$day/${hour}_${minute}/" "$target/latest"
# End script
exit 0
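The cron entry that launches it every 30 minutes looks like this (the script path is a placeholder):
*/30 * * * * /home/<username>/backup.sh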
The script seems to work so far, but the target path has bloated to roughly three times the size of the source path within the last three days.
Incremental backups should only lead to a small increase, right?
What am I doing wrong?
Thanks in advance
Markus
If your backup medium has a Linux filesystem, e.g. ext3 or ext4 (and it probably should, or file attributes won't get backed up), then there is a neat trick you can do with rsync and cp -al that makes good use of a feature of the filesystem: you do an incremental backup, but then create hard links to the files at each backup. This means you only copy the files that have changed, while the backup medium holds just one copy of each file, so it doesn't balloon in size. (I can't take the credit for this; it was in a comment on a long-ago question that I could not begin to find again.)
My (daily) backup goes something like:
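# sketch only; the source and backup paths are placeholders
rsync -a --delete /var/www/site/ /backup/current/
cp -al /backup/current /backup/$(date +%Y-%m-%d)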
This updates "current" with only the files that have changed, but creates a directory named after today's date with hard links to all the files. Thus an ls of each day's backup appears to contain all the files in situ, but there is in fact only one copy on the backup medium. That is also the downside: since there is only one copy of each file, you should rotate the media so you have multiple copies, but that is good backup practice anyway.
There's actually already a tool, based on rsync, that does exactly this. It's called rdiff-backup; I've used it many times in the past to create incremental backups, and it supports rolling back to previous states. It can also be configured to clean up old backups so that your backup directory doesn't keep growing forever.
Find out more about it and look at the usage examples on the documentation page: http://rdiff-backup.nongnu.org/
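For a rough idea of the workflow (the local paths and time spans here are only illustrative):
rdiff-backup <username>@<provider>::<workspace path> /local/backups/site   # take a backup (can run from cron)
rdiff-backup -r 30m /local/backups/site /tmp/site-restored                 # restore the state of 30 minutes ago
rdiff-backup --remove-older-than 2W /local/backups/site                    # drop increments older than two weeks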
Based on B.Tanner's answer, this is a script that checks every 60 seconds whether any file has changed; if so, it creates a backup. You should have two folders, backups/OLD and backups/current.
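A sketch of that idea (the source and backup locations are placeholders):
#!/bin/bash
# placeholder paths; "current" must exist before the first run
src="/var/www/site/"
backups="/mnt/backups"
while true; do
    # Dry run with itemized output: any output at all means something changed.
    if [ -n "$(rsync -ai --delete --dry-run "$src" "$backups/current/")" ]; then
        # Keep the previous state as hard links under OLD/<timestamp> ...
        cp -al "$backups/current" "$backups/OLD/$(date +%Y-%m-%d_%H-%M-%S)"
        # ... then bring "current" up to date.
        rsync -a --delete "$src" "$backups/current/"
    fi
    sleep 60
done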
The rsync program already has backup options that do what you want.
This is the script that I use for backups, running as root at 23:45 each day:
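In outline it works like this (the paths here are simplified stand-ins, not my real ones):
#!/bin/bash
# stand-in paths; adjust to your own layout
today=$(date +%Y-%m-%d)   # e.g. 2018-12-14
month=$(date +%Y-%m)      # e.g. 2018-12
# Mirror the site into "mostrecent"; every file changed or deleted
# since the last run is moved aside into today's directory.
rsync -a --delete --backup --backup-dir="/backup/$today" /var/www/site/ /backup/mostrecent/
# Fold today's displaced files into the monthly accumulation directory;
# --ignore-existing keeps only the oldest version of each file for the month.
rsync -a --ignore-existing "/backup/$today/" "/backup/$month/"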
All changed and deleted files are preserved. It's easy to use the standard unix tools to examine and recover files:
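For example (the file names are made up):
ls /backup/2018-12-14/                                    # see what changed that day
diff /backup/2018-12-14/index.html /backup/mostrecent/index.html
cp /backup/2018-12-14/index.html /var/www/site/           # put an old version back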
Only the "mostrecent" directory is large.
The monthly accumulation directory (2018-12) contains the oldest changes throughout the month. It isn't necessary to do this step, but when I need to save space it allows me to delete all the daily updates for that month (A year from now I might care what things looked like at the end of December, but not so much how things changed within the month.)
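With the layout above, that cleanup is a one-liner (the glob matches only the daily directories, not 2018-12 itself):
rm -rf /backup/2018-12-??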
Obviously you'd need to change the frequency, timestamps, etc., and add your portability code, but the same mechanism should do what you want.