I use tar
to create a snapshot of different parts of the filesystem on my servers and then ftp that snapshot to an offsite location for archiving.
I would like to start that operation only when something has changed. Some of the backups cover system folders that change very infrequently (i.e. when new software is installed or configurations are modified).
Whenever a change does happen, I want a complete snapshot. I could produce a list of modified files with find
, but I really only need to know whether that list is empty or not. Running find for that is too slow.
I am aware that incremental backups exist, and I'm already using rsync in conjunction with ZFS for that in other situations. However, here the backup host is an FTP server (so no rsync), I need complete backups (because the backup archive is used as an image to restore or clone servers), and I want compressed output (so tar is handy).
Edit: Note that I'm not looking for incremental backup (I have that), but rather for a fast (that kinda rules out find and the like) and easy way to decide if a full snapshot would be identical to the last one. Maybe my phrasing wasn't so good. I edited the title now.
GNU tar has a --newer-mtime option, which requires a date argument, presumably the last time you did a backup. Depending on how much work you want to put into restoring the filesystem, that date could be the last full backup, in which case you'd need to restore the full dump plus the latest daily, or the last incremental, in which case you'd need to restore the full dump and every dump after that.
This option does rely on the modification timestamp on the file, so if that has been explicitly changed, then there's a chance your backup will miss it.
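A minimal sketch of that, assuming you touch a marker file after each successful run (the /var/backups/last-run path is just an illustration):

    # archive only files modified after the previous run; the marker
    # file's mtime records when that run happened
    tar -czf /backup/etc-snapshot.tar.gz \
        --newer-mtime="$(date -r /var/backups/last-run '+%Y-%m-%d %H:%M:%S')" /etc &&
        touch /var/backups/last-run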
The incron utility uses inotify to run commands when filesystem events occur. The configuration file is like a crontab, but instead of times you specify paths and events.
The command could either be your backup script (in which case backup will start almost immediately after the files were modified), or you could have it create some file, and have the backup script check for the existence of that file and then delete it. If the file exists, one of the events occurred since the last run.
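A sketch of the marker-file variant (the watched path, event mask and marker file are placeholders, and note that incron does not watch subdirectories recursively):

    # /etc/incron.d/backup (or a user table via "incrontab -e")
    # <path> <event mask> <command>
    /etc IN_MODIFY,IN_CREATE,IN_DELETE,IN_MOVE touch /var/run/backup-needed

The backup script then only has to do something like:

    if [ -e /var/run/backup-needed ]; then
        rm -f /var/run/backup-needed
        # ... create the tar snapshot and upload it ...
    fi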
You could always pipe find's output to wc and get an integer count of changed files:
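For example, with a hypothetical marker file that gets touched after each backup run:

    # number of files under /etc modified since the last run
    changed=$(find /etc -newer /var/backups/last-run -type f | wc -l)
    [ "$changed" -gt 0 ] && echo "need a new snapshot"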
Although David's answer requires fewer code changes :)
This is a little bit of a wild idea, but you could play a little with md5sum and ls.
The idea is to only look at the md5sum of one file, and that file is a listing of the directory you are watching. As long as nothing changes, the md5sum stays the same; but if a timestamp is updated, the md5sum changes, and you know you need to create a new tar and send it to your FTP server.
We could start with something like this:
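Something along these lines (the watched path is a placeholder):

    # md5 of a recursive long listing; any change in a file's name,
    # size or mtime changes the listing and therefore the hash
    ls -lR /path/to/watched/dir | md5sum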
Then you would need to add a comparison between the old md5 and the current... etc etc
/Johan
Recent versions of GNU find have the action "-quit", which causes find to immediately stop searching:
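For example, again with a hypothetical marker file as the reference point:

    # prints the first changed file it finds, then exits immediately
    find /etc -newer /var/backups/last-run -print -quit

    # empty output means nothing has changed
    [ -n "$(find /etc -newer /var/backups/last-run -print -quit)" ] && echo "changed"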
You could use a find-expression to find files that have changed, and use -quit to stop as soon as you find one. That should be faster than find continuing its scan.
-quit was added in findutils 4.2.3.
tar has a --diff option that will "find differences between archive and file system". If you keep a local copy of the archive you uploaded, you could compare against that. You also have the lower-case -g option (-g, --listed-incremental F: create/list/extract a new GNU-format incremental backup).
I've never played with it, but you could script something around it; test this on non-critical data first. ;) Take a full backup with --listed-incremental, then run it again against the same snapshot file: if nothing has changed, the second archive will be (almost) empty.
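A rough sketch with placeholder paths; note that every run updates the snapshot file, so work on a copy if you only want to test whether anything changed:

    # full backup, recording file metadata in the snapshot file
    tar --create --gzip --listed-incremental=/var/backups/etc.snar \
        --file=/backup/etc-full.tar.gz /etc

    # later: run against a copy of the snapshot file; if the resulting
    # archive contains no regular files, nothing has changed
    cp /var/backups/etc.snar /tmp/etc.snar
    tar --create --gzip --listed-incremental=/tmp/etc.snar \
        --file=/tmp/etc-test.tar.gz /etc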
I switched my backups completely to rsnapshot (a Perl script; it uses rsync and hardlinks, and it can back up remote hosts).
Every night rsync copies just the newer files, and thanks to hardlinks every backup folder represents the complete data set.
rsnapshot is super fast and restore is so easy - give it a try!
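A minimal rsnapshot.conf sketch (paths and retention values are placeholders, fields must be separated by tabs, and older versions use "interval" instead of "retain"):

    snapshot_root   /srv/backups/
    retain  daily   7
    retain  weekly  4
    # local and remote (over ssh) backup points
    backup  /etc/   localhost/
    backup  root@server:/etc/   server/

Cron then runs "rsnapshot daily" and "rsnapshot weekly" at the appropriate times.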
Radical idea: you can have the system audit the files in question for each access.
This is very verbose in logging terms but would provide you with timestamps for each read/write. Yes, it is similar in concept to Windows NT audit logging. It's probably overkill for your setup, but in the interest of completeness, I'm offering this concept...
You can set up auditing using this short tutorial here.
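As a sketch, a watch on the directories in question could look like this (the path and key name are placeholders):

    # log writes and attribute changes under /etc, tagged with a key
    auditctl -w /etc -p wa -k backup-watch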
Pros:
Cons:
You can use the ausearch tool to locate changes to files on a per-filename basis. A simple script that iterates over the directories (and subdirectories?) on a per-file basis would allow you to emit changes to a simple file, giving you a list of files that were "touched" according to the criteria you specify. You can easily extend this with other filtering options in ausearch
for per-user matching (useful if you have a dedicated user account for a service), per-command matching, and so on.
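For example, querying by file name or by the (hypothetical) key used when the watch was set up:

    # audit events recorded since yesterday for the watched files
    ausearch -ts yesterday -k backup-watch
    # or for a single file
    ausearch -f /etc/passwd -ts today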
You could install git and parse the output of "git status" (or maybe the exit codes?) for the directories in question. Git is pretty fast at what it does. Just make sure to commit the changes, so successive calls to "git status" only show new changes.
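A sketch of that check, assuming the directory has already been turned into a repository with "git init" and an initial commit:

    cd /etc || exit 1
    # --porcelain prints nothing when the working tree matches the last commit
    if [ -n "$(git status --porcelain)" ]; then
        # something changed: take the snapshot, then commit so the
        # next run starts from a clean state again
        git add -A && git commit -q -m "backup snapshot"
    fi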
Another idea would be to use tripwire or some similar tool.
A more brute-force approach would be to periodically tar the directories anyway and compare the new archive's MD5 with that of the previous tar. If the directories are large, though, this would not scale too well.
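A sketch of that check (paths are placeholders; the comparison archive is left uncompressed because gzip embeds a timestamp in its header, which would change the checksum even when the content is identical):

    # tar the tree, hash the archive, compare with the previous run's hash
    tar -cf /tmp/etc-check.tar /etc 2>/dev/null
    new=$(md5sum < /tmp/etc-check.tar | cut -d' ' -f1)
    old=$(cat /var/backups/etc-check.md5 2>/dev/null)
    if [ "$new" != "$old" ]; then
        echo "$new" > /var/backups/etc-check.md5
        # ... compress and upload the real snapshot ...
    fi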