I use tar
to create a snapshot of different parts of the filesystem on my servers and then ftp that snapshot to an offsite location for archiving.
I would like to start that operation only when something has changed. Some of the backups cover system folders that change very infrequently (i.e. when new software is installed or configurations are modified).
Whenever a change does happen, I want a complete snapshot. I could produce a list of modified files with find
, but I really only need to know whether that list is empty or not. Running find for that is too slow.
I am aware that incremental backups exist, and I'm already using rsync in conjunction with ZFS for that in other situations. However, here the backup host is an FTP server (so no rsync), I need complete backups (because the backup archive is used as an image to restore or clone servers), and I want compressed output (so tar is handy).
Edit: Note that I'm not looking for incremental backup (I have that), but rather for a fast (that kinda rules out find and the like) and easy way to decide if a full snapshot would be identical to the last one. Maybe my phrasing wasn't so good. I edited the title now.
GNU tar has a --newer-mtime option, which requires a date argument, presumably the last time you did a backup. Depending on how much work you want to put into restoring the filesystem, that date could be the last full backup, in which case you'd need to restore the full dump plus the latest daily, or the last incremental, in which case you'd need to restore the full dump and every dump after that.
This option does rely on the modification timestamp on the file, so if that has been explicitly changed, then there's a chance your backup will miss it.
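A minimal sketch of that, assuming you touch a marker file after each successful run (the /var/backups/last-run path is just an illustration):

    # archive only files modified after the previous run; the marker
    # file's mtime records when that run happened
    tar -czf /backup/etc-snapshot.tar.gz \
        --newer-mtime="$(date -r /var/backups/last-run '+%Y-%m-%d %H:%M:%S')" /etc &&
        touch /var/backups/last-run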
The incron utility uses inotify to run commands when filesystem events occur. The configuration file is like a crontab, but instead of times you specify paths and events.
The command could either be your backup script (in which case backup will start almost immediately after the files were modified), or you could have it create some file, and have the backup script check for the existence of that file and then delete it. If the file exists, one of the events occurred since the last run.
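A sketch of the marker-file variant (the watched path, event mask and marker file are placeholders, and note that incron does not watch subdirectories recursively):

    # /etc/incron.d/backup (or a user table via "incrontab -e")
    # <path> <event mask> <command>
    /etc IN_MODIFY,IN_CREATE,IN_DELETE,IN_MOVE touch /var/run/backup-needed

The backup script then only has to do something like:

    if [ -e /var/run/backup-needed ]; then
        rm -f /var/run/backup-needed
        # ... create the tar snapshot and upload it ...
    fi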
You could always pipe find's output to wc and get an integer count of changed files:
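For example, with a hypothetical marker file that gets touched after each backup run:

    # number of files under /etc modified since the last run
    changed=$(find /etc -newer /var/backups/last-run -type f | wc -l)
    [ "$changed" -gt 0 ] && echo "need a new snapshot"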
Although David's answer requires fewer code changes :)
This is a little bit of a wild idea, but you could play a little with md5sum and ls.
The idea is to only look at the md5sum of one file, and that file is a listing of the directory you are watching. As long as nothing changes, the md5sum stays the same; but if a timestamp is updated, the md5sum changes, and you know you need to create a new tar and send it to your FTP server.
We could start with something like this:
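Something along these lines (the watched path is a placeholder):

    # md5 of a recursive long listing; any change in a file's name,
    # size or mtime changes the listing and therefore the hash
    ls -lR /path/to/watched/dir | md5sum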
Then you would need to add a comparison between the old md5 and the current... etc etc
/Johan
Recent versions of GNU find have the action "-quit", which causes find to immediately stop searching:
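For example, again with a hypothetical marker file as the reference point:

    # prints the first changed file it finds, then exits immediately
    find /etc -newer /var/backups/last-run -print -quit

    # empty output means nothing has changed
    [ -n "$(find /etc -newer /var/backups/last-run -print -quit)" ] && echo "changed"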
You could use a find-expression to find files that have changed, and use -quit to stop as soon as you find one. That should be faster than find continuing its scan.
-quit was added in findutils 4.2.3.
tar has a --diff option that will "find differences between archive and file system". If you keep a local copy of the archive you uploaded, you could compare against that. You also have the lower-case -g option (-g, --listed-incremental F: create/list/extract a new GNU-format incremental backup).
I've never played with it, but you could script something around it; test this on non-critical data first. ;) Take a full backup with --listed-incremental, then run it again against the same snapshot file: if nothing has changed, the second archive will be (almost) empty.
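A rough sketch with placeholder paths; note that every run updates the snapshot file, so work on a copy if you only want to test whether anything changed:

    # full backup, recording file metadata in the snapshot file
    tar --create --gzip --listed-incremental=/var/backups/etc.snar \
        --file=/backup/etc-full.tar.gz /etc

    # later: run against a copy of the snapshot file; if the resulting
    # archive contains no regular files, nothing has changed
    cp /var/backups/etc.snar /tmp/etc.snar
    tar --create --gzip --listed-incremental=/tmp/etc.snar \
        --file=/tmp/etc-test.tar.gz /etc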
I switched my backups completely to rsnapshot (a Perl script; it uses rsync and hardlinks, and it can back up remote hosts).
Every night rsync copies just the newer files, and thanks to hardlinks every backup folder represents the complete data set.
rsnapshot is super fast and restore is so easy - give it a try!
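A minimal rsnapshot.conf sketch (paths and retention values are placeholders, fields must be separated by tabs, and older versions use "interval" instead of "retain"):

    snapshot_root   /srv/backups/
    retain  daily   7
    retain  weekly  4
    # local and remote (over ssh) backup points
    backup  /etc/   localhost/
    backup  root@server:/etc/   server/

Cron then runs "rsnapshot daily" and "rsnapshot weekly" at the appropriate times.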
Radical idea: you can have the system audit the files in question for each access.
This is very verbose in logging terms but would provide you with timestamps for each read/write. Yes, it is similar in concept to Windows NT audit logging. It's probably overkill for your setup, but in the interest of completeness, I'm offering this concept...
You can set up auditing using this short tutorial here.
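As a sketch, a watch on the directories in question could look like this (the path and key name are placeholders):

    # log writes and attribute changes under /etc, tagged with a key
    auditctl -w /etc -p wa -k backup-watch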
Pros:
Cons:
You can use the ausearch tool to locate changes to files on a per-filename basis. A simple script that iterates over the directories (and subdirectories?) on a per-file basis would allow you to emit changes to a simple file, giving you a list of files that were "touched" according to the criteria you specify. You can easily extend this with other filtering options in ausearch
for per-user matching (useful if you have a dedicated user account for a service), per-command matching, and so on.
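For example, querying by file name or by the (hypothetical) key used when the watch was set up:

    # audit events recorded since yesterday for the watched files
    ausearch -ts yesterday -k backup-watch
    # or for a single file
    ausearch -f /etc/passwd -ts today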
You could install git and parse the output of "git status" (or maybe the exit codes?) for the directories in question. Git is pretty fast at what it does. Just make sure to commit the changes, so successive calls to "git status" only show new changes.
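A sketch of that check, assuming the directory has already been turned into a repository with "git init" and an initial commit:

    cd /etc || exit 1
    # --porcelain prints nothing when the working tree matches the last commit
    if [ -n "$(git status --porcelain)" ]; then
        # something changed: take the snapshot, then commit so the
        # next run starts from a clean state again
        git add -A && git commit -q -m "backup snapshot"
    fi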
Another idea would be to use tripwire or some similar tool.
A more brute-force approach would be to periodically tar the directories anyway and compare the new archive's MD5 with that of the previous tar. If the directories are large, though, this would not scale too well.
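A sketch of that check (paths are placeholders; the comparison archive is left uncompressed because gzip embeds a timestamp in its header, which would change the checksum even when the content is identical):

    # tar the tree, hash the archive, compare with the previous run's hash
    tar -cf /tmp/etc-check.tar /etc 2>/dev/null
    new=$(md5sum < /tmp/etc-check.tar | cut -d' ' -f1)
    old=$(cat /var/backups/etc-check.md5 2>/dev/null)
    if [ "$new" != "$old" ]; then
        echo "$new" > /var/backups/etc-check.md5
        # ... compress and upload the real snapshot ...
    fi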