The more I use rsync, the more I realise that it's a Swiss Army knife of file transfer. There are so many options. I recently found out that you can pass --remove-source-files and it'll delete a file from the source once it's been copied, which makes it a bit more of a move, rather than copy, programme. :)
What are your favorite little rsync tips and tricks?
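For anyone trying the move-style transfer from the question: --remove-source-files only removes files, so the emptied directories stay behind on the source. A minimal sketch of cleaning those up afterwards; the function name move_tree is made up for illustration, and the find options assume GNU or BSD find:

```shell
# Hypothetical helper: "move" a tree with rsync, then prune empty directories.
move_tree() {
    # $1 = source directory (with trailing slash), $2 = destination
    # --remove-source-files deletes each source file after it has been copied.
    rsync -a --remove-source-files "$1" "$2" &&
        # rsync leaves the (now empty) directories behind; remove them,
        # but keep the top-level source directory itself (-mindepth 1).
        find "$1" -mindepth 1 -type d -empty -delete
}
```

Usage would look like: move_tree /data/outbox/ backuphost:/data/inbox/ (both paths are placeholders).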
Try to use rsync version 3 if you have to sync many files! V3 builds its file list incrementally and is much faster and uses less memory than version 2.
Depending on your platform this can make quite a difference. On OSX version 2.6.3 would take more than one hour or crash trying to build an index of 5 million files while the version 3.0.2 I compiled started copying right away.
Using --link-dest to create space-efficient snapshot-based backups, whereby you appear to have multiple complete copies of the backed-up data (one for each backup run), but files that don't change between runs are hard-linked instead of copied again, saving space.
(Actually, I still use the rsync-followed-by-cp -al method, which achieves the same thing; see http://www.mikerubel.org/computers/rsync_snapshots/ for an oldish-but-still-very-good rundown of both techniques and related issues.)
The one major disadvantage of this technique is that if a file is corrupted due to disk error, it is just as corrupt in all snapshots that link to that file, but I have offline backups too, which would protect against this to a decent extent. The other thing to look out for is that your filesystem has enough inodes, or you'll run out of them before you actually run out of disk space (though I've never had a problem with the ext2/3 defaults).
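A rough sketch of the --link-dest variant, in case it helps. The directory layout (one timestamped directory per run plus a "latest" symlink) and the function name are my own assumptions, not something rsync prescribes:

```shell
# Hypothetical layout: snapshots live in $dest_root/<timestamp>/ and
# $dest_root/latest is a symlink to the most recent one.
snapshot_backup() {
    src=$1         # e.g. /home/ (trailing slash: copy the contents)
    dest_root=$2   # e.g. /backups/home
    stamp=$(date +%Y-%m-%d_%H%M%S)

    mkdir -p "$dest_root" || return 1

    # A relative --link-dest path is resolved relative to the destination
    # directory, so ../latest refers to the previous snapshot. Unchanged
    # files are hard-linked to it; changed or new files are copied.
    rsync -a --link-dest=../latest "$src" "$dest_root/$stamp/" &&
        # Only repoint "latest" once the new snapshot has completed.
        ln -sfn "$stamp" "$dest_root/latest"
}
```

On the very first run there is no "latest" yet; rsync warns that the --link-dest directory does not exist and simply copies everything.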
Also, never forget the very, very useful --dry-run for a little healthy paranoia, especially when you are using the --delete* options.
If you need to update a website with some huge files over a slowish link, you can transfer the small files this way:
rsync -a --max-size=100K /var/www/ there:/var/www/
then do this for the big files:
rsync -a --min-size=100K --bwlimit=100 /var/www/ there:/var/www/
rsync has lots of options that are handy for websites. Unfortunately, it does not have a built-in way of detecting simultaneous updates, so you have to add logic to cron scripts to avoid overlapping writes of huge files.
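The "add logic to cron scripts" part could look something like the following. This is only a sketch: it uses a lock directory because mkdir is atomic and portable, the lock path is a made-up default, and a crashed run will leave a stale lock behind that has to be removed by hand (on Linux, flock(1) from util-linux avoids that problem).

```shell
# Hypothetical lock location; any path on a local filesystem will do.
LOCKDIR=${LOCKDIR:-/tmp/www-sync.lock}

# Run the given command only if no other run_exclusive holds the lock.
run_exclusive() {
    # mkdir fails if the directory already exists, so only one caller
    # at a time gets past this check.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "previous sync still running, skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$LOCKDIR"
    return $status
}
```

From cron you would then call it as: run_exclusive rsync -a --min-size=100K --bwlimit=100 /var/www/ there:/var/www/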
I use the --existing option when trying to keep a small subset of files from one directory synced to another location.
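As an illustration of that, a small wrapper (the name and the example paths are hypothetical): the destination holds a hand-picked subset, and --existing makes rsync refresh only those files instead of copying everything across.

```shell
# Hypothetical helper: refresh only files already present in the destination.
refresh_existing() {
    src=$1    # full directory, e.g. /srv/docs/
    dest=$2   # hand-picked subset, e.g. /srv/docs-public/
    shift 2
    # --existing: skip creating files that do not already exist on the
    # receiver. Any extra arguments (such as --dry-run) are passed through.
    rsync -av --existing "$@" "$src" "$dest"
}
```

Running refresh_existing /srv/docs/ /srv/docs-public/ --dry-run first shows what would be updated without touching anything.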
Time limit (--stop-after=T in rsync 3.2.3 and later; older builds needed the time-limit patch, where the option is --time-limit=T): when this option is used, rsync will stop after T minutes and exit. I think this option is useful when rsyncing a large amount of data during the night (non-busy hours), and then stopping when it is time for people to start using the network during the day (busy hours).
--stop-at=y-m-dTh:m allows you to specify at what time to stop rsync.
Batch mode can be used to apply the same set of updates to many identical systems.
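Roughly how batch mode is used, as a sketch with made-up function names and paths. The important caveat is that every target tree must be identical to the one the batch was recorded against before it is applied:

```shell
# Record the changes once: update the first destination and save the
# same set of updates into a batch file.
record_batch() {
    # $1 = source, $2 = first destination, $3 = batch file to write
    rsync -a --write-batch="$3" "$1" "$2"
}

# Replay the recorded updates on another, previously identical tree.
apply_batch() {
    # $1 = destination to update, $2 = batch file written earlier
    rsync -a --read-batch="$2" "$1"
}
```

For example: record_batch /srv/app/ /mnt/node1/app/ /tmp/app.batch, then copy /tmp/app.batch to the other machines and run apply_batch /srv/app/ /tmp/app.batch on each of them.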
--rsh is mine.
I've used it to change the cipher on ssh to something faster (--rsh="ssh -c arcfour"), and also to set up a chain of sshs (recommend using it with ssh-agent) to sync files between hosts that cannot talk directly:
rsync -av --rsh="ssh -TA userA@hostA ssh -TA -l userB" /tmp/foobar/ hostB:/tmp/foobar/
If you are wondering how far along a slow-running rsync has gotten, and didn't use -v to list files as they are transferred, you can find out which files it has open by looking at its file descriptors under /proc, on a system which has /proc.
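One way to do the /proc lookup, as a sketch. It assumes a Linux-style /proc where /proc/PID/fd contains one symlink per open file descriptor; the function name is made up:

```shell
# Print "fd -> target" for every descriptor a process has open.
list_open_files() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        # Skip the unexpanded pattern if the process has gone away
        # or we are not allowed to look at it.
        [ -L "$fd" ] || continue
        # Each entry is a symlink to the file, pipe or socket in use.
        printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
    done
}
```

rsync normally runs as more than one process, so it is worth checking each of them, for example: for pid in $(pgrep -x rsync); do list_open_files "$pid"; done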
E.g. rsync was hung for me just now, even though the remote system seemed to have a bunch of space left. This trick helped me find the unexpectedly huge file which I didn't remember, which wouldn't fit on the other end.
It also told me a bit more interesting information: the other end apparently gave up, since there was also a broken socket link.
--archive is a standard choice (though not the default) for backup-like jobs, which makes sure most metadata from the source files (permissions, ownership, etc.) is copied across.
However, if you don't want to use that, oftentimes you'll still want to include --times, which will copy across the modification times of files. This makes the next rsync that runs (assuming you are doing it repeatedly) much faster, as rsync compares sizes and modification times and skips the file if both are unchanged. Surprisingly (to me at least) this option is not the default.
Mine is
--inplace. Works wonders when the server for backups is running ZFS or btrfs and you make native snapshots.
--backup-dir=`date +%Y.%m.%d` --delete
We are deleting but making a copy... just in case.
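Spelled out as a full command, that last tip might look like this. A sketch only: the function name and the idea of a separate "trash" directory are my own, and --backup is given explicitly because older rsync versions need it alongside --backup-dir:

```shell
# Hypothetical helper: mirror src to dest, but keep anything that would be
# deleted or overwritten in a dated directory instead of losing it.
mirror_with_trash() {
    src=$1          # e.g. /var/www/
    dest=$2         # e.g. /backups/www/
    trash_root=$3   # e.g. /backups/www-trash (outside of dest)
    # With --backup, files removed by --delete or replaced by a newer copy
    # are moved into the --backup-dir, which lives on the receiving side.
    rsync -a --delete --backup \
        --backup-dir="$trash_root/$(date +%Y.%m.%d)" "$src" "$dest"
}
```

Keeping the trash directory outside the destination tree means later --delete runs will not touch the saved copies.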