I'm currently using the traditional rsync + cp -al method to create incremental snapshot backups of our server tree. The backups go onto a pair of eight-disk towers connected to the backup machine (a Sandy Bridge machine with 16 GB of RAM, running CentOS 5.5) via four eSATA connections (four disks per connection). Each disk is an ordinary 2 TB drive, so we have 32 TB of disk space attached to the backup machine. We use this to back up about 20 TB of data from the servers.
The problem is that each daily backup takes more than 24 hours, and the real time-killer isn't the rsync itself but the cp -al of the tree that runs locally on the backup machine. It takes more than 12 hours just to make the shadow copy of the tree, and as far as I can tell the bottleneck is the disk: top shows cp using a lot of RAM but very little CPU, and it spends most of its time in uninterruptible sleep (D state).
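To make that cost concrete, here is a rough Python sketch of what cp -al does. The function name hardlink_tree is hypothetical, and the sketch ignores symlinks, ownership and permissions, which cp -a would preserve. No file data is copied, but every directory has to be created for real and every file needs a new directory entry plus an inode link-count update, so the work is almost entirely small, scattered metadata writes. That is consistent with a process that is disk-bound and mostly in uninterruptible sleep.

```python
import os


def hardlink_tree(src: str, dst: str) -> int:
    """Recreate the directory tree of `src` under `dst`, hard-linking every
    regular file instead of copying its data (roughly `cp -al src dst`,
    minus ownership/permission/symlink handling).

    Returns the number of hard links created."""
    links = 0
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        # Directories cannot be hard-linked: each one is a brand-new inode
        # with its own directory blocks that must be written out.
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            # No data is copied, but each link adds a directory entry and
            # bumps the link count stored in the file's inode.
            os.link(os.path.join(root, name), os.path.join(target_dir, name))
            links += 1
    return links
```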
We have the server data split into four major volumes (and a few minor ones), and the backups of these run in parallel, with staggered start times in cron so that some of the cp runs finish earlier. The backup storage is split into two volumes, each a 16 TB striped LVM volume.
So obviously I need to improve the performance because it's unusable as it stands.
The first question is: when CentOS 6 comes out with support for btrfs, will taking btrfs snapshots of subvolumes substantially improve this performance?
The second is: is there a way, with ext3 or another filesystem supported in CentOS 5 or 6, to 'encourage' it to put the directories and inodes in one part of a volume (which could be the part that sits on an SSD, via LVM) and the file data in another? That would presumably solve the problem, but I don't know of any way to give ext3 hints like that.
You could try using rsync's --link-dest and related options instead of following rsync with cp -al. See the man page and tutorials like this one for details.
You could get a little extra speed by turning off ext3's journal, though I wouldn't recommend that, as it increases the chance of damaged backups if something goes wrong during an update.
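As a rough illustration of what --link-dest does, here is a simplified Python sketch (not rsync's actual code; the function name and the size-plus-mtime comparison are assumptions modelled on rsync's default "quick check"). Each file in the new snapshot is either hard-linked from the previous snapshot when it looks unchanged, or copied from the source when it doesn't. The linking happens in the same pass as the transfer, so the separate full-tree cp -al walk disappears.

```python
import os
import shutil


def snapshot_with_link_dest(src: str, prev: str, dst: str) -> tuple[int, int]:
    """Build snapshot `dst` of `src`, reusing unchanged files from `prev`.

    A file counts as unchanged if the copy in `prev` has the same size and
    the same modification time (compared in whole seconds). Unchanged files
    are hard-linked from `prev`; everything else is copied from `src`.
    Returns (files_linked, files_copied)."""
    linked = copied = 0
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        out_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(out_dir, exist_ok=True)
        for name in files:
            src_file = os.path.join(root, name)
            prev_file = os.path.normpath(os.path.join(prev, rel, name))
            dst_file = os.path.join(out_dir, name)
            st = os.stat(src_file)
            try:
                old = os.stat(prev_file)
                unchanged = (old.st_size == st.st_size
                             and int(old.st_mtime) == int(st.st_mtime))
            except FileNotFoundError:
                unchanged = False  # new file, or no previous snapshot yet
            if unchanged:
                os.link(prev_file, dst_file)  # shares the old inode, no data copied
                linked += 1
            else:
                shutil.copy2(src_file, dst_file)  # copy2 keeps mtime for the next run
                copied += 1
    return linked, copied
```

With rsync itself this collapses into a single run along the lines of rsync -a --link-dest=/backup/previous /data/ /backup/new/ (paths hypothetical). Note that a relative --link-dest directory is interpreted relative to the destination directory.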
If your backups include directories containing many files, reformatting as ext4 or enabling ext3's dir_index feature may help. In both cases you may need to reformat to see the full benefit, because just enabling the new features on an existing filesystem won't convert any existing on-disk structures.
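If you're not sure whether your tree actually has such directories, a quick Python sketch like the following can find them before you commit to a reformat. The function name is hypothetical, and the default 10,000-entry threshold is an arbitrary assumption, not a documented cutoff for when dir_index starts to matter.

```python
import os


def crowded_directories(root: str, threshold: int = 10_000,
                        top: int = 10) -> list[tuple[int, str]]:
    """Return up to `top` (entry_count, path) pairs, largest first, for
    directories under `root` holding at least `threshold` entries.

    Entries are subdirectories plus files as reported by os.walk, which
    does not descend into symlinked directories."""
    crowded = []
    for dirpath, dirnames, filenames in os.walk(root):
        count = len(dirnames) + len(filenames)
        if count >= threshold:
            crowded.append((count, dirpath))
    crowded.sort(reverse=True)
    return crowded[:top]
```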
Consider using rdiff-backup instead of rsync + cp. It keeps old versions of files automatically, so you don't need the cp -al step.
From the rdiff-backup page:
"The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago."
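To illustrate the idea in that quote, here is a toy Python sketch for line-based text files: the mirror always holds the newest version in full, and each update stores a reverse delta that rebuilds the previous version from the newer one. It uses difflib.SequenceMatcher for the deltas; rdiff-backup itself uses librsync binary deltas and its own on-disk layout, so the class and function names here are hypothetical.

```python
import difflib


def make_reverse_delta(old: list[str], new: list[str]) -> list[tuple]:
    """Describe how to rebuild `old` from `new`: ranges of lines reused from
    `new`, plus literal copies of the lines that only `old` contained."""
    delta: list[tuple] = []
    matcher = difflib.SequenceMatcher(a=new, b=old, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))      # reuse new[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("data", old[j1:j2]))  # lines missing from new
        # "delete": lines that exist only in `new` are simply not copied
    return delta


def apply_reverse_delta(new: list[str], delta: list[tuple]) -> list[str]:
    """Rebuild the older version from `new` and its reverse delta."""
    out: list[str] = []
    for op in delta:
        if op[0] == "copy":
            out.extend(new[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class MirrorWithIncrements:
    """Newest version kept in full; older versions only as reverse deltas."""

    def __init__(self, initial: list[str]) -> None:
        self.mirror = list(initial)
        self.increments: list[list[tuple]] = []  # newest first

    def update(self, new_version: list[str]) -> None:
        self.increments.insert(0, make_reverse_delta(self.mirror, new_version))
        self.mirror = list(new_version)

    def version(self, back: int) -> list[str]:
        """Return the version from `back` updates ago (0 = current mirror)."""
        result = self.mirror
        for delta in self.increments[:back]:
            result = apply_reverse_delta(result, delta)
        return list(result)
```

Because only the newest version is stored in full, an unchanged file costs nothing extra per run, which is why this approach doesn't need a separate hard-link copy of the whole tree.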