Which backup tool or solution would you use to back up terabytes of data and lots of files on a production Linux server?
Note that the files are all different and almost never modified; usage is mostly adding files. The data volume is 3TB today and grows at around 15GB/day.
Please do not reply rsync. Basic Unix tools are not enough: rsync does not keep history, and rdiff-backup fails miserably from time to time and screws up the history. Moreover, these are all file-based backups, which cause a lot of I/O wait just to browse directories and call stat(). But I guess that, except with R1Soft CDP, there is no way around that.
We tried R1Soft CDP backup, which is a block-level backup. It proved good and efficient for all our other servers, but it systematically fails on the server with 3 terabytes and gazillions of files. For more than 2 months now, the R1Soft engineers and the datacenter have been passing the problem back and forth like a hot potato... and we still have no backup except regular rsync.
We never tried the big commercial solutions, except R1Soft CDP, since it was provided as an optional service by the datacenter hosting our servers.
I tried many backup solutions, starting with rsync and rdiff-backup, and also plain tar and bash scripts. But Bacula beats them all. It has a modular design; I have about 8 PCs in my backup network, and the number is growing.
Everyone I recommended Bacula to was more than happy to finally find their home.
I think the only solution for you is block-level backups.
You could write scripts that use LVM snapshots (or even lower-level dm-snapshots) and transfer them to a storage server.
You may also want to take a look at the Zumastor project and its ddsnap utility.
PS. Solaris/FreeBSD servers have ZFS, which can automate this process by using incremental snapshots plus ZFS send/receive.
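To make the ZFS send/receive idea above a little more concrete, here is a minimal sketch in Python that builds (and optionally runs) the commands for one incremental transfer. The dataset names, snapshot names, and the remote host are hypothetical placeholders, and it assumes the previous snapshot already exists on both sides. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def incremental_send_commands(dataset, prev_snap, new_snap, remote_host, remote_dataset):
    """Build the three commands for one incremental ZFS transfer.

    All names passed in are placeholders chosen by the caller.
    """
    # Take the new point-in-time snapshot.
    snapshot_cmd = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    # Stream only the blocks that changed between the two snapshots.
    send_cmd = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # Apply that stream to the copy on the storage server.
    receive_cmd = ["ssh", remote_host, "zfs", "receive", remote_dataset]
    return snapshot_cmd, send_cmd, receive_cmd


def run_incremental_backup(dataset, prev_snap, new_snap, remote_host, remote_dataset,
                           dry_run=True):
    snapshot_cmd, send_cmd, receive_cmd = incremental_send_commands(
        dataset, prev_snap, new_snap, remote_host, remote_dataset)
    if dry_run:
        # Only show what would be executed.
        print(shlex.join(snapshot_cmd))
        print(f"{shlex.join(send_cmd)} | {shlex.join(receive_cmd)}")
        return
    subprocess.run(snapshot_cmd, check=True)
    # Equivalent of the shell pipeline: zfs send ... | ssh host zfs receive ...
    sender = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receive_cmd, stdin=sender.stdout)
    sender.stdout.close()  # let the sender see a broken pipe if the receiver dies
    if receiver.wait() != 0 or sender.wait() != 0:
        raise RuntimeError("zfs send/receive pipeline failed")
```

For example, `run_incremental_backup("tank/data", "monday", "tuesday", "backuphost", "backup/data")` prints the snapshot command and the send/receive pipeline without touching anything. Because the snapshots are taken at the block level, no directory walk or per-file stat() is involved.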
You don't say what you want to back it up to; tape or disc? Assuming the former, then I endorse the recommendations for bacula. I use it at several different sites, at one of which I have it driving a 60-slot two-drive LTO2 robot, with a total of maybe 50TB of tape storage spread over 120 tapes, and the single largest server having about 4TB of disc. Bacula is very, very good when it's properly configured.
Disc backups I can't comment on usefully, as I'm firmly an old-style tape man myself. Since you specifically mention keeping history, I'd hope you were open to removable-media (i.e., tape) backups.
Try BackupPC. For me it works very well with a couple of terabytes of data and tens of millions of files (some 100,000 to 500,000 of those changing daily). OK, BackupPC does use rsync and is file-based, so that might be a show-stopper for you.
Bacula is another popular one, and it sure has the coolest slogan of them all. And it doesn't even use rsync! :-)
EMC Networker has an option called SnapImage that should increase backup speed for your kind of data.
I have only heard about it and have never tried it myself, sorry...
rsnapshot. Or, if you want more control, just hack up a short bash script to do the same thing: one cp -al, a few mv and rsync. I use it on a very busy 30TB server with around 5 million files, and it works wonderfully.
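The "one cp -al, a few mv and rsync" recipe can be sketched roughly as follows. This is not the poster's script; it is an illustration in Python (standard library only) of the same hard-link rotation idea, and the `snapshot.N` directory naming and all function names are assumptions of this sketch. It handles regular files and directories only, and detects changes by size and modification time, similar to rsync's default quick check.

```python
import os
import shutil
from pathlib import Path


def rotate(backup_root: Path, keep: int) -> None:
    """The 'few mv' step: drop the oldest snapshot and shift the rest by one."""
    oldest = backup_root / f"snapshot.{keep - 1}"
    if oldest.exists():
        shutil.rmtree(oldest)
    for i in range(keep - 2, -1, -1):
        current = backup_root / f"snapshot.{i}"
        if current.exists():
            current.rename(backup_root / f"snapshot.{i + 1}")


def hardlink_copy(src: Path, dst: Path) -> None:
    """The 'cp -al' step: new directories, files shared as hard links."""
    shutil.copytree(src, dst, symlinks=True, copy_function=os.link)


def unchanged(src: Path, dst: Path) -> bool:
    s, d = src.stat(), dst.stat()
    return s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime)


def sync(source: Path, snapshot: Path) -> None:
    """The 'rsync' step: make the snapshot match the source tree."""
    snapshot.mkdir(parents=True, exist_ok=True)
    # Remove entries that no longer exist in the source.
    for dirpath, dirnames, filenames in os.walk(snapshot, topdown=False):
        rel = Path(dirpath).relative_to(snapshot)
        for name in filenames:
            if not (source / rel / name).is_file():
                (Path(dirpath) / name).unlink()
        for name in dirnames:
            if not (source / rel / name).is_dir():
                shutil.rmtree(Path(dirpath) / name)
    # Copy new and changed files.
    for dirpath, dirnames, filenames in os.walk(source):
        rel = Path(dirpath).relative_to(source)
        (snapshot / rel).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            src = Path(dirpath) / name
            dst = snapshot / rel / name
            if dst.exists():
                if unchanged(src, dst):
                    continue
                # Unlink first so older snapshots keep the old content
                # instead of being overwritten through the shared inode.
                dst.unlink()
            shutil.copy2(src, dst)


def backup(source: Path, backup_root: Path, keep: int = 7) -> Path:
    """Create snapshot.0 as the newest backup, keeping `keep` (>= 2) snapshots."""
    backup_root.mkdir(parents=True, exist_ok=True)
    rotate(backup_root, keep)
    newest = backup_root / "snapshot.0"
    previous = backup_root / "snapshot.1"
    if previous.exists():
        hardlink_copy(previous, newest)
    sync(source, newest)
    return newest
```

Calling `backup(Path("/srv/data"), Path("/backup/data"), keep=7)` once a day (both paths are placeholders) would keep a week of history. Since the files here are almost never modified, each snapshot costs little more than the directory entries plus the newly added files. It is still a file-based walk, though, so the stat() load mentioned in the question remains.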
Try using mirrordir. With an appropriate script, it seems to be the ideal solution for you. It only updates the files which have changed (modified, created, or deleted), but it also has the capability to preserve old files. I haven't used that feature myself, so I'm not sure how it works, but it shouldn't be hard to set up. Here's the script I use: (Edited somewhat for clarity. Hope I didn't cause problems with the edits)
With no changes to commit (second run-through, for example) it takes about 5-7 minutes to scan 1.5 TB of files. Of course, it's a lot slower on the first run-through.
By the way, this script was written by me for my use on my personal server at home. While anyone is absolutely free to use or modify it for themselves, I am making absolutely no guarantees or warranties. It's free, so you get what you pay for. Hope it helps, though!