I'm looking for a way to create an offsite backup of around 8TB of data. I've currently broken the data down into 2-4TB file systems, and I'm using ShadowProtect to back up the data from an SBS 2003 server to a Windows 2003 backup server, using weekly full backups and nightly incrementals.
I'm not very happy with this solution, for a bunch of reasons:
- It takes too long to back up.
- Holding more than a week's worth of backups requires tons of space.
- Offsite backups to external HDDs would require too many disks and too much time.
- Offsite backups over the internet would take too much bandwidth.
What I'm looking for, if possible, is some way to have a local backup server house many snapshots without storing duplicate data, as faubackup seems able to do. I would also like to be able to span those backups across a set of external disks, again without duplicating data, because the filesystems are bigger than will fit on a single disk.
Correct me if I'm wrong, but as far as I can tell it's a bit of an impossible situation: faubackup can't span the new data across more filesystems than it itself uses, because it relies on hard links, and hard links can't cross filesystem boundaries.
I've also been thinking about using Openfiler in some way to achieve the same goals, but haven't come up with anything yet.
How do other people cope with offsite backups of such large amounts of data?
edit:
For a bit more background: we are a relatively small geology company (about 15 employees) that basically takes huge data sets and makes them readable. Projects often run into hundreds of gigabytes. In spite of the importance of offsite backups, I will have trouble getting the money for a tape autoloader that can handle the amount of data we're looking at. I've tried, and was basically told that there must be another way and I have to find it.
We have two servers: an SBS 2003 server and a Windows 2003 R2 server used as a backup server. Both machines have a 20TB RAID6 array which houses our data. On any given day, in addition to the regular churn, there will be minor modifications to many very large files.
This is exactly why most companies do backups to tape (lower-cost media than disks, fast streaming write speed), and then physically move the tapes off-site.
You can have the IT guy haul the tapes home, or there are data archival companies that will come to your business, pick up the tapes, and store them at their secure facility. Recovery is as simple as calling the company to bring the tape over, loading it up, and accessing your data.
The internet is good for a lot of things, but moving terabytes of data is not one of them. See Jeff's article The Economics of Bandwidth, which references Jim Gray's excellent Microsoft Research whitepaper TeraScale SneakerNet (.DOC).
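To put rough numbers on it (assuming the 8TB from the question and a T1 line at ~1.5 Mbit/s, as mentioned in another answer here):

```bash
# Days to push 8TB over a T1 (~1.5 Mbit/s), ignoring all overhead:
echo "scale=1; (8 * 10^12 * 8) / (1.5 * 10^6) / 86400" | bc
# => 493.8 -- well over a year for a single full copy
```

A box of tapes in a car has far better throughput, which is Gray's whole point.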
You are looking for a storage system that provides data deduplication: http://en.wikipedia.org/wiki/Data_deduplication
This won't relieve you of the requirement to get data off site somehow, but it will definitely help lower the amount of space required by your hot/live backups.
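If you just want to see the principle, here's a toy file-level version using hard links. Real dedup products work at the block level, not whole files, and the /backups path is just a placeholder; it also assumes everything lives on one filesystem, since hard links can't cross filesystems.

```bash
#!/usr/bin/env bash
# Toy file-level dedup: hash every file under /backups and replace
# exact duplicates with hard links to the first copy seen.
declare -A first_seen
while IFS= read -r -d '' f; do
    h=$(sha256sum "$f" | cut -d' ' -f1)
    if [[ -n "${first_seen[$h]}" ]]; then
        ln -f "${first_seen[$h]}" "$f"   # duplicate: link instead of storing again
    else
        first_seen[$h]=$f                # first copy of this content
    fi
done < <(find /backups -type f -print0)
```

This is essentially what faubackup and BackupPC (mentioned in another answer) do for you automatically.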
We have ~1TB of data, and back up everything nightly using custom rsync scripts. The nice thing about rsync is that it only copies the modified bytes (not the entire modified file) ... plus it compresses the data before transferring.
In our old system, we had to cart tapes and disks home, since every day about 200GB of files were modified. But with rsync only the 1GB or so of modified data within these files is transmitted, compressed down to ~200MB. As a result, we are able to back up everything to a remote site over a T1 in a few minutes (and under an hour on a very heavy maintenance day). The scripts also utilize Linux hard links to maintain 30 days of full archives (not incrementals) using only 2-4TB (before compression) of space. So we end up being able to restore archived data in seconds, while also maintaining off-site storage.
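One common way to get that hard-link effect is rsync's --link-dest option: files unchanged since yesterday's snapshot become hard links into it, so each "full" archive only costs the changed data. A stripped-down sketch of the idea (our real scripts are more involved, and the host and paths here are placeholders):

```bash
#!/usr/bin/env bash
# Nightly snapshot: compressed delta transfer, with unchanged files
# hard-linked against yesterday's snapshot on the receiving side.
SRC="/data/"                          # placeholder source tree
HOST="backup-host"                    # placeholder remote backup server
TODAY=$(date +%F)
YESTERDAY=$(date -d yesterday +%F)

# On the first run yesterday's dir won't exist; rsync just warns
# and copies everything.
rsync -az --delete \
      --link-dest="/backups/$YESTERDAY" \
      "$SRC" "$HOST:/backups/$TODAY/"

# Expire snapshots older than 30 days (runs on the backup host):
ssh "$HOST" "find /backups -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +"
```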
Luckily, disk capacity has kept pace with our company's growth ... I think our total solution at both locations cost ~$1000.
You might want to look into BackupPC. It has to run on a Linux box, but it stores files using hard links, so if a file hasn't changed since the last incremental/full, it just hard-links to the existing copy (so the space needed to store four full backups is vastly smaller than with other backup systems). It can back up Windows machines via Samba, and obviously also backs up Linux/Unix/Mac boxes.
BackupPC
We have a replica SAN at another data centre that we snapshot and back up from.
Since your data divides easily into manageable discrete units (projects, jobs, or whatever you choose to call them), why not just copy each one onto an inexpensive USB drive and store them somewhere? You can get 3TB drives for well under $200 US, and smaller drives for considerably less.
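The copy itself can be a one-liner, e.g. with rsync (run from a Linux box, or via Cygwin on the Windows server; the paths here are just examples):

```bash
# Mirror one finished project onto a USB drive mounted at /mnt/usb;
# re-running it only copies whatever changed since the last run.
rsync -a --delete --progress /data/projects/ProjectX/ /mnt/usb/ProjectX/
```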