Currently, at the datacenter, we have 6 boxes. Each of these runs a LAMP stack, and each of these needs to be backed up. The obvious solution is to back up all of this to one machine, and then plug in a drive to it, and back that up.
Problem is, some of our employees (read: the person who would be plugging in the drive) are lazy. So I was tasked with writing a script to back up each of our machines locally, so that they could then be backed up offsite in the same vein as above. However, the datacenter bills on the 95th percentile of bandwidth usage, and these backups are costing the company money.
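To see why the backups show up on the bill, here is a rough sketch of how 95th percentile billing is commonly calculated. The sampling scheme (sort the samples, throw away the top 5%, bill at the highest one left) and all the traffic numbers are assumptions for illustration; the question doesn't say how this particular datacenter measures.

```python
def percentile_95(samples_mbps):
    """Billable rate under an assumed scheme: sort the samples,
    discard the top 5%, and bill at the highest remaining sample."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # Number of samples kept after dropping the top 5% (integer ceiling).
    kept = (len(ordered) * 95 + 99) // 100
    return ordered[kept - 1]

# Hypothetical billing period of 100 samples, normally 2 Mbit/s of web traffic.
quiet = [2.0] * 100
short_backup = [2.0] * 96 + [50.0] * 4    # backup burst fills 4% of the samples
long_backup = [2.0] * 90 + [50.0] * 10    # backup burst fills 10% of the samples

print(percentile_95(quiet))         # 2.0
print(percentile_95(short_backup))  # 2.0, the burst falls inside the discarded 5%
print(percentile_95(long_backup))   # 50.0, the burst now sets the billable rate
```

Under this model a backup only costs money once it runs for more than about 5% of the billing period, or if it is fast enough to sit above the normal traffic for that long. That is why the answers below keep coming back to throttling and to sending less data.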
My question, for small/medium-sized businesses: what is the appropriate setup that we should be using to back up our data? Currently, buying another machine to back up to is a far-off possibility at best, but it would be taken into consideration.
There are a TON of backup questions already out there. I just finished answering one, in fact. LOL
The O'Reilly book "Backup and Recovery" is a great book if you want to read about overall strategy and some of the possibilities out there.
You need a PLAN before you get started. And the plan needs to make sense and (hopefully) be scalable.
Some specific things you may want to look into and consider are:
Jungledisk
rdiff-backup
rsync (always a personal favorite)
Alright, your first step should be to evaluate what you really want, and create a backup plan before you create a backup solution. The dog gets tired of being wagged.
If I were in that situation, here is what I'd do.
If the machines were built with Kickstart, are well managed, and can be recreated from documentation, great. If not, take semi-annual images of them onto USB drives. You can use Clonezilla and be done in a short time. This will give you relatively rapid turnaround if one of the machines catches on fire, or whatever.
You're going to want regular backups of the configuration and data as well. Without knowing how much data that is, I would say evaluate the needs of each individual server, and determine what your backup destination will require. Once you have that, you get a much better idea of where you can put it.
Evaluate. Plan. Build. Test. In that order, please.
Hmm. Well, our solution is to have a separate backup server in the datacenter, but without off-site backups. This is cheap for us because we rent a whole rack but only use 2/3 of it (and we're super-cozy with the datacenter provider, so that space may even be free). I suggested off-site backups to my boss, but he nixed that as too expensive bandwidth-wise. He also thinks the likelihood of the destruction of our datacenter is too small. I shrug my shoulders and say "it's your call".
As for the backup server itself, that too is done on the cheap. It's consumer-grade hardware with a single non-RAID drive. The expectation is that if the drive fails, I'd be there with a new one that day, and we'd rebuild the backups immediately. The software that backs up the servers, however, does get sent off-site because it's very small compared to all the other data. It's just a few custom shell scripts that use rsync. I make incremental backups by making tarballs of the directories and rotating the tarballs.
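The scripts themselves aren't shown, so the following is only a guess at the general shape of "make a tarball, then rotate", written in Python rather than shell. The function names, the date-stamped file naming, and the retention count are all hypothetical.

```python
import tarfile
from pathlib import Path

def make_tarball(source_dir, dest_dir, stamp):
    """Archive source_dir into dest_dir as <dirname>-<stamp>.tar.gz; return the path."""
    source = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the directory name.
        tar.add(source, arcname=source.name)
    return archive

def rotate(dest_dir, prefix, keep):
    """Delete all but the newest `keep` archives for `prefix`; return the removed paths.

    Assumes stamps sort chronologically as text (e.g. ISO dates like 2024-01-31).
    """
    archives = sorted(Path(dest_dir).glob(f"{prefix}-*.tar.gz"))
    stale = archives[:-keep] if keep > 0 else archives
    for path in stale:
        path.unlink()
    return stale
```

A nightly job would call something like `make_tarball("/srv/rsync/web1/www", "/srv/tarballs", date.today().isoformat())` followed by `rotate("/srv/tarballs", "www", keep=7)`; those paths are made up. Note that each tarball here is a full copy of the directory, so disk usage grows with the retention count.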
Technically speaking, this is "enough". It's already saved our hides on a few occasions.
How are you currently running your backups offsite?
If you are happy with the current arrangement then you might just be able to limit the amount of bandwidth used by the process that performs the offsite copy, by wrapping it with trickle or something similar. Limit the bandwidth to something noticeably below your chargeable data rate.
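As a rough sketch of the wrapping idea, this builds a trickle-prefixed command line. `-s` runs trickle standalone (without the trickled daemon) and `-u` caps the upload rate in KB/s; the scp command, host name, and the 100 KB/s figure are placeholders, since the question doesn't say what performs the offsite copy.

```python
import shlex

def throttled(command, upload_kbps):
    """Prefix `command` (a list of arguments) with trickle in standalone mode.

    upload_kbps is the upload cap in KB/s, passed to trickle's -u option.
    """
    if upload_kbps <= 0:
        raise ValueError("upload_kbps must be positive")
    return ["trickle", "-s", "-u", str(upload_kbps)] + list(command)

# Hypothetical offsite copy; substitute whatever actually ships the backups.
offsite_copy = ["scp", "/backups/web1.tar.gz", "backup@offsite.example.com:/srv/backups/"]
cmd = throttled(offsite_copy, 100)
print(shlex.join(cmd))
```

To actually run it, pass `cmd` to `subprocess.run(cmd, check=True)`. Keep in mind that trickle works by preloading a library into the wrapped program, so it has no effect on statically linked binaries.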
I personally (and at work) use rsync for all our backups, as the synchronisation protocol is very efficient at limiting the amount of data transferred. On top of that it has a transfer throttle option of its own, which we use so as not to saturate the outgoing ADSL line (which would be an inconvenience if a backup run kicked off while one of us were performing remote admin tasks down that line).
I personally like BackupPC: it's free, it pools data, it's very customizable, it uses basic Linux tools, it can back up all kinds of OSs, and it has a web interface and rather simple configs.
I use rsync to a central backup server, and throw in a little "rotating backups" using hard links to conserve disk space.
Because rsync replaces a changed file by moving the new copy over the top of the old one, rather than rewriting it in place, the hard links save a ton of disk space on files that never change (they all reference one inode), yet each copy of the directory still contains every file (not just symlinks to the files).
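The original commands for this rotation are not shown here, so the sketch below swaps in rsync's `--link-dest` option, which is one well-known way to get the same hard-linked snapshots. The `daily.N` directory naming, the retention count, and the function names are assumptions.

```python
import shutil
from pathlib import Path

def rotate_snapshots(root, keep):
    """Shift daily.0 -> daily.1 -> ... and drop the oldest, leaving room for a new daily.0."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    root = Path(root)
    oldest = root / f"daily.{keep - 1}"
    if oldest.exists():
        shutil.rmtree(oldest)
    # Rename from the oldest survivor down so nothing gets overwritten.
    for n in range(keep - 2, -1, -1):
        current = root / f"daily.{n}"
        if current.exists():
            current.rename(root / f"daily.{n + 1}")

def rsync_command(source, root):
    """Build the rsync call that fills a fresh daily.0.

    Files unchanged since daily.1 are hard-linked to it instead of copied.
    Use an absolute `root`: a relative --link-dest is resolved against the
    destination directory, which is easy to get wrong.
    """
    root = Path(root)
    return [
        "rsync", "-a", "--delete",
        f"--link-dest={root / 'daily.1'}",
        source, str(root / "daily.0"),
    ]
```

A nightly run would call `rotate_snapshots("/srv/backups/web1", keep=7)` and then execute `rsync_command("web1:/var/www/", "/srv/backups/web1")` with `subprocess.run`; the host and paths are placeholders. Deleting an old snapshot only frees the space of files that no newer snapshot still links to.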
rsync also has a bandwidth limiter and compression built in.
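The two options in question are `--bwlimit` (by default in units of 1024 bytes per second) and `-z` for compression. A minimal sketch of picking a limit from a target line rate follows; the 2 Mbit/s target, host name, and paths are invented for illustration.

```python
def mbit_to_kib_per_s(mbit_per_s):
    """Convert a line rate in Mbit/s to whole KiB/s, the default unit for rsync --bwlimit."""
    return int(mbit_per_s * 1_000_000 / 8 / 1024)

def rsync_limited(source, dest, mbit_per_s):
    """Build an rsync call that compresses in transit and caps its own bandwidth."""
    limit = mbit_to_kib_per_s(mbit_per_s)
    return ["rsync", "-a", "-z", f"--bwlimit={limit}", source, dest]

# Hypothetical: keep the backup at or below 2 Mbit/s.
print(rsync_limited("/var/www/", "backup@offsite.example.com:/srv/backups/web1/", 2))
```

Compression helps most on text-heavy data such as SQL dumps and logs; already-compressed files (images, tarballs with gzip) gain little from `-z`.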
I back up 23 machines to one server every night. Takes a little under 2 hours to cycle through all the machines - but I'm using a local machine.
There are some decently cheap 2TB drives out there right now - and you could easily set up a RAID 1 that mirrors your backup drive onto an extra drive that becomes your 'offsite' rotation drive.