We've used tar to back up and compress (gzip) selected directories on our file server with very good results until recently.
Every one of our backups is stored on mirrored (RAID) hard drives and simultaneously uploaded to an Amazon S3 bucket for off-site storage.
As our data has grown rapidly, so have our backups. This week, our backup uploads have run around the clock just to sync the fresh backups from the last 7 days, and they still haven't finished. A faster connection would ease some of this (but isn't an option at the moment), and I'd rather build a real solution than settle for a workaround.
What alternative strategy could we use to back up our directories that keeps us away from multi-gigabyte archive files, still lets us use tar, and reduces the bandwidth needed to sync the files?
Here's a commercial recommendation. Cactus Lone-Tar is a full backup suite that generates archive files that can be listed and extracted with plain tar, even when written to tape. That's handy because you don't need the software itself to restore an archive. It's my go-to solution for standalone Linux server backups. Lone-Tar now has an online component that can integrate with a bundled offsite storage package or a remote Linux server. Because it's a proper backup suite, it maintains a catalog and can accommodate FULL, INCREMENTAL and SELECTIVE backups.
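Since the archives are plain tar format, a restore needs nothing beyond stock tar. A minimal sketch, assuming the tape sits at the common /dev/st0 device path (adjust for your hardware):

    # List the contents of the archive on tape using stock tar
    tar -tvf /dev/st0

    # Restore a single file from the tape archive
    tar -xvf /dev/st0 path/to/file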
There are a lot of unknown variables here: how large your backups are, what your bandwidth limits are, whether you want incremental or full backups, and so on.
A few suggestions regardless:
Use rsync over SSH with compression enabled (rsync's -z flag, or ssh -C); see the sketch after these suggestions. Rsync greatly reduces the amount of data transferred on each backup because it only sends the differences, and the compression cuts the bandwidth required further.
If bandwidth is limited, consider backing up to local disks. If you want off-site backups, you can always mail the disks off-site. As storage capacity keeps exploding while bandwidth hasn't increased to match, you really shouldn't dismiss this as a valid option.
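A minimal rsync-over-SSH invocation for the first suggestion, assuming hypothetical source and destination paths and a reachable backup host:

    # -a preserves permissions, times, and symlinks; -z compresses in transit;
    # --delete mirrors removals on the destination so it stays an exact copy.
    rsync -az --delete -e ssh /srv/data/ backup@backuphost:/backups/data/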
[edit] I noticed the incremental tag. Does Amazon S3 provide support for snapshots? That would take care of the incremental aspect.
Use rsync over SSH. If you want to keep historic versions, you can set rsync's -b (backup) and related options. If you are married to tar, you could use its -z flag, if you don't already, to compress. You can go further with the dump command, which records each backup level in /etc/dumpdates so that, much as with typical rsync usage, only files that have changed since the last dump are copied. Both ideas are sketched below.
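A minimal sketch of both ideas, with hypothetical paths and hostnames (dump assumes an ext2/3/4 filesystem):

    # Keep historic versions with rsync: -b moves files that would be
    # overwritten into a dated backup directory on the destination.
    rsync -azb --backup-dir=/backups/old/$(date +%F) -e ssh \
        /srv/data/ backup@backuphost:/backups/current/

    # Incremental backups with dump: level 0 is a full dump, level 1
    # captures only what changed since the last lower-level dump.
    # -u records the run in /etc/dumpdates.
    dump -0u -f /backups/home.level0.dump /home
    dump -1u -f /backups/home.level1.dump /home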