I'm looking for a way to take daily backups of an AWS bucket as incremental backups. These are to be stored offline and away from AWS.
For other storage systems (such as NAS drives) I use a daily rsync for backups. Using rsync's --link-dest switch, I'm able to take a full snapshot every day of the remote file system. Any files which have not changed since the previous backup are hardlinked to the previous backup. This means that full daily snapshots only take the storage space of incremental backups.
I would like to set up something similar for an Amazon S3 bucket. There's 20GB in the bucket but only ~50MB changes per day.
Note: this is about backing up the contents of an S3 bucket, NOT backing up other content to an S3 bucket.
I can see how I would use the AWS CLI tools to do full backups. I don't see how I can do incremental backups.
I guess I could synchronise S3 to a local hard drive every day, then back up the local hard drive daily. This feels very clunky.
Edit
This was intended as a simple technical question, not a general discussion of backup security. But since I'm being asked "why do you need this", I now see I need to explain basic principles of backups.
Anecdote: I recently witnessed a third party IT provider drop (entirely) an S3 bucket because of a miscommunication. This could have been very costly (~£100K of recent work, ~£1M total work). Luckily we happened to also have copies on our local laptops and for only £1K we rebuilt the content for them.
It has renewed my conviction that the only valid "backup" is on an isolated system stored offsite and offline, and with a media rotation that effectively implements a time lock. Other backups can enhance, providing more rapid recovery etc... but holding all your AWS backups on your own AWS account just isn't safe because ... user error.
Note: this is an answer to the original question before it even mentioned offline backups. Leaving it here as an answer for the original question: How to create an Incremental backup of an AWS S3 bucket.
The first question is: why do you want to back up an S3 bucket? What's the issue you're trying to protect against?
Remember that S3 is designed for 99.999999999% (eleven nines) durability - you're extremely unlikely to lose objects due to HW failure, so we can rule that out.
If you want to make sure that accidentally overwritten objects in S3 can be recovered you can use S3 versioning - that will keep a history of all the older versions of the file and you can recover that way. Same for deletions.
Speaking of deletions - you can require use of MFA for S3 Deletions as another layer of protection, e.g. for compliance and auditing reasons. (thx Tim :)
If you need a second DR (disaster recovery) bucket in some other region for the unlikely event that your primary region goes off-line you can use S3 Cross Region Replication that will automatically mirror your bucket contents from one region to another with every change.
If none of the above satisfies your needs you may want to have a Lambda function that handles each change in the S3 bucket for you. That way every time you write/update an object in S3 the Lambda will make a backup to your preferred destination. This can be used e.g. to mirror S3 buckets between different AWS accounts, to other cloud providers or to offline destinations (e.g. to your on-prem server). With Lambda you've got the ultimate flexibility on what to do with the changes. See Using Lambda with Amazon S3.
If that's still not enough you can always use aws s3 sync, which compares the source and destination buckets and copies only what has changed.
(Update) For offline backups you can use aws s3 sync as well: it can sync to and from local disks, not only between buckets.
That's plenty to choose from. Hope some of it suits your needs :)
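For the offline case, here is a rough Python sketch of how a local mirror could be turned into rsync --link-dest style snapshots. It is only an illustration under assumptions: the function names (sync_bucket, snapshot, unchanged) and all paths are made up, the AWS CLI is assumed to be installed and configured, and "unchanged" is decided by size and modification time, much like rsync's default quick check. Unchanged files are hard-linked to the previous snapshot, everything else is copied from the mirror.

```python
import os
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def sync_bucket(bucket_url: str, mirror: Path) -> None:
    """Refresh the local mirror with `aws s3 sync` (needs the AWS CLI and credentials)."""
    # --delete makes the mirror match the bucket; history lives in the snapshots.
    subprocess.run(["aws", "s3", "sync", bucket_url, str(mirror), "--delete"], check=True)


def unchanged(a: Path, b: Path) -> bool:
    """Quick check: same size and same modification time (whole seconds)."""
    sa, sb = a.stat(), b.stat()
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def snapshot(mirror: Path, dest: Path, previous: Optional[Path] = None) -> None:
    """Build a full snapshot of `mirror` in `dest`, hard-linking files unchanged since `previous`."""
    dest.mkdir(parents=True, exist_ok=True)
    for src in sorted(mirror.rglob("*")):
        rel = src.relative_to(mirror)
        target = dest / rel
        if src.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and unchanged(src, old):
            os.link(old, target)       # shares data with the previous snapshot
        else:
            shutil.copy2(src, target)  # new or changed file: real copy, keeps the mtime
```

A daily run might then look like sync_bucket("s3://example-bucket", Path("/backup/mirror")) followed by snapshot(Path("/backup/mirror"), Path("/backup/2024-01-02"), Path("/backup/2024-01-01")), where the bucket name and directories are placeholders. The mirror costs one extra full copy on disk, but the snapshots never share files with it, so a later sync cannot alter an old snapshot.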
There is a way besides aws s3 sync, but it might be just as clunky. You see, it comes down to adding a Lambda hook on the S3 bucket that triggers on PUTs. In theory, this would allow you to build an add-only replica of the S3 bucket, so any DELETEs don't get replicated. There are tutorials for this, but in essence: the incremental backup logic would be written by you.
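To give an idea of what that logic might look like, here is a minimal Python sketch of the add-only part of such a Lambda. It only picks out the created objects from an S3 notification event and skips deletions. The names objects_to_copy and copy_object are hypothetical, and the actual copy step (to another account, another provider, or your own server) is left as a placeholder you would have to write yourself.

```python
from typing import Callable, List, Tuple
from urllib.parse import unquote_plus


def objects_to_copy(event: dict) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for created objects; ignore every other event type."""
    pairs = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # e.g. ObjectRemoved:Delete is deliberately not replicated
        s3 = record["s3"]
        # Keys arrive URL-encoded in S3 notifications (spaces become "+").
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def lambda_handler(event, context, copy_object: Callable[[str, str], None] = print):
    """Lambda entry point; `copy_object` is a placeholder for your own backup step."""
    for bucket, key in objects_to_copy(event):
        copy_object(bucket, key)
```

Because deletions are simply skipped, the replica only ever grows. Overwrites still arrive as new PUTs, so whether an overwrite replaces the replica copy or is stored alongside it depends on what your copy step does.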