We have a web app running on a number of servers and we want to add an additional layer of redundancy by backing up the key data to S3. The key data is the MySQL database and a folder containing dynamically created site assets, predominantly images.

Some kind of rsync-based solution would initially seem the best plan. A couple of years ago we played with s3cmd (in particular s3cmd sync) with some success, but we didn't find it particularly reliable, although this may have changed since. It's occurred to me, though, that an rsync solution might not work particularly well with a single db.sql file created with mysqldump, and I assume that means the whole database gets transferred each time; with multiple databases of over 1GB each, that is going to add up to a lot of traffic (and dollars) very quickly. With the image files I could simply transfer only the files modified within the last day, which would be far simpler.

What approach should I look at?
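For the images, the sort of thing I have in mind is roughly this (the paths and bucket name here are made up):

    # push anything in the assets folder modified in the last 24 hours to S3
    find /var/www/assets -type f -mtime -1 -print0 |
      while IFS= read -r -d '' f; do
        s3cmd put "$f" "s3://example-backup-bucket/assets${f#/var/www/assets}"
      done

But I'm not sure what the equivalent should look like for the database dumps.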
This looks to me like a perfect job for opendedup. Give it a shot and let us know if it solves your problem.
As you guessed, s3cmd is much more reliable than it was a few years ago, and a lot of people use it, including me, without any problem. Amazon S3 also doesn't charge for uploading data in, so money isn't the main factor, but you definitely want to avoid the unnecessary transfers that database backups tend to generate.

I had the same problem with MySQL, because unfortunately it doesn't support incremental backups. That's why I wrote a bash script which, for every database, dumps each table to a separate file. After that I compress each dump and zdiff it against the previous copy, ignoring the last 2-3 lines (where mysqldump writes the current date). If there is no difference between the files, I don't sync that file to the cloud. The downside of this approach is the complexity of the solution, which adds extra steps when restoring the data.

Also, if you have any say in the development of the software you run on the server, you can add an extra parameter for every table that keeps track of changes. Based on that, you can instruct your backup script to dump only the tables that have changed since the last backup.
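To make that concrete, here is a rough sketch of the idea (not my actual script): the database name, paths and bucket are placeholders, and it assumes the mysql client can authenticate without prompting (e.g. via ~/.my.cnf). Instead of trimming the last few lines by hand, this version uses mysqldump's --skip-dump-date option, so identical data produces identical dumps and the zdiff stays meaningful:

    #!/bin/bash
    # Dump each table separately, compare with the previous dump, upload only if changed.
    DB=mydb
    DEST=/backup/$DB
    BUCKET=s3://example-backup-bucket/$DB
    mkdir -p "$DEST"

    for TABLE in $(mysql -N -e "SHOW TABLES FROM $DB"); do
        OLD="$DEST/$TABLE.sql.gz"
        NEW="$OLD.new"
        # --skip-dump-date keeps the dump byte-identical when the data hasn't changed
        mysqldump --skip-dump-date "$DB" "$TABLE" | gzip > "$NEW"
        if [ -f "$OLD" ] && zdiff -q "$OLD" "$NEW" > /dev/null; then
            rm "$NEW"                                  # no change, nothing to upload
        else
            mv "$NEW" "$OLD"
            s3cmd put "$OLD" "$BUCKET/$TABLE.sql.gz"   # new or changed table
        fi
    done

Restoring means pulling the per-table files back and loading them one by one, which is the extra complexity I mentioned.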
Plain rsync is a good choice for backing up files since it is simple and performs well. However, if a file is modified while it is being rsynced, the copy might be corrupted, so it's important to make sure that all files are closed during the transfer. In your case, if a few image files were being changed while a run was in progress and their copies came out corrupted, the next rsync would overwrite them anyway, since rsync only copies modified files; it is effectively a self-healing process. So I think rsync-style syncing into S3 is a good choice here.
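If you go that route, s3cmd sync already gives you rsync-like behaviour directly against S3, for example something like this for the assets folder (bucket and path are placeholders):

    # only new or changed files are transferred; add --delete-removed if you
    # also want deletions mirrored to the bucket
    s3cmd sync /var/www/assets/ s3://example-backup-bucket/assets/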