I want to automate the backup of my databases and files with cron. Should I add the following lines to crontab?
mysqldump -u root -pPASSWORD database_name | gzip > /home/backup/database_`date +\%m-\%d-\%Y`.sql.gz
svn commit -m "Committing the working copy containing the database dump"
First of all, is this a good approach?
Also, it is not clear to me how to specify the repository and the working copy to svn.
And how can I run svn only after the mysqldump has finished, not before, so as to avoid conflicts?
1) If you insist on storing backups in subversion, then there is nothing wrong with this approach. It is strange, though.
2) You should keep a checkout around, place the dump into the working directory, and run
svn update
and
svn add
as appropriate before committing.
3) If you run the commands as shown from a shell script, there should be no overlap.
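For example, a minimal sketch of that sequence as a shell script (the working-copy path and dump filename are placeholders; the mysqldump command is the one from the question):

cd /home/backup/db-checkout || exit 1     # placeholder: an existing svn working copy
svn update
mysqldump -u root -pPASSWORD database_name > database.sql
svn add database.sql 2>/dev/null || true  # only needed the first time; quietly skipped afterwards
svn commit -m "Committing the working copy containing the database dump"

Because the commit only runs after mysqldump has finished writing the file, this also takes care of the ordering concern from the question.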
Also note that compressing the MySQL output produces very different binary files from one dump to the next and will cause your repository's disk requirements to balloon. The uncompressed SQL takes more space initially, but the text diffs are stored much more efficiently in the repository. There is also no need to store each dump as a separate file with the date in the name; it might as well be the same file every time, since version control will let you turn back the clock.
I much prefer rolling backups, with something like the last 7 days of zipped-up SQL, then snapshots one week apart going back a month or two. I don't see the need to version-control the database for all eternity.
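A rough sketch of that rotation, assuming each day's gzipped dump is copied into a daily/ directory and, on Sundays, into a weekly/ directory (the paths and retention windows are purely illustrative):

find /home/backup/daily -name '*.sql.gz' -mtime +7 -delete     # drop daily dumps after a week
find /home/backup/weekly -name '*.sql.gz' -mtime +60 -delete   # keep weekly snapshots for about two months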
As I said, saving the DB in an SVN repository is not good practice.
Regarding the mysqldump, keep in mind that you are also including the options implied by --opt, which is on by default and is shorthand for --add-drop-table --add-locks --create-options --disable-keys --extended-insert --lock-tables --quick --set-charset. In particular, --add-drop-table means that the dump drops and recreates every table when it is restored.
So, if you restore the full dump you created, you will overwrite all the data you inserted after the last backup you performed.
As long as your DB is as small as you said, I advise you to do a backup more often.
This is a nice example of how you can proceed.
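Separately from that example, a crontab sketch of a more frequent, six-hourly dump, reusing the command from the question (the schedule and filename pattern are only illustrative):

0 */6 * * * mysqldump -u root -pPASSWORD database_name | gzip > /home/backup/database_`date +\%Y-\%m-\%d_\%H`.sql.gz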
If your DB is large, storing hundreds of copies of a database will exceed your storage capacity, probably catching you unprepared and busy with something else. gzip compression will almost certainly hurt, since it impedes SVN's ability to compress between revisions, and I think SVN already uses a zip library internally. You might take a few days' worth of backups and try it both ways and see which one uses less disk. It'll probably also be helpful to order the dump in some way, say with --order-by-primary; otherwise SVN will have to waste disk representing frivolous reorderings of the mysqldump output, which you don't care about.
But eventually you have to simply discard data. One interesting approach I've seen went by the name of "logarithmic backups". The idea is that newer backups are more important than older backups, so you save more of the recent ones and expire most of them as they age; you end up keeping, say, every backup from the last few days, then progressively sparser ones the further back you go.
This is a similar approach to RRDtool, where data is consolidated into a representative object. You wind up with 20+ backups total, and the ability to recover short-lived data from the recent past and long-lived data from the distant past.
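A minimal sketch of such an expiry pass, assuming dated dumps named like database_YYYY-MM-DD.sql.gz in /home/backup and GNU date/stat; every retention window below is made up for illustration:

#!/bin/sh
# Roughly "logarithmic" retention: keep everything recent, thin out older dumps.
now=$(date +%s)
for f in /home/backup/database_*.sql.gz; do
    age_days=$(( (now - $(stat -c %Y "$f")) / 86400 ))    # file age in days (GNU stat)
    keep=no
    if [ "$age_days" -le 7 ]; then
        keep=yes                                           # everything from the last week
    elif [ "$age_days" -le 60 ] && [ "$(date -r "$f" +%u)" -eq 7 ]; then
        keep=yes                                           # Sunday dumps for about two months
    elif [ "$(date -r "$f" +%d)" -eq 1 ]; then
        keep=yes                                           # first-of-month dumps long term
    fi
    [ "$keep" = no ] && rm -- "$f"
done

At steady state this leaves roughly 20-30 files, in line with the "20+ backups total" figure above.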
Actually answering the question
Since your data is relatively small, and probably doesn't change much, SVN might not be a bad approach.
I have a similar process for putting websites of interest into SVN, which I've modified to suit your needs. Put this in your cron.daily or wherever, and it'll get the job done. You will need to initialize the repo first and tweak the script to suit your needs, but this is a good start:
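A rough sketch of the kind of cron.daily script meant here (not the original; the working-copy path and credentials are placeholders, and it assumes the repository has already been created and checked out):

#!/bin/sh
# Sketch only -- adjust the path, credentials and database name to your setup.
WC=/var/backups/db-wc                                    # placeholder: an existing svn working copy
svn update "$WC"
mysqldump -u root -pPASSWORD --order-by-primary database_name > "$WC/database.sql"
svn add "$WC/database.sql" 2>/dev/null || true           # only needed on the first run
svn commit -m "Automated database dump $(date +%F)" "$WC"

The dump is left uncompressed and ordered by primary key so that consecutive revisions diff well, per the notes above.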
How about doing an automated dump of the database with automysqlbackup and then creating auto-managed fulls/diffs with BackupPC?
BackupPC hard-links unchanged files between different backups to minimise disk space usage.
http://sourceforge.net/projects/automysqlbackup/
http://backuppc.sourceforge.net/
Dumping your database to a file and then committing that file to svn is a fine approach. There are a couple of things I recommend changing, though... You should overwrite the same file every time you dump (remove the date from the filename) and remove the zip.
Just like editing your source code, you are essentially modifying the database dump file. Also, as someone else mentioned, if you don't zip it, in the long run it will compress much better on the server because the diffs will be much smaller.
The great part about this approach is that it ties together the database with the code at a particular time. For example, you need to run revision 1428. No problem... update your local repo to rev 1428 and you have all of the code and the database dump that runs with 1428. Load that dump, compile your code, and you're in business.
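For instance, a sketch of that recovery, assuming the dump was committed as database.sql in the working-copy root (credentials follow the question's example):

svn update -r 1428
mysql -u root -pPASSWORD database_name < database.sql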