I want to automate the backup of my databases and files with cron. Should I add the following lines to crontab?
mysqldump -u root -pPASSWORD database_name | gzip > /home/backup/database_`date +\%m-\%d-\%Y`.sql.gz
svn commit -m "Committing the working copy containing the database dump"
First of all, is this a good approach?
Also, it is not clear to me how to specify the repository and the working copy to svn.
And how can I run svn only after the mysqldump has finished, not before, so as to avoid conflicts?
1) If you insist on storing backups in subversion, then there is nothing wrong with this approach. It is strange, though.
2) You should keep a checkout around, place the dump into the working directory, and run
svn update
and
svn add
as appropriate before committing.
3) If you run the commands as shown from a shell script, there should be no overlap.
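For example, a minimal sketch of that sequence as a shell script (the working-copy path and dump filename are placeholders; the mysqldump command is the one from the question):

cd /home/backup/db-checkout || exit 1     # placeholder: an existing svn working copy
svn update
mysqldump -u root -pPASSWORD database_name > database.sql
svn add database.sql 2>/dev/null || true  # only needed the first time; quietly skipped afterwards
svn commit -m "Committing the working copy containing the database dump"

Because the commit only runs after mysqldump has finished writing the file, this also takes care of the ordering concern from the question.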
Also note that compressing the MySQL output produces very different binary files from one dump to the next and will cause your repository's disk requirements to balloon. The uncompressed SQL takes more space initially, but the text diffs are stored much more efficiently in the repository. There is also no need to store each dump as a separate file with the date in the name; it might as well be the same file every time, since version control will let you turn back the clock.
I much prefer rolling backups, with something like the last 7 days of zipped-up SQL, then snapshots one week apart going back a month or two. I don't see the need to version-control the database for all eternity.
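A rough sketch of that rotation, assuming each day's gzipped dump is copied into a daily/ directory and, on Sundays, into a weekly/ directory (the paths and retention windows are purely illustrative):

find /home/backup/daily -name '*.sql.gz' -mtime +7 -delete     # drop daily dumps after a week
find /home/backup/weekly -name '*.sql.gz' -mtime +60 -delete   # keep weekly snapshots for about two months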
As I said, saving the DB in an SVN repository is not good practice.
Regarding the mysqldump, keep in mind that you are also including the options implied by --opt, which is on by default and is shorthand for --add-drop-table --add-locks --create-options --disable-keys --extended-insert --lock-tables --quick --set-charset. In particular, --add-drop-table means that the dump drops and recreates every table when it is restored.
So, if you restore the full dump you created, you will overwrite all the data you inserted after the last backup you performed.
As long as your DB is as small as you said, I advise you to do a backup more often.
This is a nice example of how you can proceed.
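Separately from that example, a crontab sketch of a more frequent, six-hourly dump, reusing the command from the question (the schedule and filename pattern are only illustrative):

0 */6 * * * mysqldump -u root -pPASSWORD database_name | gzip > /home/backup/database_`date +\%Y-\%m-\%d_\%H`.sql.gz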
If your DB is large, storing hundreds of copies of a database will exceed your storage capacity, probably catching you unprepared and busy with something else. gzip compression will almost certainly hurt, since it impedes SVN's ability to compress between revisions, and I think SVN already uses a zip library internally. You might take a few days' worth of backups and try it both ways and see which one uses less disk. It'll probably also be helpful to order the dump in some way, say with --order-by-primary; otherwise SVN will have to waste disk representing frivolous reorderings of the mysqldump output, which you don't care about.
But eventually you have to simply discard data. One interesting approach I've seen went by the name of "logarithmic backups". The idea is that newer backups are more important than older backups, so you save more of the recent ones and expire most of them as they age; you end up keeping, say, every backup from the last few days, then progressively sparser ones the further back you go.
This is a similar approach to RRDtool, where data is consolidated into a representative object. You wind up with 20+ backups total, and the ability to recover short-lived data from the recent past and long-lived data from the distant past.
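A minimal sketch of such an expiry pass, assuming dated dumps named like database_YYYY-MM-DD.sql.gz in /home/backup and GNU date/stat; every retention window below is made up for illustration:

#!/bin/sh
# Roughly "logarithmic" retention: keep everything recent, thin out older dumps.
now=$(date +%s)
for f in /home/backup/database_*.sql.gz; do
    age_days=$(( (now - $(stat -c %Y "$f")) / 86400 ))    # file age in days (GNU stat)
    keep=no
    if [ "$age_days" -le 7 ]; then
        keep=yes                                           # everything from the last week
    elif [ "$age_days" -le 60 ] && [ "$(date -r "$f" +%u)" -eq 7 ]; then
        keep=yes                                           # Sunday dumps for about two months
    elif [ "$(date -r "$f" +%d)" -eq 1 ]; then
        keep=yes                                           # first-of-month dumps long term
    fi
    [ "$keep" = no ] && rm -- "$f"
done

At steady state this leaves roughly 20-30 files, in line with the "20+ backups total" figure above.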
Actually answering the question
Since your data is relatively small, and probably doesn't change much, SVN might not be a bad approach.
I have a similar process for putting websites of interest into SVN, which I've modified to suit your needs. Put this in your cron.daily or wherever, and it'll get the job done. You will need to initialize the repo first and tweak the script to suit your needs, but this is a good start:
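A rough sketch of the kind of cron.daily script meant here (not the original; the working-copy path and credentials are placeholders, and it assumes the repository has already been created and checked out):

#!/bin/sh
# Sketch only -- adjust the path, credentials and database name to your setup.
WC=/var/backups/db-wc                                    # placeholder: an existing svn working copy
svn update "$WC"
mysqldump -u root -pPASSWORD --order-by-primary database_name > "$WC/database.sql"
svn add "$WC/database.sql" 2>/dev/null || true           # only needed on the first run
svn commit -m "Automated database dump $(date +%F)" "$WC"

The dump is left uncompressed and ordered by primary key so that consecutive revisions diff well, per the notes above.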
How about doing an automated dump of the database with automysqlbackup and then creating auto-managed fulls/diffs with BackupPC?
BackupPC hard-links unchanged files between different backups to minimise disk space usage.
http://sourceforge.net/projects/automysqlbackup/
http://backuppc.sourceforge.net/
Dumping your database to a file and then committing that file to svn is a fine approach. There are a couple of things I recommend changing, though... You should overwrite the same file every time you dump (remove the date from the filename) and remove the zip.
Just like editing your source code, you are essentially modifying the database dump file. Also, as someone else mentioned, if you don't zip it, in the long run it will compress much better on the server because the diffs will be much smaller.
The great part about this approach is that it ties together the database with the code at a particular time. For example, you need to run revision 1428. No problem... update your local repo to rev 1428 and you have all of the code and the database dump that runs with 1428. Load that dump, compile your code, and you're in business.
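For instance, a sketch of that recovery, assuming the dump was committed as database.sql in the working-copy root (credentials follow the question's example):

svn update -r 1428
mysql -u root -pPASSWORD database_name < database.sql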