I am running a LAMP website on Ubuntu 10.04 LTS. I am a newbie sysadmin (though I am a developer) and I am seeking advice on how best to implement backups for my website. My database is MySQL, and ALL of my database tables use the InnoDB storage engine.
These are the requirements for the backup I want to implement:
- Incremental and full backups of the MySQL database. I would like hourly incremental backups, plus daily, weekly and monthly backups. But it is not clear in my mind what rotation to use for these various backup sets, how to manage them, and (more importantly) how to restore the database from a set of full/incremental backups for a given date.
- I would like to compress and encrypt the data so that I can store it remotely (Amazon S3).
- I want this fully automated, i.e. run as a cron job (see the sketch crontab after this list).
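Purely for illustration, this is roughly the schedule I have in mind (backup.sh is a hypothetical script I have yet to write):

```
# Hypothetical crontab - intervals only, not a tested configuration
0  *  * * *   /usr/local/bin/backup.sh hourly    # hourly incremental
30 2  * * *   /usr/local/bin/backup.sh daily     # daily backup
45 3  * * 0   /usr/local/bin/backup.sh weekly    # weekly, on Sunday
0  4  1 * *   /usr/local/bin/backup.sh monthly   # monthly, on the 1st
```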
Note: my server is 'headless' in that it has no X windowing or other GUI installed, so I am thinking of implementing the backup as a bash script. Alternatively, if there is software out there that can help me run this kind of backup, it needs to be runnable from the command line.
The items I need to backup are:
- MySQL database
- website files (in a specified folder; ideally, I would like to leave out some files that are autogenerated)
- configuration files
- miscellaneous data files in different folders
Here are my questions:
Is there existing software out there that I can use to do this, or do I need to write my own (bash script)?
What is the recommended backup strategy to use (in terms of what is run hourly, daily, weekly etc), and how to restore the website from a particular point in time?
If I have to write my own bash script (being a bash scripting newbie as well), I would be grateful if someone could provide a skeleton script to help me get started.
[Edit]
symcbean: could you list what further information you need from me in order to give 'more tailored advice'? As for budget, let's just say it is zero, so I am unable to fork out much more than the (dedicated) server hosting + the Amazon S3 storage. That is also why I will need to either use open source software or write my own bash script using the tools available on Linux.
It is a new website, and initially the backed-up data is likely to be under 1 GB, but I fully expect the data to grow by at least around 100 MB a day. That adds up very quickly if I am doing full backups daily and sending the backup file(s) over the wire to Amazon S3.
I suggested incremental backups because I want to save on the bandwidth costs (not to mention server load) associated with transmitting potentially several gigabytes of data every day.
Also, no one (so far) has explained how to rotate between the [hourly?], daily, weekly and monthly backups.
There is a lot of (wildly differing) information/opinion out there - regarding backups. I just want to know what is the recommended 'best practice' regarding my particular situation as described above.
If further info is required in order to be able to suggest a more 'tailored recommendation', let me know so I can provide the required information.
I'd strongly recommend not using mysqldump on your live system. Even with InnoDB tables, it will be difficult to get coherent backups from a running system.
As usual here, you've not provided much indication of the constraints in terms of access and budget, nor a clear indication of what you are trying to achieve.
I would recommend using MySQL replication to maintain a hot-standby database. To get a consistent snapshot of the system, stop replication on the replica, run mysqldump, then restart replication, and keep the dump file as the full backup.
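A minimal sketch of that snapshot step, assuming credentials live in ~/.my.cnf and with placeholder paths:

```bash
#!/bin/bash
# Sketch only - run on the standby/replica, never on the live master.
set -e
mysql -e "STOP SLAVE;"                # freeze the replica's data
trap 'mysql -e "START SLAVE;"' EXIT   # always resume replication on exit
mysqldump --all-databases > /var/backups/mysql/full-$(date +%F-%H%M).sql
```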
In terms of software: you've obviously been brought up in an MS Windows environment. Writing scripts is easy, and all the tools for compressing, encrypting, naming and moving files around come as standard in a Linux distro; it's just a question of how you use them. Having said that, my preferred software for backing up files is afio, which is not usually included in minimal installs (you will have tar, cpio, gzip, rsync and ssh). If you Google for afio you'll find lots of docs explaining its virtues compared with the default tools.
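For instance, the stock tools chain together like this (the GPG recipient and paths here are made-up examples):

```bash
# Sketch: archive, compress and encrypt with the standard tools.
tar -cf - /var/www/mysite /etc/apache2 \
    | gzip \
    | gpg --encrypt --recipient backup@example.com \
    > /var/backups/site-$(date +%F).tar.gz.gpg
```

The resulting encrypted file can then be shipped to S3 with whichever command-line client you settle on.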
A backup is only ever any use if you know how to restore it, and have verified that you actually can.
IMHO, incremental backups are a waste of time. Sure, they made sense when the backup medium was expensive, but that's no longer the case: storage is relatively cheap compared to the cost of your time and effort and the value of the data. The last thing you want when restoring a system is working out what sequence of backups to restore to get a consistent image; and if you've got a failed backup in the set, it can all go horribly wrong.
The best solution would be hot-standby replication (using rsync for files and MySQL replication for the db). Then create off-site images (over the network, to tape, DVD...) periodically from the standby.
If you're really strapped for cash, the "hot standby" could easily live on a second disk in the same box as the live site, but for preference I'd recommend a separate machine.
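The file half of that is essentially a one-liner; a sketch, with 'standby' as a placeholder host:

```bash
# Sketch: mirror the web root to the standby machine (or a second disk);
# --exclude keeps autogenerated files out of the copy.
rsync -az --delete --exclude 'cache/' /var/www/mysite/ standby:/var/www/mysite/
```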
What we do at work, aside from having a central backup server whose drives we regularly rotate off site: our department has rsync set up, and each web server has an ssh key-pair account. This way we have one box which connects to each of our web servers, does a mysqldump, and then rsyncs the dump along with the web directories we specify.
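Roughly, the central box does something like this for each server (hostnames and paths below are placeholders; our actual script has more error handling):

```bash
#!/bin/bash
# Sketch of the pull model described above. 'web1' is a placeholder host
# reachable over ssh with a key pair.
DEST=/backups/web1/$(date +%F)
mkdir -p "$DEST"
ssh web1 'mysqldump --all-databases > /tmp/db.sql'   # dump on the remote box
rsync -az web1:/tmp/db.sql "$DEST/"                  # pull the dump
rsync -az web1:/var/www/   "$DEST/www/"              # pull the web directories
```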
To recover: you can use rsync to rebuild to a given day.
For incremental backups, it's a matter of setting up different cron jobs to run at the intervals you desire.
I can provide more details on this if you are interested. It was an in-house script.
I had a similar question pertaining to databases a while back; you might want to review: How to restore just ONE mysql database from a collective backup
As for rsync, you might want to review the following site: http://www.sanitarium.net/golug/rsync_backups_2010.html