I'm working with a production MySQL database and I would like to:
- Create a read-only slave copy of the database that receives updates from the master. The master is the only server that can be written to.
- Have "non-permanent write" access to the slave copy of the database. By "non-permanent", I mean that users can write to the slave and change things, but at some point, say, by tomorrow, all the changes will be gone and everything will be an exact duplicate of the master again.
QUESTION 1:
Is this possible? I know (1) is possible because there's documentation here: http://dev.mysql.com/doc/refman/5.0/en/replication.html. But I'm not sure I can get both (1) and (2). I'm asking because I need to write code that accesses the database, and since it's a production database used by everyone, there are times when I cannot use it. I would like to write test programs that use the replicated slave (with non-permanent write access), and once I've made sure everything is fine, I will run my program against the master database so the changes are permanent. I would like my writes to show up on the slave so I can make sure the right things are written.
QUESTION 2:
Is it possible to replicate the database in pieces easily? The database has a table that is really huge. I think the database is on the order of 4GB. I can only do the replication over the weekend so I don't disturb everyone. But I'm afraid that one weekend's time is not enough to replicate the whole database. So, is there a way to do it piece by piece? Or is there some super fast way to replicate a database?
QUESTION 3:
How complicated will it be to set everything up? I would imagine it will take me a long, long time due to inexperience, but suppose I got a DBA to do the work for me - how much trouble am I causing him/her?
Any help will be greatly appreciated! Btw, I'm very new to MySQL and databases in general, so be gentle :)
We use a setup like this at our office for test/backup/production.
Our situation is like this: a production master replicates to a local slave, and the local slave's data is periodically copied onto a test server.
To facilitate the last part, we first lock the local slave ("FLUSH TABLES WITH READ LOCK;"), then use Linux LVM to make a snapshot of the local slave replica's data store (so we have a consistent on-disk snapshot). We then use rsync to copy from the local slave's snapshot on top of the data dir on the test server. This has the side effect of keeping around new tables that haven't been pushed to production yet, so ongoing projects with new functionality are not blown away.
This works fine with MyISAM tables, which is what we're using; we haven't switched to InnoDB. I'm afraid I'm not sure whether the rsync part would work right with InnoDB. If you're using InnoDB, you can probably get away with mysqldump > dumpfile.sql on the slave and mysql < dumpfile.sql on the test server.
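A rough sketch of that refresh procedure, assuming the slave's data dir lives on an LVM volume /dev/vg0/mysql and the test server is reachable as testdb (volume names, sizes, mount points, and host names are all placeholders):

```sh
# On the local slave: take the read lock and the snapshot in the SAME
# client session - FLUSH TABLES WITH READ LOCK is released as soon as
# the session ends, so the snapshot must happen while it is still open.
mysql <<'EOF'
FLUSH TABLES WITH READ LOCK;
system lvcreate --snapshot --size 2G --name mysql-snap /dev/vg0/mysql
UNLOCK TABLES;
EOF

# Mount the frozen snapshot and rsync it over the test server's data dir.
mount /dev/vg0/mysql-snap /mnt/mysql-snap
rsync -av /mnt/mysql-snap/ testdb:/var/lib/mysql/

# Clean up the snapshot.
umount /mnt/mysql-snap
lvremove -f /dev/vg0/mysql-snap
```

The mysql client's built-in `system` command runs a shell command while the session (and therefore the lock) stays open, which is what makes the snapshot consistent.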
Question 1:
On Linux, you can use LVM writeable snapshots to create frozen images of your replica which you can then change to your heart's content without affecting your true replica.
It might work something like this:
On the replica:
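A sketch of those steps, with placeholder volume names, mount points, and port number (adjust to your layout):

```sh
# Quiesce the replica and snapshot its data volume in one client session,
# so the read lock is held while the snapshot is taken.
mysql <<'EOF'
STOP SLAVE;
FLUSH TABLES WITH READ LOCK;
system lvcreate --snapshot --size 5G --name mysql-scratch /dev/vg0/mysql
UNLOCK TABLES;
START SLAVE;
EOF

# Mount the writable snapshot and start a throwaway mysqld on it,
# on a different port and socket so it does not clash with the replica.
mount /dev/vg0/mysql-scratch /mnt/mysql-scratch
mysqld_safe --datadir=/mnt/mysql-scratch --port=3307 \
            --socket=/tmp/mysql-scratch.sock --skip-slave-start &

# ... run your destructive tests against port 3307 ...

# When finished, shut the instance down and discard the snapshot.
mysqladmin --socket=/tmp/mysql-scratch.sock shutdown
umount /mnt/mysql-scratch
lvremove -f /dev/vg0/mysql-scratch
```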
You can then do whatever you like to the throwaway replica, and then, once you have finished with it, stop the instance and remove the snapshot.
Question 2: I am not clear what you mean. Normally, MySQL replication runs continuously keeping your replica up-to-date with almost no impact on the master.
Question 3: MySQL is very easy to administer. LVM can take a bit of getting used to, and might require extra disk space.
Shlomo - some of your terminology is confusing. For databases, the term "replication" means a mechanism for keeping a remote copy of a database up to date. This requires the replication master to track changes to its data - which MySQL does with a binary log containing the updates, inserts, and deletes that have been applied. The replication slave obtains these logged changes and applies them to its own dataset.
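You can see the binary log in action on the master with a couple of statements (the log file name below is an example - use whatever SHOW MASTER STATUS reports):

```sql
-- On the master: which binary log file is current, and at what position.
SHOW MASTER STATUS;

-- Inspect the logged events (updates, inserts, deletes) in a given file.
SHOW BINLOG EVENTS IN 'mysql-bin.000003' LIMIT 10;
```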
Initialising a slave can be tricky. You need a dump containing a snapshot of the master database at a given instant in time - which most database dump mechanisms will give you - and the point in the replication logs at which the next update/delete/insert will be recorded - again, many database dump processes can include this replication point in the dump. You then restore your replication slave from the dump and configure it to replicate from the replication point at which the dump was created.
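With mysqldump this initialisation takes only a couple of commands; the --master-data option embeds the replication coordinates (as a CHANGE MASTER TO statement) in the dump itself. Host names and credentials here are placeholders:

```sh
# On the master: dump everything, locking the tables for consistency;
# --master-data=1 writes an active CHANGE MASTER TO into the dump.
mysqldump --all-databases --master-data=1 --lock-all-tables \
          -u root -p > masterdump.sql

# On the slave: load the dump, point it at the master, start replicating.
mysql -u root -p < masterdump.sql
mysql -u root -p -e "CHANGE MASTER TO MASTER_HOST='master.example.com',
                     MASTER_USER='repl', MASTER_PASSWORD='secret';
                     START SLAVE;"
```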
Creating the dump file can be intrusive: for MyISAM tables, you need to either stop the MySQL server and copy the contents of the datadir into your dump, or lock all the tables to be included in the dump and then either copy the raw table files from the datadir or mysqldump the required tables, holding the lock for the duration. While the lock is held, updates to the locked tables will stall, which can disrupt your regular users.
For Innodb, mysqldump can be used to generate a dump without too much impact on your regular database users. Internally, it will generate a snapshot of the data being dumped enabling a consistent dataset to be included in the dump even while users are updating the same tables. This will have a performance impact, and will also require space for maintaining the snapshot - about the equivalent size of the rows that are changed by other users while the dump is being generated. Obviously, the more changes, the larger the disk space used. Once the dump is completed, space used by holding the snapshot is returned.
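For InnoDB, the consistent-snapshot dump described above corresponds to mysqldump's --single-transaction option, which avoids holding table locks for the duration:

```sh
# InnoDB only: dump inside a single transaction, so other sessions can
# keep writing while the dump sees a consistent view of the data.
# --master-data=2 records the binlog coordinates as a comment in the dump.
mysqldump --all-databases --single-transaction --master-data=2 \
          -u root -p > innodb-dump.sql
```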
Once your replica is built, the impact on the master of maintaining replication to the slave is minuscule - typically only a few percent. All the real work is done by the slave updating its dataset as changes are pulled through the replication logs. Using the built-in MySQL replication feature, most users are able to keep replication running all day without significantly impacting their master database.
If you only want the slave to be pulling changes across, then all you need to do is start and stop replication on the slave at the start and end of the period. As long as your master is not super busy, the slave will normally catch up to the current state quite quickly.
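Pausing and resuming replication is a one-statement operation on the slave:

```sql
-- On the slave: stop pulling changes (e.g. during the working day).
STOP SLAVE;

-- Later: resume; the slave catches up from where it left off.
START SLAVE;

-- Check how far behind the master the slave currently is.
SHOW SLAVE STATUS\G
```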
The administrative overhead is maintaining the replication logs on the master. There must be enough disk space to hold at least a few days' worth of logs, preferably a few weeks' worth. The longer the replication logs are held, the longer your dumps stay viable for initialising replicas: if the replication logs associated with your dump have been purged, the dump cannot be used to build a replica. MySQL can be configured to remove logs once they have passed a certain age threshold, so no log rotation scripts are required.
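The age-based purging just mentioned is a single my.cnf setting on the master (the retention value here is an example):

```ini
[mysqld]
log-bin          = mysql-bin   # enable the binary log (required on a master)
expire_logs_days = 14          # purge binary logs older than two weeks
```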
The normal pattern is that once you have a good replica built, you take your dumps from it - thereby avoiding any impact on the master server at all. The replica that you take dumps from should have "log-slave-updates" enabled, so that it maintains a replication log of every change that arrives through the replication stream - by default a slave does not record replicated changes in its own log. (It would also log any change made to it directly, but writing directly to a replica is a bad idea, because the replica will then deviate from the master.)
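On the dump-source replica this is again a my.cnf matter; the slave needs its own binary log enabled for log-slave-updates to have anywhere to write:

```ini
[mysqld]
log-bin           = mysql-bin   # the slave keeps its own binary log...
log-slave-updates               # ...and records replicated changes in it
```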
I suggest that you play around with some test machines and set up replication and see what happens. It is very easy and quick to do once you have the hang of it. Also, look at the replication filtering options that are available - which can be useful if you only want to replicate a subset of the master database. These are in the MySQL manual:
16.2.3. How Servers Evaluate Replication Filtering Rules