My organisation has been looking into how to spread our servers around geographically while keeping backups very up to date, and ideally spreading the load.
The initial thing I have in mind is Rails on MySQL. The write rate isn't too high (articles/comments being left at less than 1 per minute, though some have large media attachments).
So,
- does MySQL replication work well across wide area networks?
- Does the connection (or a slave server) going down mean that manual intervention is required (once the two servers can talk to each other again) or is recovery automatic?
- If the master disappears, what is required to turn a slave into a master? Are there standard scripts/tools to help manage that?
- Any other gotchas etc?
We use replication across datacenters in several European countries (so they aren't across the world from each other, but they are certainly not local) and it works without any problem.
Replication will automatically restart if possible. If there is a problem with a query (e.g. a database is present on the master and not the slave, and a query uses it), then it will require manual correction by default (but you can set it to ignore such errors). If the databases are exact mirrors, then you should never need to manually restart replication.
If you have two servers and the master disappears, then to turn the slave into the 'master', just stop replication and alter your code (to write to the new 'master'). If you have three or more servers and the master disappears, then stop replication on the slaves, change them to use the new master, and start again. If they aren't exactly in sync (depends how much data is being transferred, how busy the servers are, how good the network connection is, etc), then you might have to do more work than that. The replication section of the MySQL documentation covers this in more detail.
I would suggest that you ensure that you are replicating over SSL (i.e. set the replication user to require a SSL connection).
Replication changed dramatically in MySQL 5.1. In 5.0 only Statement Based Replication was used. You now have the option to do Row Based Replication or Mixed Based Replication. This will greatly affect how you replicate over a WAN.
If you have the ability to either: A) Do IP take over (if your servers are geographically separated this is not likely) B) Make agile DNS changes You can avoid modifying the app code/configuration to change the masters. We use internal DNS with short caching and fake .internal domains. If we need to change masterdb.internal to be some other server, in 5 seconds the change propigates.
Within a single data center we use IP take over. All of the DB servers have virtual interfaces (eth0:1, eth0:2, eth0:3) which are not ifup'ed on boot. If one of the slaves needs to take over, you just ifup eth0:2 and it's the master. In this scenario, eth0 is the 'if' that we use to shell in and such. The apps connect on eth0:1 which will not be activated on boot if my script detects that the IP is taken. (wikipedia STONITH) The other ifs are for taking over the IPs of masters that may need to be failed over.
I would not recommend to cross the oceans when using a MySQL replication. I tried once to replicate from a master in europe while the slave was in texas. The replication broke almost every day until we did abandon this project. Of course it can work, but it tends to get more fragil the bigger the distance between the master and the slave are.