We are expanding our network of webservers on EC2 to a number of different regions and currently use master/slave replication. We've found that over the past couple of months our slave has stopped replicating a number of times which required us to clear the db and initialise the replication again.
As we're now looking to have servers in 3 different regions we're a little concerned about these MySQL replication errors. We believe they're due to auto_increment
values, so we're considering a number of approaches to quell these errors and stabilise replication:
- Multi-Master replication; 3 masters (one in each region), with the relevant
auto_increment
offsets, regularly backing up to S3. Or, - MySQL-Cluster; 3 nodes (one in each region) with a separate management node which will also aggregate logs and statistics.
After investigating it seems they both have down-sides (replication errors for the former, performance issues for the latter).
We believe the cluster approach would allow us to manage and add new nodes more easily than the Multi-Master route, and would reduce/eliminate the replication issues we're currently seeing. But performance is a priority.
Are the performance issues of MySQL-Cluster as bad as people say?
If you understand your data and the reasons why replication fails you shouldn't need to reload slaves, though this is often the easiest way as it can be tedious fixing the slave so that replication can continue.
I've used multiple masters for a few years, and the issues I have encountered are mainly caused by triggers which are really difficult to debug. For that reason, I would make the transactions as atomic and deterministic as possible, and to lock the user or session to the most local db.
I use a lookup table of available dbs indexed by a modulus of the decimalised ip address of the user, so that user always sees one database and is not affected by replication lag caused by db hopping, as mysql-proxy used to do.
Connect as a ring over ssh tunnels or encrypted vpn and it should just all work.
Inserting another db into the ring is straightforward, but if there is any replication lag, removing one runs a risk of an expensive loop, potentially inserting or updating records millions of times before being noticed!
I would go for replication - it works well when it works. You can hang more slaves of each master for reporting and point-in-time backups with no downtime.
MMM brings more problems than it solves, trust me.
MySQL Cluster: I don't know about the new 7.2, but for what I understand you cannot have them too far away. It's a new feature of 7.2 to be able to use nodes in VM's so you can get a grasp of how timing can kill everything.
Improving database performance does not justify the number of new problems. I recommend keep single server with few CPUs and hardware RAID or SSD.
MMM (Multi-Master Replication Manager for MySQL) saves a lot of times.