Hallo :-)
Once again I'm on the hunt for new knowledge, I'm a part-time DBA as part of my current job, and am particularly interested in any documentation or reading around doing bulletproof multi-master replication topologies - does anyone have any pointers?
In past roles I've implemented and supported such topologies but there are always 'catches' (auto_increment_increment, auto_increment_offset being two big ones) that you only spot at the last minute and which have serious potential to ruin your day.
With a largely InnoDB workload, what are the big problems with multi-master replication and how do you, as a skilled DBA, go about solving them? How does that picture change if you're storing things with MyISAM, or indeed, other storage engines now they're nice and pluggable, perhaps someone has experience with Infobright or another data warehouse?
Emphasis should be placed on recovery techniques for any proposed solution. How does a DBA effectively backup that topology, and how easy is the restore process? Is it bulletproof enough that you can stick a TCP-aware load balancer (hashing on source IP or similar) in front and have zero downtime (or damn close to...) in the event a MySQL master goes pop?
I have read and would highly recommend High Performance MySQL by Baron Schwartz, however, what I'm really after is a couple of really quality websites with all the points covered and links to more in-depth reading material as required. Who's got one of those handy? :)
Bonus brownie points to any solution which can have 'pools' of slaves hung off it for the odd application which has a particularly thrashy read workload.
Thank you very much.
Master-master replication is asynchronous, hence it will definitely break if you write to both servers at once.
Even if the auto-increments are working, any other unique index and many other situations can break it - it's too brittle to be used.
But it is possible to use master-master as PART of a HA solution, you just need to ensure that applications only ever write to one of the pair and in a "clean" failover situation, e.g. admin failing over, it waits for the slave to catch up before switching.
This is not extremely difficult in practice, but a bit inconvenient.
Your main other option is to use DRBD, which is also not massively difficult to set up - but in this case, the 2nd machine is not even usable as a read replica - it just sits there being a hot spare. DRBD synchronously replicates the underlying storage, so everything is written safely to both machines.
There are some applications which are specially designed to tolerate the multi-master problems - these need to be designed VERY carefully with that exact situation in mind - in which case, it's ok. You can't use applications not designed for it though.
auto-increment is not the only, or the main problem.
Obviously, you've tried circular replication, given your comments about autoincrementing. That would be my preferred option; you just need to remember to configure MySQL correctly.
You could look at Sequioa, which works by sitting in front of your mysql cluster and deals with executing your SQL on every server in the case of writes, and load balancing in the case of reads. It has a number of other features, like allowing different database backends. It's not simple, with operations requiring multiple steps, but you're asking it to perform a difficult problem, so it's no surprise that the solutions aren't easy either.