I'm looking into ways to balance the load on our MySQL infrastructure, and can't seem to find an answer that works for me... :)
So, I have one big, fat server that handles everything: many DBs, many reads, many writes, etc. It handles the load pretty well, but it is a single point of failure.
We've set up a couple of slaves to redirect reads to, but ran into two problems: it takes a lot of effort to rework all our applications to split reads from writes, and sometimes the slaves fall behind, which leads to very interesting artifacts in the application.
Why the slaves fall behind: many databases are mixed on the same server, so heavy 10-20 minute data-mining queries run alongside atomic queries that take no time at all. But a slave applies replicated statements one at a time, so all the atomic queries have to wait until the heavy one finishes.
To resolve these two problems, I was thinking about something like a proxy that would:
- split reads and writes automatically
- serve as a single point of entry and redirect each request to the appropriate back-end server that holds the needed database (e.g. db1 and db2 live on separate servers, but this is transparent to the application)
- be aware of slave lag and send reads to the master when the slaves fall behind (per database would be ideal, but server-wide would be pretty awesome as well); see the lag-check sketch after this list
- load-balance reads across all eligible slaves (either by simple round robin or by monitoring load average)
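To make the lag-awareness point concrete, this is roughly the check I'd expect such a proxy (or the application itself) to run against a slave before routing a read to it; the 10-second threshold is just an arbitrary number for illustration and would need tuning per database or workload:

```sql
-- Run on a candidate slave before sending it a read (sketch only).
SHOW SLAVE STATUS\G
-- Fields to inspect in the output:
--   Slave_IO_Running / Slave_SQL_Running : both must be 'Yes', otherwise
--                                          replication is broken -> use the master.
--   Seconds_Behind_Master                : NULL means replication is broken;
--                                          above your threshold (say 10 s),
--                                          send the read to the master instead.
```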
One problem that still remains, but which I want to address, is failover. If the master fails, it would be nice if a slave took over as master, and when the old master comes back up, it would rejoin as a slave.
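For reference, my understanding is that a hand-run failover boils down to roughly the sequence below (a sketch only, assuming classic binlog-position replication; host names, credentials, and coordinates are placeholders), which is exactly the part I'd like a tool to automate along with moving a VIP:

```sql
-- 1. On the slave being promoted: wait until it has applied everything it
--    received (Seconds_Behind_Master = 0), then stop replication and allow writes.
STOP SLAVE;
SET GLOBAL read_only = 0;
SHOW MASTER STATUS;   -- note the new master's binlog file and position for step 2

-- 2. On the old master, once it is back up: demote it to a slave of the
--    promoted server.
SET GLOBAL read_only = 1;
CHANGE MASTER TO
    MASTER_HOST     = 'promoted-slave.example.com',   -- placeholder host
    MASTER_USER     = 'repl',                         -- placeholder user
    MASTER_PASSWORD = 'secret',                       -- placeholder password
    MASTER_LOG_FILE = 'mysql-bin.000123',             -- from SHOW MASTER STATUS above
    MASTER_LOG_POS  = 4;                              -- from SHOW MASTER STATUS above
START SLAVE;
```

Note that any writes the old master took but never replicated are lost with this approach, which is one reason I'd rather use a packaged solution than script it myself.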
Any pointers to RTFM or case studies on this subject would be welcome =)
EDIT: Googled some more and, in addition to Tungsten Enterprise, found dbShards and Schooner. While I look deeper into those: does anyone have experience with these solutions? Any feedback?
Check out Tungsten Enterprise
MySQL-MMM is not a recommended solution. See: http://www.xaprb.com/blog/2011/05/04/whats-wrong-with-mmm/ Even one of the original authors agreed in the comments.
Cheers
How about two servers acting as your application databases and another server (or two) acting as your data-mining tier? Your two 'application' databases shouldn't fall (too far) behind because they aren't handling any of the heavy queries. You can also set up mysql-mmm to handle the failovers. Then you point your application at the MySQL VIP you've set up in your load balancer; this VIP fronts the two MMM IPs. MMM might say 'hey, box 2 is too far behind, I'll move that IP to box 1 so it can catch up.'
Then you'll have two tiers of databases handling your two different query types, and the caches on these machines will be optimized for their respective query types (in theory).
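On the failover piece: as far as I know, mysql-mmm's agents do the writer switch by moving the writer VIP and flipping the read_only flag on the boxes (worth verifying against its docs); the database-side effect is basically just:

```sql
-- On the box losing the writer role (any passive master):
SET GLOBAL read_only = 1;   -- blocks writes from regular (non-SUPER) clients

-- On the box receiving the writer role, once the VIP points at it:
SET GLOBAL read_only = 0;   -- writes allowed again
```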
Also look into a Fusion-IO card or Virident TachIOn. That might solve the problems without adding a bunch of hardware.
To address replication lag, we deployed SSDs as primary storage on the slave. The faster write times help the slave keep up even though replication on the slave is a single-threaded operation.
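If you want to confirm that the apply side (the part faster storage helps) is actually the bottleneck, the relay-log coordinates in SHOW SLAVE STATUS give a rough picture; a quick check, assuming standard single-threaded replication:

```sql
SHOW SLAVE STATUS\G
-- If Master_Log_File / Read_Master_Log_Pos (how far the I/O thread has fetched)
-- is well ahead of Relay_Master_Log_File / Exec_Master_Log_Pos (how far the SQL
-- thread has applied), the slave is apply-bound, and faster storage such as
-- SSDs should narrow the gap. If the two pairs track each other closely, the
-- lag is on the network/fetch side and SSDs won't help much.
```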