I have two different MySQL instances that I would like to 'slave' to a third instance. (so I can do easy joins on the third instance)
eg.
mysql1> show databases
db1
db2
mysql2> show databases
db3
db4
mysql3> show databases
db1
db2
db3
db4
I've looked at maatkit (percona toolkit) pt-table-sync but people say it can corrupt data. (It apparent removes and re-adds data to generate inserts)
pt-archiver sort of works for 'snapshots', but db1 is approximately 6GB and copying the whole thing over is a lot more data than really needed. The real time updates are only about 100MB a day.
The natural concept to me would be to allow mysql3 to run as a replica slave of BOTH mysql1 and mysql2, but that doesn't seem to be an option in MySQL
Tungsten Replicator seems to allow for this type of data sync, but it seems a bit unwieldy to configure and I'm concerned about reliability.
Does anyone have other solutions they've used for this problem?
By design, one mysqld process cannot simultaneously listen to two different Masters.
The CHANGE MASTER TO command only allows you to set one Master as a source to read.
In order to emulate this, you would have to alternate between the two Masters programmatically. How do you do that ?
I described in StackOverflow how to Connect a Slave Manually to different Masters where each Master was a Laptop and the Slave was a Central Computer.
Here is the basic idea
Setup Replication of M1 to S1 and then M2 to S1 like this
Each time you switch from one Master to another, you must record two values from
SHOW SLAVE STATUS\G
These two values represent the last SQL Statement that came from the Master and was is next to be executed on the Slave.
There is one major caution : As long as M1 and M2 are updating mutually exclusive databases, this algorithm should be just fine.
Believe it or not, I addressed a question like this in ServerFault back in May 2011. I actually explained how to emulate true multimaster/single slave using the BLACKHOLE Storage Engine based on the book "High Performance MySQL".
There isn't a good read-made tool to do this. pt-table-sync doesn't work exactly as you've been told (I wrote it ;) but it's not the right solution. I have seen its bidirectional syncing functionality used to reconcile servers to a central source after being intentionally disconnected and updated, but this isn't the right solution for what you need.
I don't normally give sales pitches on topics like this, but this is honestly a case where I'd engage Percona to develop a new tool for you. Some people have written little scripts that suit their personal scenarios, but a high-quality solution for general use doesn't exist yet, and isn't really that hard. The main thing is that you need formal testing, and Percona Toolkit's components are already 90% of the way to what you need -- it just needs a little bit of glue between them. You could do this yourself, of course, but why make a square wheel and end up maintaining it yourself and discovering all of its bugs when you least desire it?
That said (end sales pitch, sorry) -- the solution I'd suggest should avoid blackhole tables and go with simpler, less troublesome techniques. (Yeah -- I wrote High Performance MySQL, too. I know. Back then I hadn't seen as many problems with blackhole tables as I have today.) It should do roughly what Rolando suggests, but there are some subtleties. For example, it shouldn't let the I/O thread stream a bunch of data from the master, getting far ahead of the SQL thread, then throw it all away when it round-robins to the next server. That would be really wasteful and cause a lot of impact on the master. There are a lot of little details like this that need to be taken care of -- another that comes to mind is not switching masters when a replication-caused temporary table is in use.