Ping a Specific Port

Question

Chris Adams

Asked: 2011-05-27 06:43:57 +0800 CST2011-05-27 06:43:57 +0800 CST 2011-05-27 06:43:57 +0800 CST

Automated failover strategy for master-slave Mysql replication - why would this not work?

772

I'd like some feedback about this failover strategy for a couple of MySQL servers I'm speccing up for a cluster, and I want to check if there's something obvious I'm not missing here.

One app server that connects to a mysql master server in day to day operations, and that has a mysql-server set up as a slave to it for master-slave replication purposes.

If the mysql server fails, I want to the web app to try connecting to the master, and then after n failed attempts, perform the following:

assume the master will no longer be available
send a signal to the slave server to stop replication
send a signal to the slave server to tell it to act as the new mysql master
begin connecting to the server again, and treat it like the master from now on

Once the app is up again, and serving users, I'd like to be able to spin up a new slave server in the background, once it's ready to serve requests, set up master slave replication once again to provide the same failover support as before.

I'm pretty sure this has been done before, but I can't see any guides on this, so I'm assuming there must be some obvious reason you wouldn't try this, that I haven't thought of yet.

What are the pitfalls of taking this approach for providing automated failover like this with MySQL?

As an aside, I'm aware of master-master replication, but a) I've seen it go horribly wrong, and it b) seems worryingly over-complicated.

Thanks

3 Answers

Voted

Frands Hansen · Answer 1 · 2011-05-27T08:00:32+08:00

Frands Hansen

2011-05-27T08:00:32+08:002011-05-27T08:00:32+08:00

What you want is a High Availability cluster and I think that your suggested approach seems a bit strange.

A good way to achieve this is creating a Linux HA cluster and sync your MySQL using DRDB sync on filesystem level.

In such a setup you have 3 things:

The Cluster Messaging Layer (Linux-HA or CoroSync)
The Cluster Resource Manager (Pacemaker)
The disk sync (DRDB)

Instead of making a lot of code in your application you use a virtual IP address that you move around to the current active node. Also you use STONITH (Shoot The Other Node In The Head (I did not make this up)) to make sure that the first node is actually dead before trying to take over the resources.

There's some great material to read on these links: http://www.linux-ha.org/wiki/Main_Page http://www.clusterlabs.org/wiki/DRBD_MySQL_HowTo http://theclusterguy.clusterlabs.org/

3

RolandoMySQLDBA · Answer 2 · 2011-05-27T18:28:13+08:00

Best Answer

RolandoMySQLDBA

2011-05-27T18:28:13+08:002011-05-27T18:28:13+08:00

The reason automatic failover is not conducive has to do with replication lag. If the slave happens to be behind and failover occurs, you may be writing updates with keys that do not exist yet because the inserts from the master has not been written yet. The more replication lag, the more this is a problem. At my company we use DRBD for automatic failover since the DRBD server you failover to is an exact disk level copy of the original master. As a policy, we do manual for failover of master/slave and master/master setups.

2

Mark Wagner · Answer 3 · 2011-05-27T08:31:06+08:00

Mark Wagner

2011-05-27T08:31:06+08:002011-05-27T08:31:06+08:00

I wouldn't rule out master-master replication. In fact what you describe is almost master-master replication.

Have a look at MMM (Multi-Master Replication Manager for MySQL). http://mysql-mmm.org/ It works at the mysql layer so it works much better than OS-based clustering.

0

Automated failover strategy for master-slave Mysql replication - why would this not work?

Ping a Specific Port

Check if port is open or closed on a Linux server?

How to automate SSH login with password?

How do I tell Git for Windows where to find my private RSA key?

What's the default superuser username/password for postgres after a new install?

What port does SFTP use?

Resolve host name from IP address

Command line to list users in a Windows Active Directory group?

What is a Pem file and how does it differ from other OpenSSL Generated Key File Formats?

How to determine if a bash variable is empty?