I have been reviewing XtraDB clustering and built a PoC environment on OpenStack using 4 instances, which has fallen over during my resilience testing.
Following the PXC documentation (http://www.percona.com/doc/percona-xtradb-cluster/howtos/virt_sandbox.html), which covers a 3-node install, I opted for a 4th node.
- Initial setup completed and data-loading tests passed, with all nodes being updated synchronously while a 1.6 GB test SQL file was used to load a database.
- Failure and restore of nodes commenced. Each test entailed stopping the mysql service on a node, creating and subsequently dropping a database to verify replication on the surviving nodes, and then starting the downed node so it would resync (see the sketch after this list).
- This worked fine for nodes 4, 3 and 2.
- Node1, which per the PXC documents is essentially a controller, would not rejoin the cluster.
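For reference, this is roughly the cycle I ran against each node (the commands, the throwaway database name and the service name are illustrative; adjust for your own environment):

    # On the node under test: take it out of the cluster
    service mysql stop

    # On any surviving node: write something so the downed node falls behind
    mysql -e "CREATE DATABASE resilience_test;"
    mysql -e "DROP DATABASE resilience_test;"

    # Back on the downed node: bring it up and let it resync
    service mysql start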
So my questions are as follows:
- How do I return a controller node to service if the surviving nodes have had data written to them in the meantime?
- Using 4 nodes as a reference, is there a way to remove this single point of failure in node1? (If a surviving node restarts while the controller (node1) is down/out of sync, that node will also fail.)
Based on your symptoms on node1, you are using an empty cluster address (wsrep_cluster_address=gcomm://) in your configuration file, which means the node will start a new cluster rather than join the existing one. You can confirm this by checking the wsrep_cluster_size status variable, which will be 1 on node1 and 3 on the others. If you want node1 to join your already existing cluster, you should specify the addresses of the other nodes in wsrep_cluster_address. In this case, node1 will rejoin the cluster.
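As a minimal sketch (the node addresses are placeholders, and the option normally lives in the [mysqld] section of my.cnf), the difference between bootstrapping and joining looks something like this:

    # Bootstrap a brand-new cluster (what node1 is effectively doing now)
    [mysqld]
    wsrep_cluster_address=gcomm://

    # Join the existing cluster formed by the other three nodes
    [mysqld]
    wsrep_cluster_address=gcomm://192.168.0.2,192.168.0.3,192.168.0.4

You can check the cluster size on each node with:

    mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"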
Some additional thoughts:
Because of the quorum mechanism in PXC (Percona XtraDB Cluster), it's not recommended to run it on 4 nodes. Use an odd number of nodes instead, so that in case of a network split one part of the cluster can still hold a majority; with 4 nodes, a 2-2 split leaves neither half with quorum.
Instead of wsrep_cluster_address you can use wsrep_urls in the [mysqld_safe] section.
Disclaimer: I work for Percona.
Further research into this issue suggests the following is a viable method (leaving this answer unaccepted for the moment, in case someone replies with a better setup):
This setup appears to tolerate the loss of any node, at least via disconnection, and on restoration the node resyncs without issue.
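The exact configuration is not reproduced here; as a rough sketch, assuming the wsrep_urls approach mentioned in another answer (host addresses and port are placeholders), each node carries the same list:

    [mysqld_safe]
    # Try each listed node in turn; the trailing empty gcomm:// means
    # "bootstrap a new cluster if none of the others are reachable",
    # so use it with care to avoid split brain
    wsrep_urls=gcomm://192.168.0.1:4567,gcomm://192.168.0.2:4567,gcomm://192.168.0.3:4567,gcomm://192.168.0.4:4567,gcomm://

With this in place a restarting node first tries to join any surviving node rather than depending on node1 alone.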
If MySQL won't start and the reason is a corrupt DB table:
To replicate what a healthy server has, grab a good copy of the client databases from a stopped server. It tars the database files from $MYSQLHOME and streams them via nc. We used scp to move the good files into place on the broken node, and then kicked off the sync again by starting MySQL on the bad server.
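A rough sketch of that copy, with hostnames, port, paths and database names as placeholders (MySQL should be stopped on both nodes so the files are consistent):

    # On the broken node: listen and unpack into the data directory
    # (some netcat builds need "nc -l -p 9999" instead of "nc -l 9999")
    nc -l 9999 | tar xf - -C /var/lib/mysql

    # On the healthy, stopped node: tar the client database directories and stream them
    tar cf - -C $MYSQLHOME clientdb1 clientdb2 | nc broken-node 9999

    # Alternatively copy with scp, then start MySQL on the broken node
    # so it rejoins the cluster and syncs
    service mysql start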