I have been reviewing XtraDB clustering and built a PoC environment on OpenStack using 4 instances, which has fallen over during my resilience testing.
Following the PXC documentation (http://www.percona.com/doc/percona-xtradb-cluster/howtos/virt_sandbox.html), which covers a 3-node install, I opted for a 4th node.
- Initial setup completed and data-loading tests passed, with all nodes being updated synchronously while a 1.6 GB test SQL file was used to load a database.
- Failure and restore of nodes commenced. Each test entailed stopping the mysql service on a node, creating and subsequently dropping a database to verify replication on the surviving nodes, and then starting the downed node so it would resync (see the sketch after this list).
- This worked fine for nodes 4, 3 and 2.
- Node1, which per the PXC documents is essentially a controller, would not rejoin the cluster.
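For reference, this is roughly the cycle I ran against each node (the commands, the throwaway database name and the service name are illustrative; adjust for your own environment):

    # On the node under test: take it out of the cluster
    service mysql stop

    # On any surviving node: write something so the downed node falls behind
    mysql -e "CREATE DATABASE resilience_test;"
    mysql -e "DROP DATABASE resilience_test;"

    # Back on the downed node: bring it up and let it resync
    service mysql start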
So my questions are as follows:
- How do I return a controller node to service if the surviving nodes have had data written to them in the meantime?
- Using 4 nodes as a reference, is there a way to remove this single point of failure in node1? (If a surviving node restarts while the controller (node1) is down/out of sync, that node will also fail.)
Based on your symptoms on node1, you are using an empty cluster address (wsrep_cluster_address=gcomm://) in your configuration file, which means the node will start a new cluster rather than join the existing one. You can confirm this by checking the wsrep_cluster_size status variable, which will be 1 on node1 and 3 on the others. If you want node1 to join your already existing cluster, you should specify the addresses of the other nodes in wsrep_cluster_address. In this case, node1 will rejoin the cluster.
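As a minimal sketch (the node addresses are placeholders, and the option normally lives in the [mysqld] section of my.cnf), the difference between bootstrapping and joining looks something like this:

    # Bootstrap a brand-new cluster (what node1 is effectively doing now)
    [mysqld]
    wsrep_cluster_address=gcomm://

    # Join the existing cluster formed by the other three nodes
    [mysqld]
    wsrep_cluster_address=gcomm://192.168.0.2,192.168.0.3,192.168.0.4

You can check the cluster size on each node with:

    mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"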
Some additional thoughts:
Because of the quorum mechanism in PXC (Percona XtraDB Cluster), it's not recommended to run it on 4 nodes. Use an odd number of nodes instead, so that in case of a network split one part of the cluster can still hold a majority; with 4 nodes, a 2-2 split leaves neither half with quorum.
Instead of wsrep_cluster_address you can use wsrep_urls in the [mysqld_safe] section.
Disclaimer: I work for Percona.
Further research into this issue suggests the following is a viable method (leaving this answer unaccepted for the moment, in case someone replies with a better setup):
This setup appears to tolerate the loss of any node, at least via disconnection, and on restoration the node resyncs without issue.
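The exact configuration is not reproduced here; as a rough sketch, assuming the wsrep_urls approach mentioned in another answer (host addresses and port are placeholders), each node carries the same list:

    [mysqld_safe]
    # Try each listed node in turn; the trailing empty gcomm:// means
    # "bootstrap a new cluster if none of the others are reachable",
    # so use it with care to avoid split brain
    wsrep_urls=gcomm://192.168.0.1:4567,gcomm://192.168.0.2:4567,gcomm://192.168.0.3:4567,gcomm://192.168.0.4:4567,gcomm://

With this in place a restarting node first tries to join any surviving node rather than depending on node1 alone.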
If MySQL won't start and the reason is a corrupt DB table:
To replicate what a healthy server has, grab a good copy of the client databases from a stopped server. It tars the database files from $MYSQLHOME and streams them via nc. We used scp to move the good files into place on the broken node, and then kicked off the sync again by starting MySQL on the bad server.
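A rough sketch of that copy, with hostnames, port, paths and database names as placeholders (MySQL should be stopped on both nodes so the files are consistent):

    # On the broken node: listen and unpack into the data directory
    # (some netcat builds need "nc -l -p 9999" instead of "nc -l 9999")
    nc -l 9999 | tar xf - -C /var/lib/mysql

    # On the healthy, stopped node: tar the client database directories and stream them
    tar cf - -C $MYSQLHOME clientdb1 clientdb2 | nc broken-node 9999

    # Alternatively copy with scp, then start MySQL on the broken node
    # so it rejoins the cluster and syncs
    service mysql start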