William Hilsum

Asked: 2014-07-12 07:32:55 +0800 CST

Importance of ha.cf file in a heartbeat/pacemaker environment?

I'm having a few issues trying to understand ha.cf and how the cluster picks up on updates.

For example, when creating a new cluster, I usually:

Set some default options in ha.cf on node 1 - node x
Start the cluster.
Run crm on any node, configure resources.

Whilst I usually do nodes up/down, resources up/down, I have never actually added a new node at a later date.

Just for "fun", I decided to run a new server that only specified one node in the cluster in its ha.cf, and then start heartbeat.

This machine successfully joined the cluster and added itself to every other node in the cluster. Where I get confused is that even if I shut down all nodes and reboot the original two nodes, they both still list the third server as in the cluster but offline, despite the third not being in the original two nodes' ha.cf files.

Even if I edit ha.cf and change some nonsense value or touch the file, then reboot the server and cluster, it is still there. So my conclusion is that the CIB takes precedence over ha.cf, but what I don't get is why or how.

I'm really looking for best practices - should any machine just have enough in ha.cf to "get it up", then do everything in CRM? Is ha.cf a waste of time, or should I be using it a lot more?

Trying not to be so vague - I'm really just looking for what I should be doing in CRM, and what I should be doing in ha.cf?

Thanks,

Wil

2 Answers

MadHatter · Answer 1 · 2014-07-23T01:40:28+08:00

I was really hoping to see a good answer myself.

All I can really do is endorse your experiences: that the only real function of heartbeat in these circumstances is to start pacemakerd, the CRM subsystem. This (as you know) maintains its own database of nodes and state, which on my systems is /var/lib/heartbeat/crm/cib.xml. The files in /etc/ha.d inform heartbeat, but not crm.
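To see which nodes the CRM itself remembers, independently of ha.cf, one could read the node list out of a copy of the CIB. The following is only a minimal sketch: it assumes the usual CIB layout with `<node uname="...">` entries under `<configuration>/<nodes>`, and `SAMPLE_CIB`, `cib_node_names` and `only_in_cib` are hypothetical names made up for this illustration, not part of any Pacemaker tool. Read a copy of the file rather than editing the live cib.xml by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down CIB used only for illustration; a real cib.xml
# carries many more attributes and sections.
SAMPLE_CIB = """\
<cib>
  <configuration>
    <nodes>
      <node id="uuid-1" uname="node1" type="normal"/>
      <node id="uuid-2" uname="node2" type="normal"/>
      <node id="uuid-3" uname="node3" type="normal"/>
    </nodes>
  </configuration>
</cib>
"""


def cib_node_names(cib_xml):
    """Return the uname of every <node> under <configuration>/<nodes>."""
    root = ET.fromstring(cib_xml)
    return [n.get("uname", "") for n in root.findall("./configuration/nodes/node")]


def only_in_cib(cib_xml, ha_cf_nodes):
    """Return node names recorded in the CIB but absent from the given ha.cf node set."""
    return [name for name in cib_node_names(cib_xml) if name not in ha_cf_nodes]


if __name__ == "__main__":
    # node3 joined later and was never added to the original ha.cf files.
    print(only_in_cib(SAMPLE_CIB, {"node1", "node2"}))
```

A node that shows up in this difference is one the cluster learned at runtime and stored in the CIB, which matches the behaviour described in the question.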

I am running a number of failover pairs doing various things, most of which have been up for over 500 days and some of which are close to 1000 days, and most of which have survived any number of failovers and failbacks; so I can only assume I'm doing something right. My practice is not to actually lie in ha.cf, but to put almost nothing in there other than what is required to get HA to start up CRM.

I'm sorry I don't have anything more concrete to point you at.

Alexis-Emmanuel Haeringer · Answer 2 · 2014-07-23T03:31:42+08:00

Apparently, you run Pacemaker, a Cluster Resource Manager, on top of Heartbeat v3, a cluster messaging layer. You may find more info here. For instance, older versions of Heartbeat required users to add ping node configuration to ha.cf; this is no longer required with the pingd resource agent in Pacemaker.

The role of a resource agent is to abstract the service it provides and present a consistent view to the cluster, which allows the cluster to be agnostic about the resources it manages. The cluster doesn't need to understand how the resource works because it relies on the resource agent to do the right thing when given a start, stop or monitor command.

So you should distinguish between the two configurations, and check the following in your /etc/ha.d/ha.cf:

mcast ...
bcast eth..
# disables automatic joining  <== do you have "autojoin any" here?
autojoin none
node node1 node2
# for enabling Pacemaker under Heartbeat 3.04
pacemaker respawn
# also check the manpage to track deprecated directives (baud, auto_failback, stonith, etc.)
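As a quick way to check what a given ha.cf actually declares, the `node` and `autojoin` directives can be pulled out with a few lines of Python. This is a rough sketch rather than a full ha.cf parser: it assumes one directive per line with `#` starting a comment, and `parse_ha_cf` and `SAMPLE_HA_CF` are hypothetical names used only here.

```python
# Hypothetical ha.cf content modelled on the fragment above.
SAMPLE_HA_CF = """\
# disables automatic joining
autojoin none
node node1 node2
pacemaker respawn
"""


def parse_ha_cf(text):
    """Collect the 'node' and 'autojoin' directives from ha.cf-style text."""
    nodes = []
    autojoin = None  # None means the directive was not present in the text
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        directive, *values = line.split()
        if directive == "node":
            nodes.extend(values)  # 'node' may list several names and may repeat
        elif directive == "autojoin" and values:
            autojoin = values[0]
    return {"nodes": nodes, "autojoin": autojoin}


if __name__ == "__main__":
    print(parse_ha_cf(SAMPLE_HA_CF))
```

If `autojoin` comes back as `any` or `other`, a machine that is not in the `node` list can still join, which is one thing to rule out when an unexpected node appears in the cluster.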

Let me also suggest the following tests:

Did you make the right heartbeat process reread its configuration?

kill -HUP $GoodHeartbeatPID
The CRM needs a commit (cib.xml, the Cluster Information Base, is generated by this command):

crm_verify -L -V

cib commit $yourconf
Also check your /etc/hosts, DNS, etc.

Be careful with restart order

On your still-active node. This will shut down your cluster resources.

/etc/init.d/heartbeat stop

On your standby node (the one where you created your CIB). This will start the local Heartbeat instance and Pacemaker, and wait for other cluster nodes to check in.

/etc/init.d/heartbeat start

On the other node. This will start the local Heartbeat instance and Pacemaker, fetch the CIB automatically, and start applications.

/etc/init.d/heartbeat start

Kind regards
