I am following the http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/_verify_corosync_installation.html document to set up a two-node cluster in AWS. Both nodes have Pacemaker installed and the firewall rules are enabled. When I run the pcs status command on both nodes, each node reports the other as UNCLEAN (offline).
The two nodes that I have set up are ha1p and ha2p.
OUTPUT ON ha1p
[root@ha1 log]# pcs status
Cluster name: mycluster
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Wed Dec 24 21:30:44 2014
Last change: Wed Dec 24 21:27:44 2014
Stack: cman
Current DC: ha1p - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
0 Resources configured
Node ha2p: UNCLEAN (offline)
Online: [ ha1p ]
Full list of resources:
OUTPUT ON ha2p
[root@ha2 log]# pcs status
Cluster name: mycluster
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Wed Dec 24 21:30:44 2014
Last change: Wed Dec 24 21:27:44 2014
Stack: cman
Current DC: ha2p - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
0 Resources configured
Node ha1p: UNCLEAN (offline)
Online: [ ha2p ]
Full list of resources:
The contents of /etc/cluster/cluster.conf are as below:
[root@ha1 log]# cat /etc/cluster/cluster.conf
<cluster config_version="9" name="mycluster">
  <fence_daemon/>
  <clusternodes>
    <clusternode name="ha1p" nodeid="1">
      <fence>
        <method name="pcmk-method">
          <device name="pcmk-redirect" port="ha1p"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="ha2p" nodeid="2">
      <fence>
        <method name="pcmk-method">
          <device name="pcmk-redirect" port="ha2p"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk-redirect"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
Any help would be much appreciated.
Yes, you need to make sure the hostname you are using in your cluster definition is NOT the hostname on the 127.0.0.1 line in /etc/hosts. So, in my /etc/hosts each node name points to that node's real cluster IP rather than to the loopback address.
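For example, a minimal /etc/hosts along these lines (the 10.0.0.x addresses are placeholders; use the instances' real private IPs):

    127.0.0.1   localhost localhost.localdomain
    10.0.0.11   ha1p
    10.0.0.12   ha2p

The point is that ha1p and ha2p resolve to addresses corosync can actually bind to and reach, and that neither name appears on the 127.0.0.1 line.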
This happens because your cluster doesn't have a full STONITH configuration. The UNCLEAN state means the cluster doesn't know the actual state of that node.
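If you just want to get the test cluster up while working through the guide, STONITH can be switched off for now (this is what the WARNING at the top of your pcs status output refers to); a production cluster should have real fencing configured instead:

    pcs property set stonith-enabled=false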
Maybe you can edit the /etc/hosts file and remove the lines that contain 127.0.0.1 and ::1 (the lines that mention localhost). I had this exact problem, tried this method, and it solved the issue.
The error means that corosync could not contact the corosync service running on the other cluster node(s).
How to fix:
1. Check whether corosync is listening, and on which address, with:
   ss -tulnp|egrep ':5405.*corosync'
2. Check the ip_version setting (e.g. ip_version: ipv6) in the totem section of the /etc/corosync/corosync.conf file; it needs to match the address family the node names resolve to (a sketch of that section follows the list).
3. Run getent ahosts $HOSTNAME to see how the current host name is resolved.
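For reference, a rough sketch of what the relevant parts of /etc/corosync/corosync.conf could look like on a corosync 2.x stack (the node names and cluster name are taken from the question; transport udpu is an assumption that fits EC2, where multicast is not available):

    totem {
        version: 2
        cluster_name: mycluster
        transport: udpu          # unicast UDP; EC2 has no multicast
        ip_version: ipv4         # or ipv6, matching how the node names resolve
    }

    nodelist {
        node {
            ring0_addr: ha1p
            nodeid: 1
        }
        node {
            ring0_addr: ha2p
            nodeid: 2
        }
    }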
I got the same issue and the reason was a time mismatch between the two nodes. After syncing them with ntp/chrony, the issue was resolved.
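A quick way to verify this (the commands assume chrony; ntpd serves the same purpose on older installs):

    date                  # run on both nodes and compare the clocks
    chronyc tracking      # shows whether the local clock is synchronized
    chronyc sources -v    # shows which time sources are reachable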