I have 2 nodes:
- patroni1: 192.168.1.38
- patroni2: 192.168.1.39
and a Virtual IP: 192.168.1.40
I have HA-Proxy installed on both.
Here is my pcs status when the VIP is attached to patroni2 and haproxy is running on patroni2:
-----------
[root@patroni1 ~]# pcs status
Cluster name: haproxy_cluster
Stack: corosync
Current DC: patroni2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Nov 29 21:29:00 2018
Last change: Thu Nov 29 21:24:52 2018 by root via cibadmin on patroni1
2 nodes configured
4 resources configured
Online: [ patroni1 patroni2 ]
Full list of resources:
xen-fencing-patroni2 (stonith:fence_xenapi): Started patroni1
xen-fencing-patroni1 (stonith:fence_xenapi): Started patroni2
Resource Group: HAproxyGroup
    haproxy (ocf::heartbeat:haproxy): Started patroni2
    VIP (ocf::heartbeat:IPaddr2): Started patroni2
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@patroni1 ~]# pcs resource show VIP
Resource: VIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: cidr_netmask=24 ip=192.168.1.40
Operations: monitor interval=1s (VIP-monitor-interval-1s)
            start interval=0s timeout=20s (VIP-start-interval-0s)
            stop interval=0s timeout=20s (VIP-stop-interval-0s)
[root@patroni1 ~]# pcs resource show haproxy
Resource: haproxy (class=ocf provider=heartbeat type=haproxy)
Attributes: binpath=/usr/sbin/haproxy conffile=/etc/haproxy/haproxy.cfg
Operations: monitor interval=10s (haproxy-monitor-interval-10s)
            start interval=0s timeout=20s (haproxy-start-interval-0s)
            stop interval=0s timeout=20s (haproxy-stop-interval-0s)
-----------
My problem is: fencing is not triggered when I manually kill haproxy on patroni2; it is only triggered when I manually halt or reboot patroni2.
Here is pcs status after I manually kill haproxy:
------------
[root@patroni1 ~]# pcs status
Cluster name: haproxy_cluster
Stack: corosync
Current DC: patroni2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Nov 29 21:37:37 2018
Last change: Thu Nov 29 21:24:52 2018 by root via cibadmin on patroni1
2 nodes configured
4 resources configured
Online: [ patroni1 patroni2 ]
Full list of resources:
xen-fencing-patroni2 (stonith:fence_xenapi): Started patroni1
xen-fencing-patroni1 (stonith:fence_xenapi): Started patroni2
Resource Group: HAproxyGroup
    haproxy (ocf::heartbeat:haproxy): Started patroni2
    VIP (ocf::heartbeat:IPaddr2): Starting patroni2
Failed Actions:
* haproxy_monitor_10000 on patroni2 'not running' (7): call=38, status=complete, exitreason='',
last-rc-change='Thu Nov 29 21:37:36 2018', queued=0ms, exec=0ms
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
------------
How can I make fencing trigger when HA-Proxy is not responding?
Sincerely -bino-
What you're observing is the expected behavior. Just because a resource is stopped doesn't mean the best course of action is to forcefully power-cycle the system.
You manually kill HA-Proxy, Pacemaker detects that this service is for some reason not running, and logs this failure:
haproxy_monitor_10000 on patroni2 'not running' [...]
The cluster then restarts the service, which I assume succeeded, since your second pcs status shows it running again on the very same patroni2 node. A monitor operation failure is not considered fatal, so it will not escalate to a STONITH action. A failure on a stop operation, however, is considered fatal: if the cluster can't stop the resource, it can't restart it or fail it over, so it fences the node and power-cycles it via STONITH.
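If you really do want a failed haproxy monitor to escalate to fencing, Pacemaker's on-fail operation option can be set to fence. A minimal sketch, assuming the pcs 0.9.x syntax that matches your CentOS 7 / Pacemaker 1.1.18 setup (adjust the interval to whatever you actually use):
------------
# Fence the node when the haproxy monitor operation fails,
# instead of the default on-fail=restart behaviour for monitors.
pcs resource update haproxy op monitor interval=10s on-fail=fence
------------
That said, fencing a whole node because of a service that can simply be restarted is usually overkill; the restart-and-failover behaviour you are seeing now is normally the safer choice.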