I am having the following problem with an OpenSuSE + Heartbeat + Pacemaker + Xen HA cluster: when the node a Xen domU is running on is "dead", the domU is not restarted on the second node.
The cluster is set up with two nodes, each running OpenSuSE 11.3, Heartbeat 3.0, and Pacemaker 1.0 in CRM mode. For storage I am using a LUN on an iSCSI SAN device; the LUN is formatted with OCFS2 and managed with LVM. The Xen domU has two logical volumes: one for root and one for swap. I am using IPMI cards as STONITH devices, and a dedicated Ethernet link for heartbeat communications.
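Roughly, the storage side was set up along these lines (the LV names match the domU config below; the LUN device name and sizes here are just placeholders):

pvcreate /dev/sdb                        # the iSCSI LUN
vgcreate vg_xen /dev/sdb
lvcreate -n xen-util-root -L 20G vg_xen  # root LV, size placeholder
lvcreate -n xen-util-swap -L 4G vg_xen   # swap LV, size placeholder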
The ha.cf file is as follows:
keepalive 1
deadtime 10
warntime 5
udpport 694
ucast eth1
auto_failback off
node dhcp-166
node stage
use_logd yes
crm yes
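The peer IP is omitted from the ucast line above; the full directive takes the peer's address on the same line, e.g. with a placeholder IP:

ucast eth1 192.168.1.2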
My resources look as follows:
crm(live)configure# show
node $id="5c1aa924-bba4-4f95-a367-6c9a58ac4a38" dhcp-166
node $id="cebc92eb-af24-4833-aaf0-672adf80b58e" stage
primitive Xen-Util ocf:heartbeat:Xen \
meta target-role="Started" \
operations $id="Xen-Util-operations" \
op start interval="0" timeout="60" start-delay="0" \
op stop interval="0" timeout="120" \
params xmfile="/etc/xen/vm/xen-util"
primitive my-stonith stonith:external/ipmi \
params hostname="dhcp-166" ipaddr="192.168.3.106" userid="ADMIN" passwd="xxx" \
op monitor interval="2m" timeout="60s"
primitive my-stonith2 stonith:external/ipmi \
params hostname="stage" ipaddr="192.168.3.105" userid="ADMIN" passwd="xxx" \
op monitor interval="2m" timeout="60s"
property $id="cib-bootstrap-options" \
dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
cluster-infrastructure="Heartbeat"
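For anyone reproducing this, node and resource state during failover can be watched with crm_mon, which ships with Pacemaker:

crm_mon -1    # one-shot snapshot of node and resource status
crm_mon       # or keep it running for a continuously refreshing view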
The Xen domU config file is as follows:
name = "xen-util"
bootloader = "/usr/lib/xen/boot/domUloader.py"
#bootargs = "xvda1:/vmlinuz-xen,/initrd-xen"
bootargs = "--entry=xvda1:/boot/vmlinuz-xen,/boot/initrd-xen"
memory = 4096
disk = [ 'phy:vg_xen/xen-util-root,xvda1,w',
'phy:vg_xen/xen-util-swap,xvda2,w', ]
root = "/dev/xvda1"
vif = [ 'mac=00:16:3e:42:42:06' ]
#vfb = [ 'type=vnc,vncunused=0,vnclisten=192.168.3.172' ]
extra = ""
Say domU "Xen-Util" is running on node "stage"; if "stage" goes down, "Xen-Util" does not restart on node "dhcp-166". It does seem to try: "xm list" shows it for a few seconds, and "xm console xen-util" gives a message like "copying /boot/kernel.gz from xvda1 to /var/lib/xen/tmp/kernel.a53gs for booting". However, it never gets past that point, eventually gives up, and no longer appears in "xm list". When node "stage" comes back online after being power-cycled, it detects that "Xen-Util" isn't running and starts it (on "stage").
I've tried starting "Xen-Util" on node "dhcp-166" without the cluster running, and it works fine, no problems; so I know the domU itself can run on that node.
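The manual test was something like this, using the config file path from the primitive above:

xm create -c /etc/xen/vm/xen-util   # start the domU and attach to its console
xm shutdown xen-util                # stop it again before handing control back to the cluster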
Any ideas? Thanks!
I figured it out after a bit more trial and error: iSCSI errors were propagating back up the stack too quickly, as mentioned in this post on ServerFault.
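The exact variables are outlined in the linked post; for context, open-iscsi's timeout tuning lives in /etc/iscsi/iscsid.conf, in settings of this family (values illustrative, and my assumption about which ones are relevant):

node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5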
In addition to changing the variables outlined in the post above, I also traced some network cables and discovered that node #2 was on a 100Mb link, while node #1 and the SAN were on gigabit links. After some careful shuffling, all the network connections are now running at gigabit speeds.
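Link speed is easy to verify with ethtool, e.g.:

ethtool eth0 | grep -i speed    # should report Speed: 1000Mb/s on a gigabit link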
Finally, I raised the MTU on the Linux interfaces from 1500 to 9000, which also seems to have sped things up a bit.
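On OpenSuSE the change can be made on the fly and then persisted in the interface config (the interface name is whichever faces the SAN, and the switch and SAN ports must support jumbo frames too):

ip link set dev eth0 mtu 9000   # immediate, non-persistent
# persistent: add this line to /etc/sysconfig/network/ifcfg-eth0
MTU='9000'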
The final result is a working cluster with the domU booting even quicker than before on node #1.
Cheers,
Kendall