I'm creating a 2+1 failover cluster under Red Hat 5.5 with four services, two of which have to run on the same node and share the same virtual IP address. On each node, one of the services (called disk1 and disk2 in the cluster.conf below) needs a (SAN) disk, while the other doesn't (nodisk1 and nodisk2). So each node should run one service needing a disk (diskN) and its corresponding service that doesn't need one (nodiskN). I'm using HA-LVM.
When I shut down the two interfaces connected to the SAN (via ifdown) to simulate a SAN failure, the service needing the disk is disabled while the other keeps running, as expected. Surprisingly (and unfortunately), the virtual IP address shared by the two services on the same machine is also removed, rendering the still-running service useless. How can I configure the cluster to keep the IP address up? The only way I've found so far is to assign a different virtual IP address to each of the services not needing a disk (not implemented in the cluster.conf below).
cluster.conf looks like this:
<?xml version="1.0" ?>
<cluster config_version="1" name="cluster">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <cman shutdown_timeout="10000"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="device1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="device2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="device3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ilo" ipaddr="10.0.24.101" login="admin" name="device1" passwd="password"/>
    <fencedevice agent="fence_ilo" ipaddr="10.0.24.102" login="admin" name="device2" passwd="password"/>
    <fencedevice agent="fence_ilo" ipaddr="10.0.24.103" login="admin" name="device3" passwd="password"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="domain1" nofailback="0">
        <failoverdomainnode name="node1" priority="1"/>
      </failoverdomain>
      <failoverdomain name="domain2" nofailback="0">
        <failoverdomainnode name="node2" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.0.24.111" monitor_link="1"/>
      <ip address="10.0.24.112" monitor_link="1"/>
    </resources>
    <service autostart="1" exclusive="0" name="disk1" recovery="restart" domain="domain1">
      <ip ref="10.0.24.111"/>
      <script file="/etc/init.d/disk1" name="disk1"/>
      <fs device="/dev/VolGroup10/LogVol10" force_fsck="0" force_unmount="1" fstype="ext3" mountpoint="/mnt/lun1" name="lun1" self_fence="1"/>
      <lvm lv_name="LogVol10" name="VolGroup10/LogVol10" vg_name="VolGroup10"/>
    </service>
    <service autostart="1" exclusive="0" name="nodisk1" recovery="restart" domain="domain1">
      <ip ref="10.0.24.111"/>
      <script file="/etc/init.d/nodisk1" name="nodisk1"/>
    </service>
    <service autostart="1" exclusive="0" name="disk2" recovery="restart" domain="domain2">
      <ip ref="10.0.24.112"/>
      <script file="/etc/init.d/disk2" name="disk2"/>
      <fs device="/dev/VolGroup20/LogVol20" force_fsck="0" force_unmount="1" fstype="ext3" mountpoint="/mnt/lun2" name="lun2" self_fence="1"/>
      <lvm lv_name="LogVol20" name="VolGroup20/LogVol20" vg_name="VolGroup20"/>
    </service>
    <service autostart="1" exclusive="0" name="nodisk2" recovery="restart" domain="domain2">
      <ip ref="10.0.24.112"/>
      <script file="/etc/init.d/nodisk2" name="nodisk2"/>
    </service>
  </rm>
</cluster>
I think you'll need another service in order to maintain this IP. The problem is that when the SAN-dependent service fails, rgmanager issues an
ip addr del <ip>
on the node that is running the service. Since this IP is shared, it's yanked out from under the other service as well. So you'll need to add another service that holds just the IP.

The way you set up your failover domains is key: if you do it wrong, you'll wind up with the IP sitting on one node and the services on the other. Unfortunately I don't have a cluster to test with currently, but I think you want all three services (the two that need the IP, plus the IP itself) in a single restricted failover domain with a priority of at least 1.
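An untested sketch of that idea for node1 follows. It assumes a new IP-only service with the hypothetical name ip1, and it assumes node3 is the spare that should take over (the original domain1 lists only node1). The shared address moves out of disk1 and nodisk1 into ip1, so a disk1 failure no longer removes it, and all three services sit in one restricted, ordered failover domain. Node2/domain2 would follow the same pattern with 10.0.24.112.

```xml
<rm>
  <failoverdomains>
    <!-- Restricted and ordered: node1 preferred; node3 is an ASSUMED spare -->
    <failoverdomain name="domain1" restricted="1" ordered="1" nofailback="0">
      <failoverdomainnode name="node1" priority="1"/>
      <failoverdomainnode name="node3" priority="2"/>
    </failoverdomain>
    <!-- domain2 (node2, with node3 as fallback) would mirror this -->
  </failoverdomains>
  <resources>
    <ip address="10.0.24.111" monitor_link="1"/>
  </resources>
  <!-- Hypothetical IP-only service: the only owner of 10.0.24.111 -->
  <service autostart="1" exclusive="0" name="ip1" recovery="restart" domain="domain1">
    <ip ref="10.0.24.111"/>
  </service>
  <!-- disk1 and nodisk1 no longer reference the shared IP themselves -->
  <service autostart="1" exclusive="0" name="disk1" recovery="restart" domain="domain1">
    <script file="/etc/init.d/disk1" name="disk1"/>
    <fs device="/dev/VolGroup10/LogVol10" force_fsck="0" force_unmount="1" fstype="ext3" mountpoint="/mnt/lun1" name="lun1" self_fence="1"/>
    <lvm lv_name="LogVol10" name="VolGroup10/LogVol10" vg_name="VolGroup10"/>
  </service>
  <service autostart="1" exclusive="0" name="nodisk1" recovery="restart" domain="domain1">
    <script file="/etc/init.d/nodisk1" name="nodisk1"/>
  </service>
</rm>
```

Note that the shared domain only makes it likely that the three services land on the same node. Nothing in this fragment forces ip1 to follow nodisk1 after a failover, so the domain layout still needs testing.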
Always keep in mind that if you're making changes to /etc/cluster/cluster.conf by hand, you need to increment the version number (the config_version attribute) and then run
ccs_tool update /etc/cluster/cluster.conf
to push the configuration out to the other nodes. Another thing to keep in mind is that ccs_tool is being phased out, but on RHEL 5 it should still work. The other command to remember is rg_test, which lets you see exactly what the cluster is doing when you start/stop services. Turn up your debug levels and always watch the log files. Good luck!

Have you tried putting the two services that are dependent on the disk in their own resource group?
It sounds like the best course of action would be to drop the IP and the running service when the failure is detected, then move the IP and both services to another cluster member.
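An untested sketch of that resource-group approach for node1 follows, assuming disk1 and nodisk1 can be merged into a single service (group1 is a hypothetical name). The IP, the LVM volume, the filesystem and both init scripts live in one service with recovery="relocate". If any of them fails, rgmanager stops the whole group and tries to start it on another member.

```xml
<!-- Hypothetical combined service: everything node1 runs, managed as one unit -->
<service autostart="1" exclusive="0" name="group1" recovery="relocate" domain="domain1">
  <!-- storage first: HA-LVM volume, then the filesystem on it -->
  <lvm lv_name="LogVol10" name="VolGroup10/LogVol10" vg_name="VolGroup10"/>
  <fs device="/dev/VolGroup10/LogVol10" force_fsck="0" force_unmount="1" fstype="ext3" mountpoint="/mnt/lun1" name="lun1" self_fence="1"/>
  <!-- the shared IP is now owned by exactly one service -->
  <ip ref="10.0.24.111"/>
  <!-- both applications travel together with the IP and the disk -->
  <script file="/etc/init.d/disk1" name="disk1"/>
  <script file="/etc/init.d/nodisk1" name="nodisk1"/>
</service>
```

The trade-off is that nodisk1 no longer keeps running on its node during a SAN outage. Instead, it moves with the disk and the IP to whichever member can still reach the SAN.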
The only way to make this work was to give the services not needing a disk their own virtual IP addresses.
cluster.conf now looks like this: