We have set up a 2-node cluster with a SAN. Our configuration is IBM HS22 blades in a BladeCenter, with a T3400 SAN box behind a SAN switch. I have tried RHEL 5.2, 5.3, and 5.4 with Cluster Suite. I can reboot the nodes using luci, I can fence both servers, and I can even relocate the services from node 1 to node 2.

The issue is this: if I run `clustat` on node 1, it shows all the services with node 1 as the owner. If I stop the network service on node 1, all services relocate to node 2 and node 1 powers off. When I reboot node 1, it rejoins the cluster, and at that point node 2 owns all the services. But if I then stop the network service on node 2, the services do not relocate back to node 1, and in /var/log I see "52 failed to changed RG status". Has anyone come across an issue like this? If so, what is the workaround?
Thank you so much, people, I got this working!
I don't have any direct experience with RH clustering but, from your description, it sounds like node 1 isn't re-joining the cluster correctly after you reboot it.
As a starting point, I'd check that all the appropriate services are set to start automatically on node 1, but before I do that, I'd clean up your question, as it's almost unreadable in its current form.
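As a concrete starting point, something like the following could be run on node 1 after a reboot. This is only a sketch: it assumes the standard RHEL 5 Cluster Suite service names (`cman`, `rgmanager`, `clvmd`, `ricci`), which may differ in your setup.

```
# Check whether the cluster daemons are enabled at boot
# (standard RHEL 5 Cluster Suite service names assumed)
chkconfig --list cman rgmanager clvmd ricci

# Enable the core daemons for the default runlevels if they are off
chkconfig cman on
chkconfig rgmanager on

# After the next reboot, confirm the node actually rejoined the cluster
clustat
cman_tool status
```

If `clustat` on node 1 still shows the node as offline after these services are enabled, the problem is likely elsewhere (fencing or quorum), but this at least rules out the obvious.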
There appears to be a bug (sort of) related to this over at RedHat's Bugzilla, too.
I bet I'll receive some downvotes for this, but my experience with RHCS is that it basically doesn't work at all. I tried and tried and tried to make a simple 3-node cluster work with ricci and luci and ended up just giving up. My searches turned up similar experiences and a common theme that RHCS is not ready for production deployments. I was sometimes able to join a couple of servers to the cluster, but as soon as I tried to join another node, it just failed with very little information in the logs.
I ended up moving to Pacemaker backed by a DRBD filesystem and found it more flexible; it just works. My advice is to use Pacemaker.
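For reference, a two-node Pacemaker/DRBD setup along those lines can be sketched in the `crm configure` shell. Everything here is hypothetical (the DRBD resource name `r0`, the device, the mount point, and the resource IDs are placeholders, not from my actual config):

```
# Minimal sketch for "crm configure", assuming a DRBD resource named r0
# and the ocf:linbit:drbd / ocf:heartbeat:Filesystem resource agents.
primitive drbd_r0 ocf:linbit:drbd \
    params drbd_resource=r0 \
    op monitor interval=30s
ms ms_drbd_r0 drbd_r0 \
    meta master-max=1 clone-max=2 notify=true
primitive fs_data ocf:heartbeat:Filesystem \
    params device=/dev/drbd0 directory=/data fstype=ext3
# The filesystem must run where DRBD is primary, and only after promotion.
colocation fs_on_master inf: fs_data ms_drbd_r0:Master
order fs_after_drbd inf: ms_drbd_r0:promote fs_data:start
```

The colocation and order constraints are the important part: they keep the filesystem mounted only on the DRBD primary, which is what gives you clean failover without split-brain mounts.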
If the network service goes down, the cluster node goes into an "unknown" state. The CS has no idea whether the host actually died or just became temporarily unresponsive. If you have a fence mechanism in place, you can fence the host, which also informs RHCS that the node really is down, so the services can be taken over by another node. If the services simply restarted elsewhere and the host then got its network back, you would have the same service running on both nodes, accessing the same files on the SAN and thus corrupting them.
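In cluster.conf terms, the fencing setup that makes this work looks roughly like the fragment below. This is only an illustration: the fence agent (`fence_ipmilan`), device name, IP, and credentials are assumptions, and your blades may use a different agent (e.g. one for the BladeCenter management module):

```
<!-- Sketch of a cluster.conf fencing section; agent, names,
     address, and credentials are placeholder assumptions -->
<clusternodes>
  <clusternode name="node1" nodeid="1">
    <fence>
      <method name="1">
        <device name="ipmi-node1"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <fencedevice agent="fence_ipmilan" name="ipmi-node1"
               ipaddr="10.0.0.11" login="admin" passwd="secret"/>
</fencedevices>
```

Without a working fence device per node, rgmanager will refuse to relocate services from a node it cannot confirm is dead, which matches the "failed to change RG status" symptom described in the question.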