We are conducting a bi-yearly resiliency test on a 4-node VCS cluster, with two applications running on nodes 1-3 and 2-4 respectively, in active-standby mode.
When we perform a manual switchover or a graceful shutdown of one node, the application switches over cleanly to the other node.
When we power off or reset a node, however, it seems the absence of a handoff by the affected node triggers a reboot of the other cluster nodes. What's more, in the power-off case, the remaining nodes do restart but fail to rejoin the cluster. When the killed server is brought back up, all nodes join again.
This obviously defeats the purpose of a cluster. Our vendor, who provided the applications and the cluster software (along with the hardware), claims that such a case is not realistic and that servers always hand off cleanly when going down.
We have no expertise in proprietary cluster technologies, so while we assume their statement is incorrect, we don't know what could be going wrong. I suspect that any commercially successful cluster software can handle these situations, and that our implementation suffers from configuration errors.
Any clues would be appreciated.