I would like to set a policy such that my Failover Cluster will always come into service, even if only one (of the two nodes) is available.
Background: I have only two nodes in the cluster (Windows Server 2019), plus a file share witness for quorum on the DC. For this question, assume that the DC stays in service.
If I shut down node1, node2 becomes active. If I then shut down node2, the cluster stops (obviously). However, if I then start only node1, the cluster never recovers. Not only does it not recover without node2, but I don't see an easy way to bring the cluster into service with Failover Cluster Manager. The only way I can recover the cluster in this scenario is to start node2, which does not seem (to me) to be real high availability. IMO I should be able to set a policy, or have a reasonably easy way, to bring the cluster back online (perhaps after a waiting period), even if node2 never recovers.
Am I just thinking about this the wrong way or missing something obvious?
UPDATE: I do see an error:
Node 'SOM2' failed to form a cluster. This was because the
witness was not accessible. Please ensure that the witness
resource is online and available.
However, the witness was available at that time, which makes me suspect a permission issue: the witness share is available to the cluster but not to the cluster service accounts on each node. Is that possible?
Is there some special permission setting on the witness share to ensure it can be accessed by the local service accounts on each node?
Update:
To fix the permission error (not the central problem), I needed to use a PowerShell command from:
https://docs.microsoft.com/en-us/powershell/module/failoverclusters/set-clusterquorum
Check the permissions on the witness to allow full control by the correct domain account, such as a service account where the password never expires and cannot be changed. Then, on a cluster host, first get rid of the current witness configuration:
Set-ClusterQuorum -NoWitness
Get-ClusterResource
if needed:
Remove-ClusterResource -Name "File Share Witness"
or remove it using Failover Cluster Manager.
Then re-add the file share witness with the domain credentials needed to access it:
Set-ClusterQuorum -NodeAndFileShareMajority \\server\path-to-witness -Credential $(Get-Credential)
As @stuka noted, this is by design. The witness file was locked by a live node before the whole cluster went down. Node1 has no way of knowing whether Node2 is really offline or is online but unreachable over the cluster network, so it has to treat the locked file as correct. Letting Node1 come online by itself in that scenario would be far worse, because if only the cluster network had gone down, neither node would be able to break the quorum voting tie.
If you actually encounter this scenario, you have to edit the quorum settings and force a node back online manually.
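A minimal sketch of forcing the surviving node online, assuming the node is named node1 (a placeholder) and that you have confirmed the other node really is down. Forcing quorum while the other node is running but unreachable risks exactly the split described above, so treat this as a manual, last-resort step:

```powershell
# Run in an elevated PowerShell session on the surviving node.
# "node1" is a placeholder for the local node name.
Import-Module FailoverClusters

# Start the cluster service on this node and force it to form quorum by itself.
Start-ClusterNode -Name "node1" -ForceQuorum

# Confirm the node is up and see which clustered roles came online.
Get-ClusterNode
Get-ClusterGroup
```

Once the cluster is running again, review the quorum and witness configuration before relying on it.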
In practice this shouldn't be a concern, because it should be rare for the cluster to go entirely offline.
Two-node clusters always involve a compromise in terms of HA. The witness file share establishes quorum, but it cannot cover all scenarios. A 3-node (or other odd-numbered) cluster would provide better fault tolerance.
If the quorum witness share is accessible to the online node, it should definitely be able to bring the cluster online. This is standard WSFC behavior. If your cluster is not starting and the witness share is online, something else must be preventing it from starting. Look for any errors.
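One way to look for those errors, as a sketch: query the System event log on the node that failed to start for events from the Failover Clustering provider. The event count of 50 is an arbitrary choice.

```powershell
# Most recent critical, error, and warning events logged by Failover Clustering.
Get-WinEvent -FilterHashtable @{
    LogName      = 'System'
    ProviderName = 'Microsoft-Windows-FailoverClustering'
    Level        = 1, 2, 3   # 1 = Critical, 2 = Error, 3 = Warning
} -MaxEvents 50 |
    Format-List TimeCreated, Id, LevelDisplayName, Message
```

Messages that mention the witness (like the "failed to form a cluster" error quoted in the question) are the ones to focus on.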
Also, how are the cluster quorum settings configured?
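A quick way to collect that information, assuming the commands are run on a node where the cluster service is currently running:

```powershell
Import-Module FailoverClusters

# Quorum type and the witness resource currently in use.
Get-ClusterQuorum | Format-List *

# Votes per node: NodeWeight is the configured vote, DynamicWeight the current one.
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight

# Whether dynamic quorum is enabled and whether the witness currently holds a vote.
Get-Cluster | Format-List Name, DynamicQuorum, WitnessDynamicWeight
```

Posting this output along with the error makes it easier to tell whether the witness is actually counted as a vote.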
See here for reference: https://docs.microsoft.com/en-us/windows-server/failover-clustering/manage-cluster-quorum.