I am helping to troubleshoot an existing Windows failover cluster on Windows Server 2019 Datacenter. The cluster is set up to use Storage Spaces Direct (http://aka.ms/s2d), but there are problems with the deployment: specifically, the network driver versions are mismatched on the NIC ports used by S2D. To get the cluster back into a healthy state, I will take the cluster down and bring every network port on every node to the same driver version. Currently the plan is to run the NIC driver install after the cluster is offline, then reboot all nodes simultaneously and bring the cluster back online.
Here is the rub: there is apparently NO official documentation on how to do this safely, other than doing an in-place update/upgrade of the driver software. Removing the device entries and then re-installing the new driver software appears to be "a bit heavy-handed", as it will also entail a complete reset of the networking stack via (somewhat obscure) PowerShell commands.
The Question (in three pieces):
Is there a known best practice for updating NIC drivers on a failover cluster running S2D (not to be confused with HDD, SSD, or NVMe drivers) that would guarantee a clean driver update without disturbing the cluster?
Is my "in-place" method sufficient, and will it work as intended?
Or is this cluster simply done and over and probably needs a destroy-and-rebuild?
In the unlikely scenario where NICs with different firmware versions don't work well together, what you are doing would make sense. But normally you should explore Cluster-Aware Updating (CAU) as a possible option. Dell, HP, and Lenovo have integrated CAU into their workflows.
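Whichever route you take, it helps to know exactly which ports are off the target driver version before and after the update. The following is only a minimal sketch in Python, not part of CAU or any vendor tooling. It assumes you have already collected the driver version of each port on each node into a dictionary (for example from the output of `Get-NetAdapter` on each node). The function name `find_mismatched_ports`, the node and port names, and the version strings are all hypothetical.

```python
from collections import Counter


def find_mismatched_ports(inventory, target_version=None):
    """Return (target_version, [(node, port, version), ...]) for ports off target.

    inventory: {node_name: {port_name: driver_version_string}}
    If target_version is None, the most common version found is assumed
    to be the target (an assumption of this sketch, not a cluster rule).
    """
    versions = [v for ports in inventory.values() for v in ports.values()]
    if not versions:
        return target_version, []
    if target_version is None:
        target_version = Counter(versions).most_common(1)[0][0]
    mismatched = sorted(
        (node, port, version)
        for node, ports in inventory.items()
        for port, version in ports.items()
        if version != target_version
    )
    return target_version, mismatched


# Hypothetical inventory; replace with data gathered from your own nodes.
inventory = {
    "node1": {"SMB1": "2.30.1", "SMB2": "2.30.1"},
    "node2": {"SMB1": "2.30.1", "SMB2": "2.20.0"},
}

target, bad = find_mismatched_ports(inventory)
for node, port, version in bad:
    print(f"{node} {port}: {version} (target {target})")
```

Running the same check again after the update, with the intended version passed explicitly as `target_version`, should return an empty list once every port matches.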