Background / Goal
- I have a VMware HA cluster for production machines with two hosts.
- It is currently set up so that it can account for the failure of up to one host. It does not use DRS.
- I need to remediate both of these servers to apply patches. I would like to do this with zero downtime.
Questions
- Can I vMotion the VMs in the cluster specifically to another host in the cluster and then take down a server?
- What is the best / recommended way to remediate servers in an HA configuration to avoid downtime?
If you're not using DRS, you'll have to manually evacuate your powered-on VMs to another host in the cluster before VUM will remediate the host. It's also recommended that you disable HA Admission Control, Distributed Power Management, and Fault Tolerance, if you use any of them, before you remediate the host.
In short: migrate (vMotion) your powered-on VMs to another host in the cluster, remediate the host, then migrate the VMs back.
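As a rough illustration of that order of operations, here is a minimal planning sketch in Python. It does not talk to vCenter or VUM at all; the function name `rolling_remediation_plan` and the host and VM names are hypothetical, and it assumes each host's VMs all fit on the next host in the list.

```python
def rolling_remediation_plan(vms_by_host):
    """Return an ordered list of manual steps for a rolling remediation.

    vms_by_host maps a host name to the list of powered-on VMs it runs.
    This only plans the order of operations; it performs no migrations.
    """
    hosts = list(vms_by_host)
    if len(hosts) < 2:
        raise ValueError("need at least two hosts to evacuate VMs")

    steps = []
    for index, host in enumerate(hosts):
        # Use the next host in the list as the temporary target.
        target = hosts[(index + 1) % len(hosts)]
        vms = vms_by_host[host]
        for vm in vms:
            steps.append(f"vMotion {vm} from {host} to {target}")
        steps.append(f"remediate {host}")
        for vm in vms:
            steps.append(f"vMotion {vm} from {target} back to {host}")
    return steps


if __name__ == "__main__":
    # Hypothetical two-host cluster, as in the question.
    for step in rolling_remediation_plan({"esx01": ["web01"], "esx02": ["db01"]}):
        print(step)
```

Each host is fully evacuated, patched, and repopulated before the next one is touched, so only one host is ever out of service at a time.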
Disable the right options in your host/cluster remediation options screens:
I typically disable admission control, fault tolerance, and DPM (but who uses that?).
I may manually vMotion a few VMs if the process doesn't seem to kick off.
Be patient. It can take 10-15 minutes per host, depending on your connectivity.
When you remediate a host in a cluster, the host goes into maintenance mode, which then vMotions the VMs to another host and starts the update process. The host comes out of maintenance mode once the update process has finished, so you can do a rolling host upgrade, so to speak. You can vMotion the VMs off manually first, but I don't believe this step is necessary. So in your case, as long as you have capacity on the other host, you can remediate the first host, wait for the update process to complete (view the tasks and events for detailed information), and then do the other host.
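The "as long as you have capacity on the other host" condition can be sanity-checked with simple arithmetic before starting. The sketch below is a deliberately simplified assumption: it looks at configured memory only and ignores CPU, virtualization overhead, reservations, and overcommit. The function name `can_evacuate` and all figures are hypothetical.

```python
def can_evacuate(host_capacity_gb, vm_memory_gb_by_host, host_to_patch):
    """Return True if the remaining hosts can hold every VM's memory
    while host_to_patch is in maintenance mode.

    Memory-only approximation: CPU, overhead, and overcommit are ignored.
    """
    remaining_capacity = sum(
        capacity
        for host, capacity in host_capacity_gb.items()
        if host != host_to_patch
    )
    # Every VM in the cluster must fit on the hosts that stay up.
    total_demand = sum(sum(vms) for vms in vm_memory_gb_by_host.values())
    return total_demand <= remaining_capacity


if __name__ == "__main__":
    capacity = {"esx01": 128, "esx02": 128}          # GB of RAM per host
    vms = {"esx01": [16, 32], "esx02": [24, 8]}      # GB of RAM per VM
    print(can_evacuate(capacity, vms, "esx01"))
```

If the check fails for either host, some VMs would need to be powered off or resized before that host can be patched without downtime.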