I am tasked with recovering a VMware 6.5 cluster in which, after an unexpected power failure, a VM (the most important one...) is stuck at boot.
From the vmware.log file, it seems the problem is related to a corrupted CTK file and, as I read in this VMware KB, it should be sufficient to remove the affected CTK file (OK, not quite that simple, but simple enough...).
However, the affected VM has some active snapshots and, as I read in another (older) KB, this procedure should not be attempted if snapshots are present.
What is the right procedure to unstick the VM and let the boot process complete?
In this case, the solution was the simplest, yet strangest, possible: wait overnight. After some hours, both VMs "unstuck" themselves and booted correctly.
Regarding the change-tracking-file (CTK) question, I simulated the problem on a spare VMware hypervisor and, after reading VMware's own documentation (quite light on details...), I think the key point is that you can delete the CTK files even if the virtual machine has active snapshots, but doing so can corrupt any subsequent CTK-aware backups. So, in such cases, you also need to disable CTK at both the VM and disk level, consolidate any snapshots, take a full backup, re-enable CTK (again, at both the VM and disk level) and re-enable incremental backups.
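To make the "VM and disk level" part concrete, here is a minimal Python sketch that flips the CTK flags in the text of a .vmx file. It assumes the VM-level key is `ctkEnabled` and the per-disk keys look like `scsi0:0.ctkEnabled`, as I understand them from VMware's CBT documentation; the function name `set_ctk` and the sample content are made up for illustration. It only transforms text: it does not touch the host, and any real edit should be done with the VM powered off and on a copy of the file first.

```python
import re

# Assumption: the VM-level key is "ctkEnabled" and the per-disk keys look like
# "scsi0:0.ctkEnabled". Anything else in the file is left untouched.
_CTK_LINE = re.compile(
    r'^(\s*(?:[A-Za-z]+\d+:\d+\.)?ctkEnabled\s*=\s*)"[^"]*"\s*$',
    re.IGNORECASE,
)


def set_ctk(vmx_text: str, enabled: bool) -> str:
    """Return vmx_text with every existing ctkEnabled entry set to TRUE or FALSE.

    Hypothetical helper: it rewrites entries that are already present
    (VM level and disk level) and does not add missing ones.
    """
    value = "TRUE" if enabled else "FALSE"
    out = []
    for line in vmx_text.splitlines():
        m = _CTK_LINE.match(line)
        # Keep the key and spacing as written, replace only the quoted value.
        out.append(f'{m.group(1)}"{value}"' if m else line)
    return "\n".join(out)


# Made-up sample content, not taken from the affected VM.
sample = "\n".join([
    'displayName = "important-vm"',
    'ctkEnabled = "TRUE"',
    'scsi0:0.fileName = "important-vm-000002.vmdk"',
    'scsi0:0.ctkEnabled = "TRUE"',
])

disabled = set_ctk(sample, enabled=False)
print(disabled)
```

The same function with `enabled=True` covers the re-enable step after the snapshots have been consolidated and the full backup has completed.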
Disabling CTK seems to affect the last CTK file only (note: a CTK file exists for each VMDK flat and delta file, so each snapshot brings a new CTK file), and this seems to be the reason VMware recommends having no snapshots when enabling/disabling changed block tracking. From here: