I'm running vSphere 5 in an HA cluster across two hosts (vsphereA and vsphereB). The HA cluster is configured for host monitoring and datastore heartbeat monitoring, with admission control disabled (my understanding, hopefully correct, is that datastore heartbeating prevents inadvertent and unwanted HA failovers caused by management network isolation). Each host has a single connection to a dedicated iSCSI network and iSCSI target (no MPIO), and all VMDKs for all VMs live on the iSCSI datastore.

As a test of HA I disconnected the iSCSI connection on vsphereB and was surprised to see that the running VMs on vsphereB continued to run there. The powered-off VMs showed as inaccessible (which I expected, since they weren't running and vsphereB's connection to the iSCSI target was severed), but the running VMs kept running and continued to be "owned" by vsphereB. I expected an HA failover to occur for those VMs and for them to be "owned" by vsphereA afterwards, which didn't happen.

I'm at a loss to understand why an HA failover didn't occur for those VMs. Am I misunderstanding in which cases an HA failover should occur?
You seem to be confusing vMotion and HA, which are different features that do different things.
vMotion is a feature that live-migrates a virtual machine from one physical host to another with no downtime and minimal (milliseconds of) disruption in service. It is a planned operation, done ahead of maintenance or for load balancing, and it requires the VM and both the source and destination hosts to already be in a healthy state. HA is a feature which restarts failed virtual machines (or inaccessible virtual machines, if a host isolation response is configured), and it does result in downtime for the VM, since the entire virtual machine is powered off and started again on another host.
Important take-away: a vMotion is not an HA failover, and an HA failover is not a vMotion.
vMotions are triggered by the following things:

- An administrator manually migrating a VM (for example, ahead of host maintenance).
- DRS deciding to rebalance load across the cluster, when it is enabled and allowed to automate migrations.
- A host being placed into maintenance mode, which evacuates its running VMs onto the other hosts.
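For illustration, a manually requested vMotion through the API looks roughly like the sketch below. It uses pyVmomi (the vSphere Python SDK); the vCenter address, credentials, and the VM/host names are placeholders, so treat it as an outline rather than a drop-in script.

```python
# Rough sketch: manually trigger a vMotion with pyVmomi.
# vCenter address, credentials, VM name, and host name are placeholders.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local", pwd="...")
content = si.RetrieveContent()

def find_by_name(vimtype, name):
    """Return the first inventory object of the given type with this name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.Destroy()

vm = find_by_name(vim.VirtualMachine, "my-vm")
dest = find_by_name(vim.HostSystem, "vsphereA")

# Live-migrate the running VM to the other host; its disks stay on the same datastore.
vm.MigrateVM_Task(host=dest,
                  priority=vim.VirtualMachine.MovePriority.defaultPriority)

Disconnect(si)
```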
HA failovers are triggered by the following things:

- A host failing outright (crash, PSOD, power loss), detected by the surviving hosts through missed network heartbeats and, in vSphere 5, the datastore heartbeats you mentioned.
- A host becoming isolated from the management network, if the isolation response is set to power off or shut down its VMs so they can be restarted elsewhere.
- A guest OS hang or crash when VM monitoring is enabled (detected via missing VMware Tools heartbeats); in that case the VM is restarted in place rather than moved.
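Since you mentioned host monitoring, admission control, and datastore heartbeating, it's worth double-checking what the cluster actually has configured. A rough pyVmomi sketch would look something like the following; the connection details and cluster name are placeholders, and the field names are from ClusterDasConfigInfo as I understand it.

```python
# Sketch: inspect a cluster's HA ("das") configuration with pyVmomi.
# vCenter address, credentials, and cluster name below are placeholders.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local", pwd="...")
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "MyCluster")
view.Destroy()

das = cluster.configurationEx.dasConfig
print("HA enabled:          ", das.enabled)
print("Host monitoring:     ", das.hostMonitoring)            # "enabled" / "disabled"
print("Admission control:   ", das.admissionControlEnabled)
print("Heartbeat DS policy: ", das.hBDatastoreCandidatePolicy)
for ds in das.heartbeatDatastore or []:
    print("Heartbeat datastore: ", ds.name)

Disconnect(si)
```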
Bottom line: vMotions happen for performance and maintenance reasons; HA failovers happen because of availability failures.
What you've done is pull the disk out from underneath a running VM. The standard behavior of vSphere, and of most hypervisors, in this situation is to leave the virtual machine alone and let it handle its own disk issues. There are several good reasons for this:

- The storage outage may well be transient; many guest operating systems will retry I/O and carry on once the path comes back, with no downtime at all.
- Forcibly killing the VM guarantees downtime and risks data loss, while riding out the outage might cause neither.
- As far as the host is concerned, the VM is still healthy: it has CPU and memory and is running, so it hasn't "failed" in any of the ways HA watches for.
- Restarting the VM elsewhere only helps if the storage problem is specific to this host; if the array or the iSCSI network itself is down, a failover just moves the problem around.
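You can see this from vCenter's point of view, too: while the guest is screaming about lost disks, the VM object still reports itself as powered on and connected on vsphereB, which is exactly why HA leaves it alone. A rough pyVmomi check (VM name and connection details are placeholders):

```python
# Sketch: show that, as far as vCenter is concerned, the VM is still running normally.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local", pwd="...")
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "my-vm")
view.Destroy()

print("Power state:      ", vm.runtime.powerState)       # expect "poweredOn"
print("Connection state: ", vm.runtime.connectionState)  # expect "connected"
print("Current host:     ", vm.runtime.host.name)        # still vsphereB
print("Guest heartbeat:  ", vm.guestHeartbeatStatus)     # VMware Tools heartbeat status

Disconnect(si)
```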
On the other hand, for many workloads (databases come to mind), it's arguably better to stop as soon as there's any chance of corruption or lost transactions. Even in the best case, though, since you can't cleanly quiesce the database without its disk, you're probably ending up in an inconsistent state anyway.
Ultimately: there are some good use cases for having HA respond to unreliable storage, but it doesn't do that today, and the behavior you're seeing is entirely normal.
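If you do want something to react when a host loses its storage, you'll need to detect it yourself (or with a monitoring product) and decide on the response. As a rough sketch, each host's view of each datastore is exposed through DatastoreHostMount in the API, so a scheduled check along these lines could have flagged your test; again, the names are placeholders and this is an outline, not a finished tool.

```python
# Sketch: report hosts that have lost access to a datastore.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local", pwd="...")
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.Datastore], True)
for ds in view.view:
    for mount in ds.host:                      # one DatastoreHostMount per attached host
        if not mount.mountInfo.accessible:     # None/False both mean "not usable right now"
            print("ALERT: %s cannot reach datastore %s" % (mount.key.name, ds.name))
view.Destroy()

Disconnect(si)
```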