I have a pair of ESXi (5.1) hosts in a High Availability cluster. Each host has 48GB of RAM. I currently have 18 VMs running, with the following configured memory amounts:
- 1x 4GB, 1 vCPU
- 1x 4GB, 4 vCPU
- 5x 2GB, 1 vCPU
- 5x 1GB, 1 vCPU
- 4x 512MB, 1 vCPU
- 2x 512MB, 1 vCPU, with Fault Tolerance (so consumed memory is doubled)
That should add up to 27GB of RAM. Based on the Resource Management Guide (and assuming 38 MB overhead for a 2 GB, 1 vCPU VM and 23 MB for a 512 MB, 1 vCPU VM), there should be about 612 MB of overhead, for around 28 GB total.
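To double-check the arithmetic, here is a small sketch that totals the configured memory from the list above; the ~612 MB overhead figure is the Resource Management Guide estimate quoted above, not something the code derives:

```python
# Sanity check of the configured-memory totals (values in MB).
vms = [
    (1, 4096),     # 1x 4GB, 1 vCPU
    (1, 4096),     # 1x 4GB, 4 vCPU
    (5, 2048),     # 5x 2GB, 1 vCPU
    (5, 1024),     # 5x 1GB, 1 vCPU
    (4, 512),      # 4x 512MB, 1 vCPU
    (2, 512 * 2),  # 2x 512MB with Fault Tolerance: consumed memory doubles
]
configured_mb = sum(count * size for count, size in vms)
print(configured_mb)        # 27648 MB, i.e. 27 GB

overhead_mb = 612           # Resource Management Guide estimate
print((configured_mb + overhead_mb) / 1024)  # ~27.6 GB, the "around 28 GB" above
```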
When I look at the individual hosts, the memory in use appears to line up with that. On the Summary tab, for "Memory usage", one host shows 14240.00 MB and the other shows 14897.00 MB, for a total of about 28.5 GB.
The Resource Allocation tab for my cluster in the vSphere Client, however, shows the following:
- Total Capacity: 89705 MB
- Reserved Capacity: 47210 MB
- Available Capacity: 42495 MB
Setting aside the fact that 48 GB of RAM across two hosts is 98304 MB, not 89705, why is the reserved capacity so high? There's roughly 18 GB between what the individual hosts claim to be using (about 28 GB) and what the cluster claims to have reserved (about 46 GB). Moreover, this is preventing me from adding new VMs: the HA cluster needs to be able to tolerate the failure of one host, and the software thinks I'm already running at full capacity for that constraint.
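Putting the reported numbers side by side (the subtraction is just restating the figures above; neither difference is explained at this point in the question):

```python
# The two discrepancies, in MB (1 GB = 1024 MB).
total_physical = 2 * 48 * 1024        # 98304 MB of installed RAM
total_capacity = 89705                # Resource Allocation tab, "Total Capacity"
print(total_physical - total_capacity)   # 8599 MB unaccounted for

host_usage = 14240 + 14897            # per-host Summary tab figures
reserved_capacity = 47210             # Resource Allocation tab, "Reserved Capacity"
print(reserved_capacity - host_usage)    # 18073 MB, the roughly 18 GB gap
```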
All of my VMs are configured with no RAM limit and no RAM reservation, except for the two Fault Tolerant VMs, which have all of their RAM reserved.
This is with a vSphere 5.1 Standard license.
After some (long) conversations with VMware support, I have come to the following understanding:
The number in "Reserved Capacity" is not a function of the memory configuration of the cluster's VMs. It is the sum of several factors: any memory reservations declared on VMs, a value calculated from the HA admission control policy, and an additional amount for memory management overhead. The HA component comes directly from the admission control policy: since I had it set to tolerate the failure of a single host, the total amount of RAM in one of my hosts was added to the cluster's reserved capacity.
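One way to make those three factors roughly add up to the reported 47210 MB. The split below is my own reconstruction, assuming the HA component is half of the reported total capacity and that both the FT primaries and their secondaries carry full reservations; the vSphere Client does not report this breakdown directly:

```python
# Rough decomposition of the 47210 MB "Reserved Capacity" (values in MB).
one_host = 89705 / 2            # "tolerate one host failure" sets aside one host's worth
ft_reservations = 2 * 512 * 2   # two FT VMs, fully reserved, primary + secondary copies
remainder = 47210 - one_host - ft_reservations
print(remainder)                # ~310 MB left over, plausibly memory-management overhead
```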
Among other constraints, it appears that HA admission control will not allow the reserved capacity to exceed the RAM in a single host. (Either that or it won't allow the available capacity to drop below the RAM on a single host; I'm still not clear on which of these is really the case, since they're the same thing in my two-host cluster.) This has the net result that practically any amount of memory reservation is incompatible with what would otherwise seem to be natural settings for HA admission policy in a two-host cluster. Since Fault Tolerance forces memory reservations, that makes it similarly incompatible. I was told that if there were more hosts in the cluster, the reserved capacity would be "spread out" across more of them and some degree of memory reservation would be possible.
The net result for me is that I had to change my HA admission policy to reserve a percentage of the available resources (instead of "one host's worth") and calculate that percentage to exclude any memory reservations necessitated by the use of Fault Tolerance.
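As an illustration of that last step, here is a sketch of how such a percentage might be calculated: start from "one host's worth" (50% in a two-host cluster) and carve the FT-forced reservations out of it. The formula and variable names are my own sketch, not vSphere's internal computation:

```python
# Illustrative percentage for the "Percentage of cluster resources reserved"
# admission control policy in this two-host cluster (values in MB).
total_capacity = 89705
one_host = total_capacity / 2
ft_reservations = 2 * 512 * 2   # FT primaries + secondaries, fully reserved
pct = 100 * (one_host - ft_reservations) / total_capacity
print(round(pct, 1))            # ~47.7% instead of a flat 50%
```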
Looks to me like this is a function of you having set the HA cluster admission control policy to "Percentage of cluster resources reserved" and given it a 50% reservation. So that's working as intended.