I've implemented vSAN in my home lab and I'm trying to understand why I'm receiving a failed "Disk Space Utilization" alert.
The cluster consists of a pair of servers and a Witness appliance. Each of the two servers has a 500 GB SSD and a 6 TB SATA drive. The SATA drives show a capacity of 5.46 TB, and the total raw capacity of the vSAN Datastore is reported as 10.81 TB. Everything was healthy when I set up vSAN (well, except for the hardware compatibility checks, but as I said, this is a home lab).
After adding a fair amount of data to a thin-provisioned VM, I received the Disk Space Utilization alert. The Summary tab on the Datastore reports 7.29 TB of 10.81 TB used, which I take to mean that the actual raw storage taken by my VMs (which all use thin disks) is 7.29 TB. I'm using the default Storage Policy, so I think 7.29 TB is twice what the VMs would consume without vSAN (i.e. RAID 1), and I should be consuming 3.64 TB on each host. However, the alert says I am at 134% utilization (7465GB of 5533GB). What's going on here?
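For reference, here is the arithmetic behind my expectation as a small Python sketch. It assumes the UI reports binary units (1 TB = 1024 GB), which is only my guess based on 7.29 TB lining up with the alert's 7465 GB; the variable names are purely illustrative.

```python
# Figures copied from the Datastore Summary tab.
GB_PER_TB = 1024  # assumption: the UI uses binary units

used_raw_tb = 7.29    # raw space consumed, both mirror copies included
total_raw_tb = 10.81  # raw capacity of the whole vSAN Datastore
copies = 2            # default policy: one failure tolerated, i.e. RAID 1

used_raw_gb = used_raw_tb * GB_PER_TB               # raw usage expressed in GB
expected_per_host_tb = used_raw_tb / copies         # one copy on each host
expected_utilization = used_raw_tb / total_raw_tb   # used vs. total raw

print(f"used raw: {used_raw_gb:.0f} GB")
print(f"expected per host: {expected_per_host_tb:.3f} TB")
print(f"expected utilization: {expected_utilization:.0%}")
```

By that reasoning I would expect utilization of roughly two thirds, nowhere near 134%.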
Here are some screenshots of my setup and the alert:
Note that the Cluster warning in that last screenshot is complaining about Disk Balance, which I am also troubleshooting but believe is unrelated to this issue.
I'm not familiar with this product, but it says "number of disk failures to tolerate" is 1. The only way to do that in a two-disk system is to keep two copies. Therefore, whatever you store will take twice as much space.
Ok, after stumbling upon this I think I know what's going on (sorry for the Google webcache link but the VMware forums are down right now for maintenance).
With the Storage Policy I've told vSAN to tolerate one failure, which of course means keeping two copies of the data (with the default failure tolerance method, that is). To vSAN, "tolerate" means it can still maintain two copies of the data even after a host fails (so really RAID 1 + spare). I guess that's nice if you have several vSAN hosts, but with only two hosts it appears to check that there is enough capacity to put both copies of the data on a single host. That seems odd, and it requires that you stay below 50% of your usable capacity (below 25% of your raw capacity) or the warning will trigger.
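The alert's numbers fit that reading. Below is a rough Python sketch, assuming the check simply divides the raw space currently used by the raw capacity that would remain after losing one host. That formula is my inference from the figures, not something taken from the documentation, and the function name is made up for illustration.

```python
def utilization_after_host_failure(used_gb: float, surviving_capacity_gb: float) -> float:
    """Ratio of raw space in use to raw capacity left once one host is gone."""
    return used_gb / surviving_capacity_gb

# Both figures are copied from the alert text ("7465GB of 5533GB").
used_gb = 7465
surviving_capacity_gb = 5533  # roughly one host's share of the 10.81 TB raw

ratio = utilization_after_host_failure(used_gb, surviving_capacity_gb)
percent_shown = int(ratio * 100)  # assumption: the alert truncates rather than rounds

# To stay under 100%, raw usage must fit on one host. With two mirror copies,
# the actual VM data can be at most half of that.
max_vm_data_gb = surviving_capacity_gb / 2
share_of_total_raw = max_vm_data_gb / (2 * surviving_capacity_gb)

print(f"utilization after one host failure: {percent_shown}%")
print(f"max VM data before the check fails: {max_vm_data_gb:.1f} GB")
print(f"that is {share_of_total_raw:.0%} of total raw capacity")
```

The ratio comes out just under 135%, which matches the alert's 134% if the value is truncated, and the last line reproduces the 25%-of-raw ceiling mentioned above.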
I'm willing to accept that there is only one copy of my data if one of my two hosts goes down, so my solution was to disable the vSAN Health Checks. That's not fantastic, but I won't abide a red X on my Cluster all the time. That's no way to live.
Note the docs do say:
I didn't think that was applicable to a two-node vSAN cluster, but it is, with the +1 being the Witness Appliance.