This question got me thinking about fault tolerance in DHCP, so I did a little digging in my current environment and discovered that we have only one DHCP server per major site in our company, with no redundancy. All of our DHCP servers are virtual, with VMware High Availability and regular backups using Quantum VMPro, so in the event of almost any catastrophic crash of our DHCP servers we can still recover inside of an hour.
This would lead me to think that a redundant DHCP server for failover is, well, redundant. But most of my prior experience is in the small business sector where this kind of situation just never comes up. Big business is very different.
Most of our file servers are in the same configuration, except for the few remaining physical server clusters that haven't gotten caught in our virtualization efforts yet.
So in a virtual environment, what are the decision points for adding server redundancy? Examples: When would I add a virtual DHCP standby server? Or create a virtual failover cluster for file servers? I understand that this is probably difficult to answer without enumerating the specific needs of an organization, but I think it's possible to describe a few example situations that would help an SA to be prepared before the need arises.
I'm strictly concerned about fault tolerance and failover - load balancing in this context is totally unrelated.
As always in life, and especially in IT, the answer is "it depends".
For your very specific use case, a virtualized environment with VMware HA, you do not really need a standby. Still, DHCP is a very "light" service, so my suggestion is to just spin up DHCP on another VM (or even on another existing VM) and put the two servers in a DHCP Failover configuration if you have Windows Server 2012 or later, or in a "Split Scope" configuration otherwise.
Refer to "Understand and Deploy DHCP Failover" on TechNet.
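To make the "Split Scope" option a bit more concrete, here is a minimal Python sketch of how one address range might be divided between two servers. The address range, the 80/20 ratio, and the function name `split_scope` are all assumptions chosen for illustration; they are not taken from the question or from any Microsoft tool.

```python
import ipaddress


def split_scope(first: str, last: str, primary_percent: int = 80):
    """Split an inclusive IPv4 range between two DHCP servers.

    Returns ((primary_first, primary_last), (secondary_first, secondary_last)).
    The 80/20 default is an assumed ratio, not a requirement.
    """
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    total = int(end) - int(start) + 1
    if total < 2:
        raise ValueError("need at least two addresses to split a scope")
    if not 0 < primary_percent < 100:
        raise ValueError("primary_percent must be between 1 and 99")

    # Addresses handed to the primary; always leave at least one per server.
    primary_count = min(max(total * primary_percent // 100, 1), total - 1)
    primary_last = start + (primary_count - 1)
    secondary_first = primary_last + 1
    return (start, primary_last), (secondary_first, end)


# Hypothetical scope of 200 addresses.
primary, secondary = split_scope("192.168.10.10", "192.168.10.209")
print("primary:  ", primary[0], "-", primary[1])
print("secondary:", secondary[0], "-", secondary[1])
```

Each server would then hand out only its own part of the range, so if one of them is down, the other can still serve new clients from its share.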
For the other examples (e.g. a file server cluster), you need to evaluate some of the following:
The question you have to ask here is: What is the time limit for getting DHCP up and running again?
If it will take too long in the current setup, you should set up a failover cluster.
But: do you really mistrust VMware HA?
What scenario do you want to cover?
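The time-limit test above can be written down as a small check. This is only a sketch: the function name `needs_standby` and the tolerable-outage figures are hypothetical, and the one-hour recovery estimate is borrowed from the question.

```python
from datetime import timedelta


def needs_standby(estimated_recovery: timedelta,
                  max_tolerable_outage: timedelta) -> bool:
    """Return True when the current setup recovers too slowly.

    estimated_recovery: how long the existing HA/backup path takes.
    max_tolerable_outage: the longest outage the organization accepts.
    """
    return estimated_recovery > max_tolerable_outage


# The question estimates recovery "inside of an hour".
recovery = timedelta(hours=1)

# Assumed limits, for illustration only.
print(needs_standby(recovery, timedelta(hours=4)))     # False
print(needs_standby(recovery, timedelta(minutes=15)))  # True
```

The hard part is not the comparison but agreeing on the two inputs, which is why the scenario you want to cover has to be pinned down first.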
Deciding if and when to add failover or server redundancy depends on several factors: available resources, the type of service, the type of backup you have, and your uptime, downtime, and recovery time targets.
The simplest scenario is a backup that supports instant recovery. This covers most general-use situations: you are back up in about 2 minutes, with the data as it was at the last backup.
For some specific services there are various approaches. For example:
For an ERP system that has databases and has large amounts of data transferred into it continuously, it's good to have a failover cluster (make sure you fail over the storage too).
For domain controllers and authoritative DNS servers you can always use primary servers with secondary and even tertiary ones. Windows DHCP fits here.
For things like e-mail servers it depends on other details, but a standby replica is usually a good idea.