I'm trying to get a better understanding of how Active Directory handles schema updates, specifically how safe the procedure actually is, given how critical AD is and how many situations require updates. Exchange 2007, OCS and SCOM, for example, all require schema changes; it's not just something that happens when you are considering a major shift from (say) a Windows 2003 to a Windows 2008 infrastructure.
What I'm looking for is advice on the best backout plan for schema changes, just in case it actually does go wrong. Would it be acceptable to take one DC offline during the update, for example, and use that to roll back the entire environment if the schema update failed? Are there any problems with reactivating a DC that was offline during a schema update?
Schema updates are a one-way function. You can only add to the schema; you can never delete anything. For this reason you should always carefully evaluate alternatives when software requires schema extensions or updates, and be sure it's something you're willing to commit to using.
First thing: make sure you have a good backup copy of the AD database (usually %SystemRoot%\NTDS\ntds.dit, captured as part of a system state backup, since the file is locked while the DC is running)! Keep it in a safe place.
If you only have one DC in your forest, it's very straightforward: just run adprep as the instructions say (or let the software update AD itself).
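One way to confirm that adprep actually moved the base schema forward is to read the objectVersion attribute on the schema naming context before and after the update, for example with `dsquery * "cn=schema,cn=configuration,dc=example,dc=com" -scope base -attr objectVersion` (example.com is a placeholder for your forest root). The Python below is a hypothetical helper, not part of any Microsoft tool: it maps the version numbers for the releases discussed in this thread to names and checks that the value went up. Application extensions such as Exchange's track their own version separately, so this only covers the base AD schema.

```python
# Hypothetical helper (not part of any Microsoft tool): interpret the
# objectVersion value read from CN=Schema,CN=Configuration,DC=... .
SCHEMA_VERSIONS = {
    13: "Windows 2000 Server",
    30: "Windows Server 2003",
    31: "Windows Server 2003 R2",
    44: "Windows Server 2008",
    47: "Windows Server 2008 R2",
}


def describe_schema(object_version: int) -> str:
    """Return a readable label for a schema objectVersion value."""
    return SCHEMA_VERSIONS.get(object_version, f"unknown ({object_version})")


def forestprep_applied(before: int, after: int, expected: int) -> bool:
    """True if the schema reached at least the version the update targets.

    Schema updates are one-way, so a lower reading afterwards points to a
    mistake in how the values were collected, not to a rollback.
    """
    if after < before:
        raise ValueError("objectVersion decreased; re-check which DC each value came from")
    return after >= expected


if __name__ == "__main__":
    print(describe_schema(30), "->", describe_schema(44))
    print(forestprep_applied(before=30, after=44, expected=44))  # True
```

Querying the Schema Master for the "after" value avoids confusing an update that failed with one that simply hasn't replicated to the DC you asked yet.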
If you have more than one DC, make sure there are absolutely no errors reported by dcdiag and replmon -syncall. Make sure you have backups of every AD database (from each DC). Determine which DC holds the Schema Master role, and do all updates on/to that server where possible.
AD will protect itself from failed schema updates in most cases. If an LDIF file doesn't pass its syntax check (say you BSOD in the middle of an update), it will not be loaded. Each "update" has its own set of LDIF files.
I've never seen a schema update (so long as it's done properly) go wrong. MS really seem to have pulled out all the stops in making this a solid and reliable process, and it shows. The only real scenarios in which I could see anything bad happening would be if you lost power partway through (even then I'm not certain), or if your AD was already screwed to begin with (in which case you have bigger problems).
All a schema upgrade really does is extend AD with new object classes and attributes that an application or a newer version of AD can make use of, so the scope for disaster is quite limited. This TechNet article gives a decent overview and covers some potential Bad Things Happening cases.
My standard approach would be to ensure that everything is functioning properly beforehand (via dcdiag, replmon, etc.) and that I have a known-good backup of AD in case the worst happens. I'd keep this backup for as long as possible, as AD can be so damn robust that problems may not show up until long afterwards. So a standard backup and restore would be my rollback. But like I said, I've never actually needed it.
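Keeping the backup "for as long as possible" has one hard limit: Microsoft does not support restoring an AD backup that is older than the forest's tombstone lifetime, because it could bring back objects the rest of the forest has already deleted. The value comes from the tombstoneLifetime attribute on CN=Directory Service,CN=Windows NT,CN=Services in the configuration partition; when the attribute is unset AD uses 60 days, and forests created on Windows Server 2003 SP1 or later set it to 180. A minimal sketch of an age check, with hypothetical function names and assuming you look up your own forest's value rather than trusting the defaults:

```python
from datetime import datetime, timedelta

# Tombstone lifetime values in days. Read the real value from the
# tombstoneLifetime attribute of your forest; these are only the defaults.
UNSET_DEFAULT_TSL_DAYS = 60   # attribute not set (e.g. forests built on Windows 2000)
SP1_FOREST_TSL_DAYS = 180     # forests created on Windows Server 2003 SP1 or later


def backup_still_usable(backup_taken: datetime, now: datetime,
                        tombstone_lifetime_days: int = SP1_FOREST_TSL_DAYS) -> bool:
    """True if the backup is younger than the tombstone lifetime."""
    return now - backup_taken < timedelta(days=tombstone_lifetime_days)


def days_of_rollback_left(backup_taken: datetime, now: datetime,
                          tombstone_lifetime_days: int = SP1_FOREST_TSL_DAYS) -> int:
    """Whole days remaining before the backup ages out (never negative)."""
    remaining = timedelta(days=tombstone_lifetime_days) - (now - backup_taken)
    return max(remaining.days, 0)


if __name__ == "__main__":
    taken = datetime(2024, 1, 1)
    today = datetime(2024, 3, 1)
    print(backup_still_usable(taken, today))    # True (60 days old, 180-day TSL)
    print(days_of_rollback_left(taken, today))  # 120
```

In practice this means the pre-update backup is a rollback option for a few months at most, so problems that surface later have to be fixed forward rather than restored away.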
The one-DC-offline approach would work for a small environment. For a large environment, I would prefer to perform the update on a DC that is disconnected from the network. Provided the update completes successfully, connect it to the network and let the changes replicate. Backing out in this scenario would be as simple as pulling one drive of a mirror set before the update, then, if needed, shutting down the DC and reinserting that good drive from before the update.
On a large network with hundreds or thousands of DCs, the "reinsert the good DC" approach would not be practical.