Short Version
A domain controller was set up, then taken offline for longer than the tombstone lifetime. Now I can't get it to replicate again.
Relevant Error Messages
On dc2 (identical error messages exist about both exchange and dc1):
The kerberos client received a KRB_AP_ERR_MODIFIED error from the server host/exchange.mydomain.local. The target name used was [email protected]. This indicates that the password used to encrypt the kerberos service ticket is different than that on the target server. Commonly, this is due to identically named machine accounts in the target realm (MYDOMAIN.LOCAL), and the client realm. Please contact your system administrator.
Another relevant error (Event ID 2042):
The Knowledge Consistency Checker (KCC) has detected that successive attempts to replicate with the following domain controller has consistently failed.
Attempts: 12
Domain controller: CN=NTDS Settings,CN=DC1,CN=Servers,CN=MainSite,CN=Sites,CN=Configuration,DC=mydomain,DC=local
Period of time (minutes): 105103
The Connection object for this domain controller will be ignored, and a new temporary connection will be established to ensure that replication continues. Once replication with this domain controller resumes, the temporary connection will be removed.
Additional Data
Error value: 2148074274 The target principal name is incorrect.
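For anyone searching on that number: the "Error value" in the event is an unsigned 32-bit rendering of a Windows status code, and the same error often appears elsewhere in hex or as a negative number. A minimal Python sketch of the conversion (the helper name is made up for illustration):

```python
def describe_error_value(value: int) -> dict:
    """Return hex and signed 32-bit views of an unsigned 32-bit error value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("expected an unsigned 32-bit value")
    # Reinterpret the high bit as a sign bit (two's complement).
    signed = value - 0x100000000 if value & 0x80000000 else value
    return {"hex": f"0x{value:08X}", "signed": signed}


info = describe_error_value(2148074274)
print(info["hex"])     # 0x80090322
print(info["signed"])  # -2146893022
```

0x80090322 is SEC_E_WRONG_PRINCIPAL, which is where the "target principal name is incorrect" text comes from.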
And Event ID 1925:
The attempt to establish a replication link for the following writable directory partition failed.
Other Details
Both sites are connected through a VPN. At the main site, I have two domain controllers (which we shall call exchange and dc1). Both are Server 2003. If it matters, dc1 holds all the FSMO roles.
In preparation for setting up a remote site, I set up a domain controller called dc2, running Server 2003 R2, configured separate sites in AD Sites and Services, and configured replication from dc1 to dc2. I even had it on the correct subnet for the remote site by connecting it through a router (this was before the site was connected to the VPN, so no IP conflicts).
Everything was working great, so I shut it down and got it ready to take out to the remote site. But things kept getting delayed for over 2 months, and now dc2 won't replicate properly.
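To put a number on "over 2 months": the 2042 event reports 105103 minutes without successful replication. A quick Python sketch of the arithmetic, assuming the 60-day tombstone lifetime that older forests default to (forests created on newer builds commonly use 180 days, so check the actual value in your own forest; the constant and function names here are only illustrative):

```python
MINUTES_PER_DAY = 24 * 60

# Assumption: 60-day tombstone lifetime; verify the real setting in your forest.
ASSUMED_TOMBSTONE_LIFETIME_DAYS = 60


def days_without_replication(minutes: int) -> float:
    """Convert the event's 'Period of time (minutes)' into days."""
    return minutes / MINUTES_PER_DAY


def exceeds_tombstone_lifetime(minutes: int,
                               lifetime_days: int = ASSUMED_TOMBSTONE_LIFETIME_DAYS) -> bool:
    """True if the outage is longer than the given tombstone lifetime."""
    return days_without_replication(minutes) > lifetime_days


days = days_without_replication(105103)
print(f"{days:.1f} days")                   # 73.0 days
print(exceeds_tombstone_lifetime(105103))   # True with the assumed 60-day lifetime
```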
What I've tried
Removing the domain controller role - fails with:
Managing the network session with DC1.mydomain.com failed "Logon Failure: The target account name is incorrect."
Resetting the machine password with:
Disable and stop KDC service
klist /purge
netdom resetpwd /s:dc1 /ud:domainadmin /pd:domainadminpassword
Reboot
Reenable KDC service
Most of the KB articles I went through about fixing replication after exceeding the tombstone lifetime left me stuck on the "The target principal name is incorrect" error.
It seems the easiest way is indeed to remove Active Directory and reinstall it, and it can be done without wiping out the entire server. This leaves anything else on the server untouched. However, since you can't remove Active Directory properly, you have to force its removal from the server and then clean up manually on a good domain controller.
Disconnect the problem server from the network to prevent any of this from potentially breaking Active Directory on the good servers.
On the problem server, run dcpromo /forceremoval. This removes Active Directory from that system without removing its records on the other domain controllers.
Use ntdsutil from a good domain controller to remove the problem server from Active Directory. Instructions are in the help link when you run dcpromo /forceremoval, or here: http://technet.microsoft.com/en-us/library/cc736378%28WS.10%29.aspx
Delete the server object in AD Sites and Services
Delete the server in AD Users and Computers if it still exists
Delete the server from DNS:
Repromote the problem server and configure site settings like you would a brand new DC.
At this point it's probably easier to create a new DC and clean dc2 out of AD with ntdsutil.
WAY late to the game, but, maybe it helps someone else?
I had two VMs for a domain that had both been offline for a few years. I needed to get them fired back up to get some data from Exchange servers on them, and just for learning.
I did the following on the secondary DC:
klist purge
Then followed these steps on the secondary DC also:
https://quarksoft.com/2011/05/01/active-directory-domain-controllers-out-of-sync/
Then on BOTH DCs I added this registry key:
HKLM\System\CurrentControlSet\Services\NTDS\Parameters\Allow Replication With Divergent and Corrupt Partner
as a DWORD set to 1, per:
https://www.techieshelp.com/it-has-been-too-long-since-this-machine-replicated/
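If you'd rather script the registry change than click through regedit, here is a rough Python sketch that builds the equivalent reg add command from the key and value name above. The function name and the --apply switch are my own invention for illustration; it only prints the command unless you run it on Windows with --apply from an elevated prompt.

```python
import subprocess
import sys

# Key and value name as given in the post above.
KEY = r"HKLM\System\CurrentControlSet\Services\NTDS\Parameters"
VALUE_NAME = "Allow Replication With Divergent and Corrupt Partner"


def build_reg_add_command(enabled: bool = True) -> list:
    """Build the argument list for 'reg add' that sets the DWORD to 1 (or 0)."""
    return [
        "reg", "add", KEY,
        "/v", VALUE_NAME,
        "/t", "REG_DWORD",
        "/d", "1" if enabled else "0",
        "/f",  # overwrite an existing value without prompting
    ]


if __name__ == "__main__":
    cmd = build_reg_add_command()
    # Show the command line so it can be reviewed or pasted by hand.
    print(subprocess.list2cmdline(cmd))
    # Hypothetical opt-in switch: only touch the registry when asked to.
    if sys.platform == "win32" and "--apply" in sys.argv:
        subprocess.run(cmd, check=True)
```

Calling build_reg_add_command(enabled=False) gives the command to set it back to 0, which is probably worth doing once replication is healthy again.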
Then I did a forced replication from command prompt per:
https://docs.microsoft.com/en-us/troubleshoot/windows-server/identity/replication-error-2146893022
Then I checked and was able to run Replicate Now under NTDS Settings in both directions with no errors.
There are still some accounts that are too old to work for some apps, but now that I can connect to AD and both servers those can be worked out.