I have a domain with three AD servers. For now I'll just call them:
- AD01 (Win 2008 GC, Operations master)
- AD02 (Win 2008 GC)
- AD03 (Win 2003 GC)
A couple of months ago there were some hardware issues with AD01, so the operations master, PDC and Infrastructure Master roles were moved to AD02. All machines were on while this was happening.
- AD01 (Win 2008 GC)
- AD02 (Win 2008 GC, Operations master)
- AD03 (Win 2003 GC)
AD01 was then shut down for a month. Upon starting this machine up with replaced hardware (NIC and RAID card) I now have a weird problem.
- AD01 still thinks it is the operations master, according to AD on the local box
- AD02 and AD03 think AD02 is the operations master, according to AD on both boxes
- When running DCDIAG on AD01 I get a number of issues (listed below)
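One way to pin down a disagreement like this is to capture the role-holder listing on each DC and compare the results. The following is only a sketch in Python. It assumes the text was captured from `netdom query fsmo` on each box and that each line is a role name, a run of two or more spaces, then a host name. The sample listings and the function names are made up for illustration and simply mirror the situation described above.

```python
import re

# Illustrative listings in the layout `netdom query fsmo` is assumed to print.
AD01_VIEW = """Schema master               AD01.domain.local
Domain naming master        AD01.domain.local
PDC                         AD01.domain.local
RID pool manager            AD01.domain.local
Infrastructure master       AD01.domain.local
The command completed successfully.
"""

# Hypothetical view from AD02/AD03: same roles, but held by AD02.
AD02_VIEW = AD01_VIEW.replace("AD01.domain.local", "AD02.domain.local")


def parse_fsmo(text):
    """Return {role: holder} from a role listing; non-matching lines are skipped."""
    roles = {}
    for line in text.splitlines():
        # Role name, then 2+ spaces, then a single host token at end of line.
        m = re.match(r"^(.+?)\s{2,}(\S+)$", line.strip())
        if m:
            roles[m.group(1)] = m.group(2).lower()
    return roles


def fsmo_disagreements(views):
    """views: {dc_name: listing_text}. Return {role: {dc_name: holder}} where DCs differ."""
    parsed = {dc: parse_fsmo(text) for dc, text in views.items()}
    all_roles = set().union(*(p.keys() for p in parsed.values()))
    out = {}
    for role in sorted(all_roles):
        holders = {dc: p.get(role) for dc, p in parsed.items()}
        if len(set(holders.values())) > 1:
            out[role] = holders
    return out


if __name__ == "__main__":
    views = {"AD01": AD01_VIEW, "AD02": AD02_VIEW, "AD03": AD02_VIEW}
    for role, holders in fsmo_disagreements(views).items():
        print(role, holders)
```

An empty result means every DC agrees on every role; anything else shows exactly which box holds the stale view.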
When running "dcdiag /test:advertising" on AD01:
Doing primary tests
Testing server: Default-First-Site-Name\AD01
Starting test: Advertising
Warning: DsGetDcName returned information for \\ad02.domain.local, when
we were trying to reach AD01.
SERVER IS NOT RESPONDING or IS NOT CONSIDERED SUITABLE.
......................... AD01 failed test Advertising
Running partition tests on : ForestDnsZones
Running partition tests on : DomainDnsZones
Running partition tests on : Schema
Running partition tests on : Configuration
Running partition tests on : domain
Running enterprise tests on : domain.local
When running "dcdiag" on AD01 I get the following errors (excerpt of the final output):
Testing server: Default-First-Site-Name\AD01
Starting test: Advertising
Warning: DsGetDcName returned information for \\ad02.domain.local, when
we were trying to reach AD01.
SERVER IS NOT RESPONDING or IS NOT CONSIDERED SUITABLE.
......................... AD01 failed test Advertising
Starting test: FrsEvent
There are warning or error events within the last 24 hours after the
SYSVOL has been shared. Failing SYSVOL replication problems may cause
Group Policy problems.
Starting test: NCSecDesc
Error NT AUTHORITY\ENTERPRISE DOMAIN CONTROLLERS doesn't have
Replicating Directory Changes In Filtered Set
access rights for the naming context:
DC=ForestDnsZones,DC=domain,DC=local
Error NT AUTHORITY\ENTERPRISE DOMAIN CONTROLLERS doesn't have
Replicating Directory Changes In Filtered Set
access rights for the naming context:
DC=DomainDnsZones,DC=domain,DC=local
Starting test: Replications
[Replications Check,Replications Check] Inbound replication is
disabled.
To correct, run "repadmin /options AD01 -DISABLE_INBOUND_REPL"
[Replications Check,AD01] Outbound replication is disabled.
To correct, run "repadmin /options AD01 -DISABLE_OUTBOUND_REPL"
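With a long dcdiag run it can help to boil the output down to the failed tests and the "To correct, run" hints. Here is a minimal sketch in Python, assuming the output has been saved as text and that the line shapes match the excerpt above. The function name and the sample constant are made up for this example.

```python
import re

# A few lines copied from the dcdiag excerpt above.
DCDIAG_EXCERPT = """Starting test: Advertising
SERVER IS NOT RESPONDING or IS NOT CONSIDERED SUITABLE.
......................... AD01 failed test Advertising
Starting test: Replications
[Replications Check,Replications Check] Inbound replication is
disabled.
To correct, run "repadmin /options AD01 -DISABLE_INBOUND_REPL"
[Replications Check,AD01] Outbound replication is disabled.
To correct, run "repadmin /options AD01 -DISABLE_OUTBOUND_REPL"
"""


def summarize_dcdiag(text):
    """Collect (server, test) pairs for failed tests and any suggested commands."""
    # Result lines look like: "......... AD01 failed test Advertising"
    failed = re.findall(r"^\.+\s+(\S+) failed test (\S+)\s*$", text, flags=re.M)
    # Hint lines look like: To correct, run "<command>"
    hints = re.findall(r'^\s*To correct, run "([^"]+)"\s*$', text, flags=re.M)
    return {"failed": failed, "suggested": hints}


if __name__ == "__main__":
    summary = summarize_dcdiag(DCDIAG_EXCERPT)
    print("Failed tests:", summary["failed"])
    print("Suggested commands (review before running):")
    for cmd in summary["suggested"]:
        print("  ", cmd)
```

The sketch only collects the hints for review. It deliberately does not run them, since a disabled-replication flag may have been set for a reason.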
So the problem appears to be that when I moved the operations master, AD01 never got the memo, and now that it's started up, the other AD servers don't think it's the boss anymore when it tries to replicate etc. So I really need to manually update AD01 so that it knows who the operations master, infrastructure master and PDC are, but I'm not having any luck.
I've been googling for nearly a day and all solutions lead to "the cake is a lie"
Your ninja skills will be greatly appreciated
Is there any reason you can't just run dcpromo on AD01, demote it from a domain controller, reboot, then bring it back up as a domain controller using dcpromo again?
I seem to have fixed the issue. Note the comment in the error:
To correct, run "repadmin /options AD01 -DISABLE_INBOUND_REPL"
I did this for both of the options mentioned in the log. Then I noticed that for some weird reason the Netlogon service was paused... say waaa?
I then started Netlogon and ran a forced sync. This time the sync worked and everything came back to life.
The next thing I would have tried would have been to do as Josh suggested and dcpromo the box down.
Jason's comments about DNS were also very helpful, as this is one of the first things I thought of, so if someone else comes along I'd check that first.
Thank you very much for the quick replies though. I've long been a Stack Overflow supporter and it's great to see this is just as great :-)
I suspect that instead of being transferred from 01, the operations master roles were seized by 02. In that case the behavior you are describing is expected: 01 has no idea that it is no longer the master it once was.
Another possibility is that the roles were moved, but 01 was shut down before all of the changed DNS entries in the AD-integrated zone had replicated back to it.
In either case I would remove dc1 from the domain and re-add it using dcpromo, as replication has somehow been disabled.
I agree with Josh. If you can dcpromo it down and back up, that would probably be best. Otherwise, here's an option that's hitting me:
The first error on dcdiag is strange. It makes me think there is an erroneous DNS entry somewhere. That would definitely cause replication issues. Point DC01 to one of the other DCs for DNS, restart Netlogon on DC01 (or better yet, reboot the server), then see if you can force a replication through AD Sites and Services. Once AD starts replicating, ensure DNS is set to be Active Directory integrated and point DC01 back to itself for DNS (assuming it is a DNS server). Once AD is replicating again, you should see the correct FSMO roles on all servers.
I spoke too soon. It appears that I was experiencing a "USN rollback" issue: http://support.microsoft.com/kb/875495/en-us
This was a huge headache, and I appear to have hurt AD in the process. By restarting NETLOGON and enabling syncs again I have let bad data back into AD on the other boxes.
Last week we did a major mailbox move to a new mailstore, and this now appears to have stuffed up mail on all our mailboxes. :(
One thing to learn from this:
If NetLogon is ever "paused" there is probably a good reason.
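That lesson can be turned into a small pre-flight check. The sketch below is only an illustration in Python, not a Microsoft tool. It assumes you have already noted the Netlogon service state by hand and copied the `Current DSA Options:` line that `repadmin /options` is assumed to print. The function names are made up for this example. It flags the combination seen in this thread: Netlogon paused with both replication directions disabled.

```python
def parse_dsa_options(line):
    """Extract flag names from a 'Current DSA Options: ...' line (format assumed)."""
    _, _, flags = line.partition(":")
    return set(flags.split())


def looks_quarantined(netlogon_state, dsa_options_line):
    """True when Netlogon is paused AND both replication directions are disabled."""
    flags = parse_dsa_options(dsa_options_line)
    repl_off = {"DISABLE_INBOUND_REPL", "DISABLE_OUTBOUND_REPL"} <= flags
    return netlogon_state.strip().lower() == "paused" and repl_off


if __name__ == "__main__":
    # Values as they would have looked on AD01 in this thread (entered by hand).
    line = "Current DSA Options: IS_GC DISABLE_INBOUND_REPL DISABLE_OUTBOUND_REPL"
    if looks_quarantined("Paused", line):
        print("Stop: investigate a possible USN rollback before re-enabling anything.")
```

If the check comes back true, read the USN rollback KB article linked above before clearing the replication flags or resuming Netlogon.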