Several hours ago, a handful of our member servers became unable to authenticate against the two domain controllers they should be using. The member servers and DCs are located in the same datacenter and are in their own "site" in AD. Running DCDiag shows no problems, and we've confirmed that the servers and DCs have network connectivity with each other. Running nslookup on the member servers shows the proper DC listed as the name server in each case.
LDAP authentication seems to be working; however, Kerberos authentication has stopped working. As a result, all of the key internal services have stopped.
Here are specifics on some of the problems we are having with member servers:
Exchange - Topology Service cannot find any domain controllers. Therefore, the Exchange Information Store cannot start.
SharePoint - Authentication is failing at the IIS level and between IIS and SQL (this farm has been up for multiple years).
Additional troubleshooting:
NLTEST /DCLIST:domainname - No DC can be found to get a DC List
NLTEST /Server:Servername - Both DCs Complete Successfully.
NLTEST /DSGetDC:Domain - Command completes successfully.
NLTEST /dsgetsite - Completes successfully.
GPUpdate - User cannot be found. No domain exists
Output of nslookup -type=SRV _kerberos._tcp.dc._msdcs.subdomain.mydomain.com
on the Exchange server:
Server: colo-dc-001.subdomain.mydomain.com
Address: 10.11.2.20
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
priority = 0
weight = 100
port = 88
svr hostname = branchf-dc-001.subdomain.mydomain.com
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
priority = 0
weight = 100
port = 88
svr hostname = colo-dc-001.subdomain.mydomain.com
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
priority = 0
weight = 100
port = 88
svr hostname = hq-dc-003.subdomain.mydomain.com
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
priority = 0
weight = 100
port = 88
svr hostname = colo-dc-002.subdomain.mydomain.com
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
priority = 0
weight = 100
port = 88
svr hostname = hq-dc-004.subdomain.mydomain.com
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
priority = 0
weight = 100
port = 88
svr hostname = branchc-dc-002.subdomain.mydomain.com
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
priority = 0
weight = 100
port = 88
svr hostname = branchm-dc-001.subdomain.mydomain.com
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
priority = 0
weight = 100
port = 88
svr hostname = branchs-dc-001.subdomain.mydomain.com
branchf-dc-001.subdomain.mydomain.com internet address = 10.10.2.22
colo-dc-001.subdomain.mydomain.com internet address = 10.11.2.20
hq-dc-003.subdomain.mydomain.com internet address = 10.1.2.20
colo-dc-002.subdomain.mydomain.com internet address = 10.11.2.21
hq-dc-004.subdomain.mydomain.com internet address = 10.1.2.21
branchc-dc-002.subdomain.mydomain.com internet address = 10.5.2.21
branchm-dc-001.subdomain.mydomain.com internet address = 10.6.2.21
branchs-dc-001.subdomain.mydomain.com internet address = 10.7.2.22
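To double-check the connectivity claim against exactly the DCs that DNS hands out, the output above can be parsed and each advertised Kerberos port probed over TCP. This is only a rough Python sketch, not something we ran as part of the troubleshooting above: the function names (parse_srv_output, can_connect) are made up for illustration, the sample text is a trimmed copy of the two colo records from the output above, and a successful TCP connect to port 88 only shows the port is reachable, not that Kerberos itself is healthy.

```python
import re
import socket

# Patterns for the line shapes that appear in the nslookup -type=SRV output.
SRV_PORT = re.compile(r"port\s*=\s*(\d+)")
SRV_HOST = re.compile(r"svr hostname\s*=\s*(\S+)")
A_RECORD = re.compile(r"^(\S+)\s+internet address\s*=\s*(\S+)$")

# Trimmed sample: only the two colo DC records from the output in this post.
SAMPLE = """\
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
priority = 0
weight = 100
port = 88
svr hostname = colo-dc-001.subdomain.mydomain.com
_kerberos._tcp.dc._msdcs.subdomain.mydomain.com SRV service location:
priority = 0
weight = 100
port = 88
svr hostname = colo-dc-002.subdomain.mydomain.com
colo-dc-001.subdomain.mydomain.com internet address = 10.11.2.20
colo-dc-002.subdomain.mydomain.com internet address = 10.11.2.21
"""


def parse_srv_output(text):
    """Return [(hostname, port, ip_or_None), ...] in the order DNS listed them."""
    ports = {}       # hostname -> port from the SRV record
    addresses = {}   # hostname -> IP from the trailing A records
    order = []
    pending_port = None
    for raw in text.splitlines():
        line = raw.strip()
        m = SRV_PORT.match(line)
        if m:
            pending_port = int(m.group(1))
            continue
        m = SRV_HOST.match(line)
        if m:
            host = m.group(1).rstrip(".")
            if host not in ports:
                order.append(host)
            ports[host] = pending_port
            pending_port = None
            continue
        m = A_RECORD.match(line)
        if m:
            addresses[m.group(1).rstrip(".")] = m.group(2)
    return [(h, ports[h], addresses.get(h)) for h in order]


def can_connect(ip, port, timeout=3.0):
    """True if a TCP connection to ip:port opens within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for host, port, ip in parse_srv_output(SAMPLE):
        if ip is None or port is None:
            print(f"{host}: incomplete record in output")
            continue
        status = "reachable" if can_connect(ip, port) else "NOT reachable"
        print(f"{host} ({ip}) tcp/{port}: {status}")
```

Run from an affected member server with the full nslookup output pasted in place of SAMPLE, this would show whether any DC that DNS returns is unreachable on the Kerberos port from that particular machine.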
We can RDP to any of the servers that are hosting the above services, but the services will not work.
System logs on the member servers include some error messages about not being able to find a DC.
So basically, the network seems to be up, and the DCs seem to be up, but member servers right there on the same network segment can't find them. Where should we look for the problem?