To make this easier to wrap my head around, here's what I'm using in my examples:
deecee = my domain controller
dctoo = another domain controller
internal.foo.bar = the full DNSDomainName of my Windows domain
foo = the short (NetBIOS) name of my Windows domain
oursite = the only site in our domain
We have all of the debug logging turned on for MS DNS Server and see plenty of NXDOMAIN responses to SRV queries of this form: _ldap._tcp.deecee.internal.foo.bar.
Note that I am not talking about _ldap._tcp.internal.foo.bar; those queries work fine. Here is an error entry from the log:
2/19/2015 8:07:06 AM 0960 PACKET 0000000002F885B0 UDP Snd 10.0.0.87 5052 R Q [8385 A DR NXDOMAIN] SRV (5)_ldap(4)_tcp(6)deecee(8)internal(3)foo(3)bar(0)
UDP response info at 0000000002F885B0
Socket = 332
Remote addr 10.0.0.87, port 54309
Time Query=178201, Queued=0, Expire=0
Buf length = 0x0fa0 (4000)
Msg length = 0x006d (109)
Message:
XID 0x5052
Flags 0x8583
QR 1 (RESPONSE)
OPCODE 0 (QUERY)
AA 1
TC 0
RD 1
RA 1
Z 0
CD 0
AD 0
RCODE 3 (NXDOMAIN)
QCOUNT 1
ACOUNT 0
NSCOUNT 1
ARCOUNT 0
QUESTION SECTION:
Offset = 0x000c, RR count = 0
Name "(5)_ldap(4)_tcp(6)deecee(8)internal(3)foo(3)bar(0)"
QTYPE SRV (33)
QCLASS 1
ANSWER SECTION:
empty
AUTHORITY SECTION:
Offset = 0x0030, RR count = 0
Name "(8)internal(3)foo(3)bar(0)"
TYPE SOA (6)
CLASS 1
TTL 3600
DLEN 38
DATA
PrimaryServer: (6)deecee[C030](8)internal(3)foo(3)bar(0)
Administrator: (5)admin[C030](8)internal(3)foo(3)bar(0)
SerialNo = 247565
Refresh = 900
Retry = 600
Expire = 86400
MinimumTTL = 3600
ADDITIONAL SECTION:
empty
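To read an entry like the one above by hand, it helps to decode its two encoded fields. Below is a minimal Python sketch, assuming the debug log's usual notation in which each label is prefixed by its length in parentheses and (0) ends the name; the bit layout is the standard DNS header from RFC 1035. decode_flags and decode_log_name are hypothetical helper names, not part of any Microsoft tool.

```python
import re

# DNS header flag fields as (name, bit shift, width): RFC 1035 section 4.1.1,
# with the AD and CD bits defined later by RFC 4035.
FLAG_FIELDS = [
    ("QR", 15, 1), ("OPCODE", 11, 4), ("AA", 10, 1), ("TC", 9, 1),
    ("RD", 8, 1), ("RA", 7, 1), ("Z", 6, 1), ("AD", 5, 1), ("CD", 4, 1),
    ("RCODE", 0, 4),
]


def decode_flags(flags: int) -> dict:
    """Split a 16-bit DNS header flags word into its named fields."""
    return {name: (flags >> shift) & ((1 << width) - 1)
            for name, shift, width in FLAG_FIELDS}


def decode_log_name(logged: str) -> str:
    """Convert the debug log's '(len)label...(0)' notation to a dotted FQDN."""
    labels = []
    for length, text in re.findall(r"\((\d+)\)([^()]*)", logged):
        n = int(length)
        if n == 0:
            break  # (0) is the root label, i.e. the end of the name
        # Keep only n characters; this drops annotations such as "[C030]"
        # that the log appends to some labels (apparently compression pointers).
        labels.append(text[:n])
    return ".".join(labels) + "."


if __name__ == "__main__":
    print(decode_flags(0x8583))
    print(decode_log_name("(5)_ldap(4)_tcp(6)deecee(8)internal(3)foo(3)bar(0)"))
```

For Flags 0x8583 this yields QR=1, AA=1, RD=1, RA=1 and RCODE=3, matching the fields the log prints: an authoritative NXDOMAIN from the server for internal.foo.bar, which is consistent with the SOA in the authority section.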
Note that the client is requesting _ldap._tcp.deecee.internal.foo.bar.
According to Microsoft's documentation, the proper request should be _ldap._tcp.internal.foo.bar.
The requests come in from all of our AD-joined machines. They include Windows 7, Server 2008, 2008 R2, 2012, and 2012 R2.
Our DNS servers do have the appropriate SRV records for _ldap._tcp.internal.foo.bar, and they resolve correctly, so that's not the issue.
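For comparison, here is a sketch of the LDAP SRV names the DC locator is documented to use for a domain, built from the placeholder names in this post. It covers only the generic and site-specific _ldap._tcp names plus their dc._msdcs counterparts; Netlogon registers many more (_kerberos, _gc, pdc, domain-GUID records), so the set is illustrative, not exhaustive. expected_ldap_srv_names and is_expected are hypothetical helpers.

```python
from typing import Set


def expected_ldap_srv_names(dns_domain: str, site: str) -> Set[str]:
    """A subset of the _ldap._tcp SRV names a DC locator may query."""
    return {
        f"_ldap._tcp.{dns_domain}",
        f"_ldap._tcp.{site}._sites.{dns_domain}",
        f"_ldap._tcp.dc._msdcs.{dns_domain}",
        f"_ldap._tcp.{site}._sites.dc._msdcs.{dns_domain}",
    }


def is_expected(query: str, dns_domain: str, site: str) -> bool:
    """True if the query (trailing dot optional) is one of the names above."""
    wanted = expected_ldap_srv_names(dns_domain.lower(), site.lower())
    return query.rstrip(".").lower() in wanted


if __name__ == "__main__":
    for q in ("_ldap._tcp.internal.foo.bar.", "_ldap._tcp.deecee.internal.foo.bar."):
        print(q, is_expected(q, "internal.foo.bar", "oursite"))
```

The point is simply that nothing in the documented patterns places a DC's host name between _tcp and the domain name.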
A coworker opened a case with Microsoft, and after a few days the tech finally claimed that this is normal. I don't buy it: if it's normal, why isn't this behavior mentioned anywhere in the documentation?
So, does anyone else see this behavior, i.e. clients looking up SRV records for _ldap._tcp.deecee.internal.foo.bar? If so, are they getting NXDOMAIN results?
Any ideas on how to fix this?
Thanks in advance.
Update A - There's more
In my domain I'm seeing these invalid queries, most common first:
_ldap._tcp.oursite._sites.deecee.internal.foo.bar
_ldap._tcp.deecee.internal.foo.bar
_ldap._tcp.oursite._sites.dctoo.internal.foo.bar
_ldap._tcp.dctoo.internal.foo.bar
_ldap._tcp.deecee <- only from our SharePoint hosts
_ldap._tcp.oursite._sites.deecee
_ldap._tcp.oursite._sites.dctoo
_ldap._tcp.dctoo <- only from our SharePoint hosts
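Every entry in this list looks like a valid DC-locator name in which a DC's host name (as an FQDN or a single label) sits where internal.foo.bar should be. A Python sketch that checks this mechanically, assuming the DC names deecee and dctoo from the examples; split_query, classify and the category labels are hypothetical.

```python
import re
from typing import Optional, Set, Tuple

# _ldap._tcp.[<site>._sites.]<whatever the client used as the domain>[.]
LDAP_SRV = re.compile(r"^_ldap\._tcp\.(?:(?P<site>[^.]+)\._sites\.)?(?P<domain>.+?)\.?$")


def split_query(query: str) -> Tuple[Optional[str], str]:
    """Return (site, domain_part) for an _ldap._tcp SRV query name."""
    m = LDAP_SRV.match(query.lower())
    if m is None:
        raise ValueError(f"not an _ldap._tcp query: {query!r}")
    return m.group("site"), m.group("domain")


def classify(query: str, dns_domain: str, dc_hosts: Set[str]) -> str:
    """Label what the client put in place of the DNS domain name."""
    _, domain = split_query(query)
    dns_domain = dns_domain.lower()
    if domain == dns_domain:
        return "domain name (expected)"
    first, _, rest = domain.partition(".")
    if first in dc_hosts and rest == dns_domain:
        return "DC host FQDN"
    if first in dc_hosts and not rest:
        return "DC host single label"
    return "other"


if __name__ == "__main__":
    for q in ("_ldap._tcp.internal.foo.bar",
              "_ldap._tcp.oursite._sites.deecee.internal.foo.bar",
              "_ldap._tcp.dctoo"):
        print(q, "->", classify(q, "internal.foo.bar", {"deecee", "dctoo"}))
```

If every logged query lands in one of the two "DC host" buckets, that supports the idea that some caller is passing a DC name where a domain name is expected.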
Update B - There's something in SharePoint
I turned on netlogon debugging on one of the affected machines and found some interesting stuff. First, this is what I believe is a successful query being sent:
02/26 22:31:00 [MISC] [6824] DsGetDcName function called: client PID=1884, Dom:FOO Acct:(null) Flags: DS NETBIOS RET_NETBIOS
02/26 22:31:00 [MISC] [6824] NetpDcInitializeContext: DSGETDC_VALID_FLAGS is c07ffff1
02/26 22:31:00 [MISC] [6824] NetpDcGetName: internal.foo.bar. using cached information ( NlDcCacheEntry = 0x0000007051E732F0 )
02/26 22:31:00 [MISC] [6824] DsGetDcName: results as follows: DCName:\\DEECEE DCAddress:\\10.1.1.80 DCAddrType:0x1 DomainName:FOO DnsForestName:internal.hlc.com Flags:0x800031fc DcSiteName:oursite ClientSiteName:oursite
02/26 22:31:00 [MISC] [6824] DsGetDcName function returns 0 (client PID=1884): Dom:FOO Acct:(null) Flags: DS NETBIOS RET_NETBIOS
And here's what an unsuccessful query being sent looks like:
02/27 09:13:01 [MISC] [308] DsGetDcName function called: client PID=1884, Dom:DEECEE Acct:(null) Flags: WRITABLE LDAPONLY RET_DNS
02/27 09:13:01 [MISC] [308] DsIGetDcName: DNS suffix search list allowed but single label DNS disallowed for name DEECEE
02/27 09:13:01 [MISC] [308] NetpDcInitializeContext: DSGETDC_VALID_FLAGS is c07ffff1
02/27 09:13:01 [CRITICAL] [308] NetpDcGetNameIp: DEECEE: No data returned from DnsQuery.
02/27 09:13:01 [MISC] [308] NetpDcGetName: NetpDcGetNameIp for DEECEE returned 1355
02/27 09:13:01 [MAILSLOT] [308] Sent 'Sam Logon' message to DEECEE[1C] on all transports.
02/27 09:13:03 [CRITICAL] [308] NetpDcGetNameNetbios: DEECEE: Cannot NlBrowserSendDatagram. (ALT) 53
02/27 09:13:03 [MISC] [308] NetpDcGetName: NetpDcGetNameNetbios for DEECEE returned 1355
02/27 09:13:03 [CRITICAL] [308] NetpDcGetName: DEECEE: IP and Netbios are both done.
02/27 09:13:03 [MISC] [308] DsGetDcName function returns 1355 (client PID=1884): Dom:DEECEE Acct:(null) Flags: WRITABLE LDAPONLY RET_DNS
If my understanding is correct (please correct me if not), the first line of this indicates that the process with PID 1884 is asking Netlogon (via DsGetDcName) to locate a writable domain controller for a domain named "DEECEE". It literally thinks the domain name is DEECEE. Meanwhile, the previous snippet (and others) show that this same process, PID 1884, is shotgunning out requests, some of which are legit and some of which aren't.
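To see which processes ask for which "domains" across a whole netlogon.log rather than a single excerpt, one could tally the "DsGetDcName function returns" lines. A sketch, assuming the line format shown in the two excerpts above; failed_lookups is a hypothetical helper. The status names are standard Win32 codes: 1355 is ERROR_NO_SUCH_DOMAIN and 53 is ERROR_BAD_NETPATH.

```python
import re
from collections import Counter
from typing import Iterable

RETURN_LINE = re.compile(
    r"DsGetDcName function returns (?P<status>\d+) "
    r"\(client PID=(?P<pid>\d+)\): Dom:(?P<dom>\S+) Acct:\S+ Flags: (?P<flags>.*)$"
)

# Win32 status codes that appear in the excerpts above.
STATUS_NAMES = {0: "ERROR_SUCCESS", 53: "ERROR_BAD_NETPATH", 1355: "ERROR_NO_SUCH_DOMAIN"}


def failed_lookups(lines: Iterable[str]) -> Counter:
    """Count failed DsGetDcName calls per (client PID, requested domain, status)."""
    counts = Counter()
    for line in lines:
        m = RETURN_LINE.search(line)
        if m is None:
            continue
        status = int(m.group("status"))
        if status != 0:
            key = (int(m.group("pid")), m.group("dom"), STATUS_NAMES.get(status, str(status)))
            counts[key] += 1
    return counts
```

A result dominated by one PID and a "domain" that is really a DC name would point at a single misbehaving caller, as it did here.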
Checking the process list on that machine tells me it's a w3wp process, so I looked up its application pool:
C:\Windows\System32\inetsrv>appcmd list wps
WP "1856" (applicationPool:SharePoint - 80)
WP "6540" (applicationPool:SharePoint Central Administration v4)
WP "1884" (applicationPool:272b926088ea454c8a4b4caa8526d3bb)
WP "8468" (applicationPool:6997d03e3ea94018841409e8b821d8da)
WP "6696" (applicationPool:SecurityTokenServiceApplicationPool)
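The hop from this output to the Get-SPServiceApplication filter is just re-inserting hyphens: the pool name 272b926088ea454c8a4b4caa8526d3bb and the ApplicationPool.Id 272b9260-88ea-454c-8a4b-4caa8526d3bb are the same GUID. Below is a Python sketch of both steps, assuming that naming convention holds for other SharePoint service-application pools (it is only observed here, not documented in this post); pool_for_pid and pool_name_to_guid are hypothetical helpers.

```python
import re
import uuid
from typing import Optional

# Matches lines such as: WP "1884" (applicationPool:272b9260...)
WP_LINE = re.compile(r'^WP "(?P<pid>\d+)" \(applicationPool:(?P<pool>[^)]+)\)$')


def pool_for_pid(appcmd_output: str, pid: int) -> Optional[str]:
    """Find the application pool name for a worker-process PID."""
    for line in appcmd_output.splitlines():
        m = WP_LINE.match(line.strip())
        if m and int(m.group("pid")) == pid:
            return m.group("pool")
    return None


def pool_name_to_guid(pool: str) -> Optional[str]:
    """Return the hyphenated GUID form, or None if the name isn't a bare GUID."""
    try:
        return str(uuid.UUID(hex=pool))
    except ValueError:
        return None
```

Named pools such as "SharePoint - 80" simply return None, since they are not GUIDs.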
And then I checked which applications are running in that pool:
PS C:\Users\administrator.HLC> Get-SPServiceApplication | foreach { if($_.ApplicationPool.Id -eq "272b9260-88ea-454c-8a4b-4caa8526d3bb") { $_ } }
DisplayName TypeName Id
----------- -------- --
PerformancePoint ... PerformancePoint ... 8681c71c-81b9-41e5-ac19-58d0ccf11227
Managed Metadata ... Managed Metadata ... ef99af38-a3f8-4864-8c88-9ee421f3dfa0
App Management Se... App Management Se... 183ca7a4-825a-4807-91fc-4fe1c9fe93e0
Excel Services Excel Services Ap... 46557c93-3d60-47f0-99ab-45cc32258137
Subscription Sett... Microsoft SharePo... 9fd75bbe-1464-4a4c-8bd0-3382c0c03dce
Search Administra... Search Administra... ee519543-e311-41fd-a8a4-0b952f731ff8
User Profile Service User Profile Serv... fe6886ab-4a2d-4216-8bcf-5160dad5c037
Business Data Con... Business Data Con... 813bb77c-9eb4-43d0-b2cc-09e8162e58e7
Work Management S... Work Management S... 81dbd284-2506-43a0-be93-2820759bb804
Search Service Ap... Search Service Ap... d641f112-b299-4318-baaf-817ef96107c4
So I spent some time enabling and disabling these SharePoint services and watching the DNS queries go out. It appears that the User Profile Service is causing the queries for at least _ldap._tcp.deecee.
I know the whole thing isn't SharePoint's fault; as I said earlier, these queries are coming from all over the place. The ones for just _ldap._tcp.deecee, though, are coming only from our SharePoint hosts.
So that raises another question: what is the User Profile Service doing that causes the lookups for _ldap._tcp.deecee? That still leaves the queries from the rest of our servers unexplained, though.
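One way to make the enable/disable comparison repeatable is to count the NXDOMAIN SRV responses the DNS server sent to a single host, once with a service running and once with it stopped. A Python sketch, assuming the response summary-line format from the log entry earlier in this post (other DNS server versions may format it differently); nxdomain_srv_counts is a hypothetical helper and the optional client_ip filter is my addition.

```python
import re
import sys
from collections import Counter
from typing import Iterable, Optional

# Matches response summary lines like:
# ... PACKET 0000000002F885B0 UDP Snd 10.0.0.87 5052 R Q [8385 A DR NXDOMAIN] SRV (5)_ldap...
SUMMARY = re.compile(
    r"PACKET\s+\S+\s+(?:UDP|TCP)\s+Snd\s+(?P<ip>\S+)\s+\S+\s+R\s+Q\s+"
    r"\[[^\]]*NXDOMAIN\]\s+SRV\s+(?P<name>\S+)"
)


def decode_log_name(logged: str) -> str:
    """'(5)_ldap(4)_tcp(0)' -> '_ldap._tcp' (length-prefixed labels)."""
    pairs = re.findall(r"\((\d+)\)([^()]*)", logged)
    return ".".join(text[: int(n)] for n, text in pairs if int(n) > 0)


def nxdomain_srv_counts(lines: Iterable[str], client_ip: Optional[str] = None) -> Counter:
    """Count NXDOMAIN SRV responses by query name, optionally for one client IP."""
    counts = Counter()
    for line in lines:
        m = SUMMARY.search(line)
        if m and (client_ip is None or m.group("ip") == client_ip):
            counts[decode_log_name(m.group("name"))] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_nx.py <dns debug log> [client ip]
    ip = sys.argv[2] if len(sys.argv) > 2 else None
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for name, count in nxdomain_srv_counts(log, ip).most_common():
            print(count, name)
```

Diffing the two runs' most_common() lists shows which names disappear when a given service is stopped.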
This is a bug.
Microsoft has known about it for a long time (since Windows 2000), but no one has convinced them to fix it.
With Netlogon debugging enabled, I found the same result on my Windows 7 SP1 machines (the domain controllers are 2008 R2 SP1). As far as I can tell, it also caused an 8-second delay in processing. It looks like a faulty API call from Netlogon to me.
You can replicate the same 1355 error by running the following on a workstation:
returns:
This is clearly because it's calling DsGetDcName with the wrong parameter.
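The exact command and its output are missing above. As an independent illustration (not necessarily what was run), here is a Python ctypes sketch that calls DsGetDcNameW directly with the DC's host name as the domain. It assumes the netlogon.log tokens WRITABLE, LDAPONLY and RET_DNS correspond to DS_WRITABLE_REQUIRED, DS_ONLY_LDAP_NEEDED and DS_RETURN_DNS_NAME; that mapping is my reading of the log, not something the log states. dsgetdc is a hypothetical wrapper, and the call only works on Windows.

```python
import ctypes
import sys

# Flag values from dsgetdc.h / the DsGetDcName documentation.
DS_WRITABLE_REQUIRED = 0x00001000
DS_ONLY_LDAP_NEEDED = 0x00008000
DS_RETURN_DNS_NAME = 0x40000000

# Assumed equivalent of "Flags: WRITABLE LDAPONLY RET_DNS" in netlogon.log.
FAILING_FLAGS = DS_WRITABLE_REQUIRED | DS_ONLY_LDAP_NEEDED | DS_RETURN_DNS_NAME


def dsgetdc(domain: str, flags: int) -> int:
    """Call DsGetDcNameW and return its Win32 status code (Windows only)."""
    if sys.platform != "win32":
        raise OSError("DsGetDcNameW is only available on Windows")
    netapi32 = ctypes.WinDLL("netapi32")
    fn = netapi32.DsGetDcNameW
    fn.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_void_p,
                   ctypes.c_wchar_p, ctypes.c_ulong, ctypes.POINTER(ctypes.c_void_p)]
    fn.restype = ctypes.c_ulong
    info = ctypes.c_void_p()
    status = fn(None, domain, None, None, flags, ctypes.byref(info))
    if info:
        # DOMAIN_CONTROLLER_INFO is allocated by the API and must be freed.
        netapi32.NetApiBufferFree.argtypes = [ctypes.c_void_p]
        netapi32.NetApiBufferFree(info)
    return status


if __name__ == "__main__" and sys.platform == "win32":
    # The failing call in the netlogon excerpt returned 1355 (ERROR_NO_SUCH_DOMAIN).
    print(dsgetdc("DEECEE", FAILING_FLAGS))
```

Passing "FOO" or "internal.foo.bar" instead of "DEECEE" should be the control case, since those are the domain's real names.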
I agree with everyone else that there's most likely nothing wrong with your infrastructure, though it would be nice to get to the bottom of it.
No need to fix it; these lookups are done to find the corresponding LDAP server for your AD tree.