I asked this question a while back, and it got moved to chat because it drew a lot of subjective opinions.
Original message here for reference: https://chat.stackexchange.com/rooms/139176/discussion-on-question-by-sabre-dns-forwarding-issue
I also found a seemingly similar issue, unanswered as well: "Conditional Forwarding intermittent failures".
So I figured I would consolidate it down to the basic information and try again, with log files for demonstration.
The core question: why would a DNS forwarder, without reporting any errors, selectively fail for one host at random and then resume normal operation later? The details follow.
Edit: the issue happened again this morning (the day after posting). The logs show that when the incident occurred, one query was resolved correctly; then, less than a second later, the server asked its WAN forwarder instead of its cache or the LAN. That cached the external IP, and every query failed from that point forward until we cleared the cache. The first query after that followed the forwarder and cached the correct IP. What makes this even more mysterious: if the record was cached, the server should not have asked for a new IP at all.
I have two DNS servers for two domains, both on the LAN; each is the DC and DNS server for its own AD domain.
One domain is .local, so it cannot be queried from public DNS; the other is .ORG. The .ORG domain is split between hosts on the LAN and hosts on the internet. In this scenario we are only concerned with intercepting queries for LAN hosts and letting public DNS deal with the rest. So LAN hosts are handled by the local server, and anything else (not a LAN host) goes out to the next forwarder, which is OpenDNS (ultimately our SOA is at GoDaddy). I have learned this is referred to as split-brain DNS, and it is apparently normal for a hybrid DNS scenario like mine.
So if you ask the .local DNS server (A.local) for one of the hosts on B.org, it should, and almost always does, ask the B.org server where that host is (and the query never leaves the LAN unless no matching hostname is found there).
I included a picture of the host that is failing to forward, so we do not go down the "there is no such thing as DNS forwarding" path again.
What is happening is that, at random, a host on the .org domain does not resolve because the .local DNS server does not ask the DNS server for the .org domain; it never tries the conditional forwarder. I have now confirmed this with a simultaneous packet capture on both hosts: the path is A.local => OpenDNS, not A.local => (forward) => B.org.
When it fails, the .local server does not even try to send to the .org server, and the .org server confirms it never receives any request.
If you query the .org server directly rather than through the forwarder (with NSLOOKUP), it works fine: the host is there, and I can see its DNS record. The forwarder also works fine during this time for other hosts on the .org domain, and the host that fails is not consistently the same one.
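The direct-versus-forwarded comparison described above can also be scripted, so it can be run repeatedly while waiting for the failure to recur. This is only a minimal sketch using the Python standard library: it sends a plain A query over UDP and reports a few header fields. The function names are made up for illustration, and the address used for the .local server in the usage comment is a placeholder, not taken from my environment.

```python
import socket
import struct


def build_query(name: str, xid: int = 0x1234) -> bytes:
    """Build a minimal DNS A query with the RD (recursion desired) bit set."""
    # Header: XID, flags 0x0100 (RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", xid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE A, QCLASS IN


def summarize(response: bytes) -> dict:
    """Extract the header fields that differ between a good and a bad reply."""
    _xid, flags, _qd, an, ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return {
        "aa": bool(flags & 0x0400),   # authoritative answer bit
        "rcode": flags & 0x000F,      # 0 = NOERROR
        "answers": an,                # ACOUNT in the debug log
        "authority": ns,              # NSCOUNT in the debug log
    }


def ask(server: str, name: str, timeout: float = 2.0) -> dict:
    """Send one A query to `server` on UDP port 53 and summarize the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        data, _ = sock.recvfrom(4096)
    return summarize(data)


# Usage (not executed here; "192.0.2.53" is a placeholder for the .local server):
#   ask("10.1.1.250", "myhost.mydomain.org")   # direct to the .org DNS server
#   ask("192.0.2.53", "myhost.mydomain.org")   # through the .local DNS server
```

During a failure, the expectation would be that the direct query reports one answer while the query through the .local server reports zero answers and one authority record, as in the failed log.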
This happens off and on, very infrequently and at random, with no change in configuration, and normal operation resumes later, again with no change in configuration.
Log files are attached (from DNS debug logging on the .local DNS server, where the failure occurs), showing the correct chain and the incorrect one. 10.1.1.250 is the DNS server for the .org domain, and 10.1.0.16 is the client requesting the host resolution.
LogCorrect
- Request from client
- Request to .org DNS server (Forward/10.1.1.250)
- Response from .org DNS server to .local DNS server
- Response to client with information obtained via forward.
LogFailed
- Request from client
- Response from .local DNS server, directing it to external DNS (Forwarder never asked)
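The two chains above can be told apart mechanically by scanning the one-line PACKET summaries in the debug log. The following is a rough sketch, assuming the summary lines have the same layout as the excerpts in this post; the regular expression and the function names are my own and have not been tested against other log variants.

```python
import re

# Matches the one-line packet summary that precedes each detailed dump.
PACKET_RE = re.compile(
    r"PACKET\s+(?P<ctx>[0-9A-Fa-f]+)\s+(?P<proto>UDP|TCP)\s+"
    r"(?P<dir>Rcv|Snd)\s+(?P<addr>\S+)\s+(?P<xid>[0-9A-Fa-f]{4})\s+"
    r"(?P<resp>R)?\s*Q\s+\[(?P<flags>[^\]]*)\]\s+(?P<qtype>\S+)\s+(?P<name>\S+)"
)


def packet_events(log_text: str) -> list:
    """Return one dict per PACKET summary line found in the log text."""
    events = []
    for line in log_text.splitlines():
        m = PACKET_RE.search(line)
        if m:
            events.append({
                "dir": m["dir"],                    # "Rcv" or "Snd"
                "addr": m["addr"],                  # remote IP address
                "is_response": m["resp"] is not None,
                "name": m["name"].lower(),          # DNS names are case-insensitive
            })
    return events


def forwarder_was_asked(log_text: str, forwarder_ip: str, name: str) -> bool:
    """True if a query (not a response) for `name` was sent to `forwarder_ip`."""
    name = name.lower()
    return any(
        e["dir"] == "Snd"
        and e["addr"] == forwarder_ip
        and not e["is_response"]
        and e["name"] == name
        for e in packet_events(log_text)
    )
```

Called with the forwarder address 10.1.1.250 and the logged name `(3)myhost(4)mydomain(3)org(0)`, this should return True for the LogCorrect excerpt and False for the LogFailed excerpt.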
Hopefully those details will keep it in the question realm, not a chat :-)
Thank you.
LogCorrect:
10/12/2022 9:48:12 AM 0758 PACKET 000000883A640200 UDP Rcv 10.1.0.16 001e Q [0001 D NOERROR] A (3)myhost(4)mydomain(3)org(0)
UDP question info at 000000883A640200
Socket = 592
Remote addr 10.1.0.16, port 57756
Time Query=2147027, Queued=0, Expire=0
Buf length = 0x0fa0 (4000)
Msg length = 0x001e (30)
Message:
XID 0x001e
Flags 0x0100
QR 0 (QUESTION)
OPCODE 0 (QUERY)
AA 0
TC 0
RD 1
RA 0
Z 0
CD 0
AD 0
RCODE 0 (NOERROR)
QCOUNT 1
ACOUNT 0
NSCOUNT 0
ARCOUNT 0
QUESTION SECTION:
Offset = 0x000c, RR count = 0
Name "(3)myhost(4)mydomain(3)org(0)"
QTYPE A (1)
QCLASS 1
ANSWER SECTION:
empty
AUTHORITY SECTION:
empty
ADDITIONAL SECTION:
empty
10/12/2022 9:48:12 AM 0758 PACKET 000000883A4581A0 UDP Snd 10.1.1.250 d94e Q [0001 D NOERROR] A (3)myhost(4)mydomain(3)org(0)
UDP question info at 000000883A4581A0
Socket = 10476
Remote addr 10.1.1.250, port 53
Time Query=0, Queued=0, Expire=0
Buf length = 0x0fa0 (4000)
Msg length = 0x0029 (41)
Message:
XID 0xd94e
Flags 0x0100
QR 0 (QUESTION)
OPCODE 0 (QUERY)
AA 0
TC 0
RD 1
RA 0
Z 0
CD 0
AD 0
RCODE 0 (NOERROR)
QCOUNT 1
ACOUNT 0
NSCOUNT 0
ARCOUNT 1
QUESTION SECTION:
Offset = 0x000c, RR count = 0
Name "(3)myhost(4)mydomain(3)org(0)"
QTYPE A (1)
QCLASS 1
ANSWER SECTION:
empty
AUTHORITY SECTION:
empty
ADDITIONAL SECTION:
Offset = 0x001e, RR count = 0
Name "(0)"
TYPE OPT (41)
CLASS 4000
TTL 32768
DLEN 0
DATA
Buffer Size = 4000
Rcode Ext = 0
Rcode Full = 0
Version = 0
Flags = 80 DO
10/12/2022 9:48:12 AM 0758 PACKET 000000883E98E210 UDP Rcv 10.1.1.250 d94e R Q [8085 A DR NOERROR] A (3)myhost(4)mydomain(3)org(0)
UDP response info at 000000883E98E210
Socket = 10476
Remote addr 10.1.1.250, port 53
Time Query=2147027, Queued=0, Expire=0
Buf length = 0x0fa0 (4000)
Msg length = 0x0039 (57)
Message:
XID 0xd94e
Flags 0x8580
QR 1 (RESPONSE)
OPCODE 0 (QUERY)
AA 1
TC 0
RD 1
RA 1
Z 0
CD 0
AD 0
RCODE 0 (NOERROR)
QCOUNT 1
ACOUNT 1
NSCOUNT 0
ARCOUNT 1
QUESTION SECTION:
Offset = 0x000c, RR count = 0
Name "(3)myhost(4)mydomain(3)org(0)"
QTYPE A (1)
QCLASS 1
ANSWER SECTION:
Offset = 0x001e, RR count = 0
Name "[C00C](3)myhost(4)mydomain(3)org(0)"
TYPE A (1)
CLASS 1
TTL 1200
DLEN 4
DATA 10.1.1.218
AUTHORITY SECTION:
empty
ADDITIONAL SECTION:
Offset = 0x002e, RR count = 0
Name "(0)"
TYPE OPT (41)
CLASS 4000
TTL 32768
DLEN 0
DATA
Buffer Size = 4000
Rcode Ext = 0
Rcode Full = 0
Version = 0
Flags = 80 DO
10/12/2022 9:48:12 AM 0758 PACKET 000000883A640200 UDP Snd 10.1.0.16 001e R Q [8081 DR NOERROR] A (3)myhost(4)mydomain(3)org(0)
UDP response info at 000000883A640200
Socket = 592
Remote addr 10.1.0.16, port 57756
Time Query=2147027, Queued=2147027, Expire=2147032
Buf length = 0x0200 (512)
Msg length = 0x002e (46)
Message:
XID 0x001e
Flags 0x8180
QR 1 (RESPONSE)
OPCODE 0 (QUERY)
AA 0
TC 0
RD 1
RA 1
Z 0
CD 0
AD 0
RCODE 0 (NOERROR)
QCOUNT 1
ACOUNT 1
NSCOUNT 0
ARCOUNT 0
QUESTION SECTION:
Offset = 0x000c, RR count = 0
Name "(3)myhost(4)mydomain(3)org(0)"
QTYPE A (1)
QCLASS 1
ANSWER SECTION:
Offset = 0x001e, RR count = 0
Name "[C00C](3)myhost(4)mydomain(3)org(0)"
TYPE A (1)
CLASS 1
TTL 1199
DLEN 4
DATA 10.1.1.218
AUTHORITY SECTION:
empty
ADDITIONAL SECTION:
empty
LogFailed:
10/12/2022 9:39:38 AM 0748 PACKET 000000883EE821D0 UDP Rcv 10.1.0.16 3858 Q [0001 D NOERROR] A (3)myhost(4)mydomain(3)ORG(0)
UDP question info at 000000883EE821D0
Socket = 592
Remote addr 10.1.0.16, port 62365
Time Query=2146514, Queued=0, Expire=0
Buf length = 0x0fa0 (4000)
Msg length = 0x001e (30)
Message:
XID 0x3858
Flags 0x0100
QR 0 (QUESTION)
OPCODE 0 (QUERY)
AA 0
TC 0
RD 1
RA 0
Z 0
CD 0
AD 0
RCODE 0 (NOERROR)
QCOUNT 1
ACOUNT 0
NSCOUNT 0
ARCOUNT 0
QUESTION SECTION:
Offset = 0x000c, RR count = 0
Name "(3)myhost(4)mydomain(3)ORG(0)"
QTYPE A (1)
QCLASS 1
ANSWER SECTION:
empty
AUTHORITY SECTION:
empty
ADDITIONAL SECTION:
empty
10/12/2022 9:39:38 AM 0748 PACKET 000000883EE821D0 UDP Snd 10.1.0.16 3858 R Q [8081 DR NOERROR] A (3)myhost(4)mydomain(3)ORG(0)
UDP response info at 000000883EE821D0
Socket = 592
Remote addr 10.1.0.16, port 62365
Time Query=2146514, Queued=0, Expire=0
Buf length = 0x0200 (512)
Msg length = 0x0067 (103)
Message:
XID 0x3858
Flags 0x8180
QR 1 (RESPONSE)
OPCODE 0 (QUERY)
AA 0
TC 0
RD 1
RA 1
Z 0
CD 0
AD 0
RCODE 0 (NOERROR)
QCOUNT 1
ACOUNT 0
NSCOUNT 1
ARCOUNT 0
QUESTION SECTION:
Offset = 0x000c, RR count = 0
Name "(3)myhost(4)mydomain(3)ORG(0)"
QTYPE A (1)
QCLASS 1
ANSWER SECTION:
empty
AUTHORITY SECTION:
Offset = 0x001e, RR count = 0
Name "[C010](4)mydomain(3)ORG(0)"
TYPE SOA (6)
CLASS 1
TTL 466
DLEN 61
DATA
PrimaryServer: (6)pdns07(13)domaincontrol(3)com(0)
Administrator: (3)dns(5)jomax(3)net(0)
SerialNo = 2022083000
Refresh = 28800
Retry = 7200
Expire = 604800
MinimumTTL = 600
ADDITIONAL SECTION:
empty
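For anyone cross-checking the `Flags` values in the dumps above against the individual bit listings, here is a small sketch that decodes the 16-bit DNS header flags using the standard header layout. The helper names are made up. The byte-swap helper rests on an observation from these excerpts only: the hex in the bracketed summary (for example `[8085 A DR NOERROR]`) appears to be the `Flags` value (`0x8580`) with its two bytes swapped.

```python
def decode_flags(flags: int) -> dict:
    """Split the 16-bit DNS header flags word into its named fields."""
    return {
        "QR": (flags >> 15) & 0x1,      # 0 = question, 1 = response
        "OPCODE": (flags >> 11) & 0xF,  # 0 = standard query
        "AA": (flags >> 10) & 0x1,      # authoritative answer
        "TC": (flags >> 9) & 0x1,       # truncated
        "RD": (flags >> 8) & 0x1,       # recursion desired
        "RA": (flags >> 7) & 0x1,       # recursion available
        "Z": (flags >> 6) & 0x1,        # reserved
        "AD": (flags >> 5) & 0x1,       # authenticated data
        "CD": (flags >> 4) & 0x1,       # checking disabled
        "RCODE": flags & 0xF,           # 0 = NOERROR
    }


def summary_hex_to_flags(hex4: str) -> int:
    """Swap the two bytes of the bracketed summary hex (assumption, see above)."""
    value = int(hex4, 16)
    return ((value & 0xFF) << 8) | (value >> 8)


good = decode_flags(0x8580)    # reply from 10.1.1.250 in LogCorrect
failed = decode_flags(0x8180)  # reply sent to the client in LogFailed
print(good["AA"], failed["AA"])
```

The only flag that differs between the .org server's reply in LogCorrect and the reply sent to the client in LogFailed is AA; the more telling difference is in the counts (ACOUNT 1 versus ACOUNT 0 with an SOA in the authority section).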