I have two Windows domain controllers.
10.10.10.10 Primary (Windows Server 2008 R2)
10.10.10.20 Replica (Windows Server 2012 R2)
The second one is configured as a replica of the first.
About once per week, the primary DC negatively caches most .io domains.
As a result, no one in the company can access sites like:
chef.io
packer.io
yahoo.io
github.io
Strangely, I can still access some .io pages, like the ones at github.io.
The solution is to RDP into the DNS server and run dnscmd /clearcache. That fixes the problem for 7 to 10 days.
Further symptoms
- Only affects the primary domain controller (the secondary and the other domain controllers can resolve these sites just fine)
- Google's DNS servers also resolve them
- Usually happens at about 11 AM on Wednesdays
I'm not very familiar with Windows, but here are the things I've tried:
- Look at the logs; I only see the following lines that look interesting
8:15AM
The DNS server wrote version 4638 of zone 254.10.in-addr.arpa to file 254.10.in-addr.arpa.dns.
8:16AM
A more recent version, version 4639 of zone 254.10.in-addr.arpa was found at the DNS server at 10.254.40.51. Zone transfer is in progress.
- Verify there are no forward or reverse lookup zones for the .io domain
- Ensure there is nothing in the hosts file blocking the .io domain
- Compare the output of ipconfig /displaydns on all domain controllers
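For anyone repeating the last comparison, here is a minimal Python sketch of one way to diff two saved dumps instead of eyeballing them. It assumes each dump was saved with something like `ipconfig /displaydns > dc1.txt` and that cached entries appear on lines of the form `Record Name . . . . . : chef.io`; the file names and the helper names `cached_names` and `compare_dumps` are hypothetical.

```python
import re

# Assumed line format in the saved dump: "Record Name . . . . . : chef.io"
RECORD_NAME = re.compile(r"^\s*Record Name[ .]*:\s*(\S+)", re.MULTILINE)


def cached_names(dump_text):
    """Return the set of record names found in one saved dump (lowercased)."""
    return {name.lower() for name in RECORD_NAME.findall(dump_text)}


def compare_dumps(dump_a, dump_b):
    """Return (names only in dump_a, names only in dump_b), each sorted."""
    a, b = cached_names(dump_a), cached_names(dump_b)
    return sorted(a - b), sorted(b - a)
```

To use it, read the two saved files into strings (for example `open("dc1.txt").read()`) and pass them to `compare_dumps`. Names that show up on only one side are the ones worth a closer look.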
Is there anything else I can investigate to find out why the DNS cache keeps getting corrupted so predictably? Is there a Windows DNS setting that can forcibly flush the cache when doing a zone transfer?
Update
I've narrowed this down to the fact that I often switch from wired to wireless right before the Wednesday meeting. The wireless network has one Windows 2008 DNS server and one Windows 2012 DNS server. When the 2008 server is selected as primary, the problem returns. The workaround is to run dnscmd /clearcache. Since the 2008 server is going away, I'm sure this problem will fix itself.
Consider updating your root.hints file. Maybe it’s pointing to some old root name servers that (for some reason) aren’t returning .io domains.
Maybe you have a routing issue that prevents your server from reaching them (i.e., you’re black-holing the IP range they run on), which prevents looking up the domains under that TLD. This one is my bet: maybe you have a firewall rule against a country or IP block. Use my results below to check your firewall, or do a dig/nslookup for the .io TLD servers (you can download a binary for Windows from http://www.isc.org/downloads/).
Can you reach all of these DNS servers directly? Your DNS server may repeatedly use the first in the list for example. Keep in mind, this list is at a point in time (right now) and changes, but should give you an initial point to see if you can reach the .io root name servers.
If you’re using forwarders, test an nslookup to those forwarders directly. If it doesn’t return, contact the person who runs them (your ISP).
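If you would rather script these direct checks than run nslookup by hand, here is a rough Python sketch using only the standard library. It sends one plain A-record query over UDP to a server you choose and reports the response code, so you can compare the primary, the replica, a forwarder, or a TLD server side by side. The function names (`build_query`, `parse_rcode`, `query`) and the fixed query ID are my own choices for illustration, and it ignores TCP fallback and truncated replies.

```python
import socket
import struct

# Common DNS response codes (RFC 1035); anything else is shown as a number.
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query: one question, recursion desired."""
    # Header: ID, flags (RD bit = 0x0100), QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def parse_rcode(response):
    """Return (response code name, answer count) from a raw DNS response."""
    if len(response) < 12:
        return "short response", 0
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    rcode = flags & 0x000F
    return RCODES.get(rcode, str(rcode)), ancount


def query(server, name, timeout=3.0):
    """Ask one server for the A record of name; never raises on network errors."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            response, _ = sock.recvfrom(512)
        except OSError as exc:  # includes timeouts and unreachable networks
            return "no response ({})".format(exc), 0
    return parse_rcode(response)
```

For example, calling `query("10.10.10.10", "chef.io")` and `query("10.10.10.20", "chef.io")` while the problem is happening should show whether the primary answers NXDOMAIN or SERVFAIL while the replica answers NOERROR. A "no response" result against a forwarder or TLD server would point toward the routing or firewall theory.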
Update: Given your update, where you note that it happens when you change ISPs, I would guess that one of your connections uses IPv6 and the other is only IPv4 capable. It could be that the server is caching an IPv6 address that isn’t reachable once you switch connections.