I can't get to goodtodo.com from any computer in my office. I don't seem to have a problem getting to any other URL. The server is a Windows 2003 Server and is the domain controller, DHCP server, and DNS server. Most clients are Win XP, but some are Win 7.
If I ping goodtodo from the client, I get "..could not find host". If I ping the IP (74.80.216.31) from the client, I get 0% loss. The IP also works in the browser. Flushing the DNS cache on the client doesn't fix it. ipconfig /all on the client shows the 2003 server as the DNS server.
On the server, I can ping goodtodo successfully. Its network settings show two DNS servers, 127.0.0.1 (itself, I assume) and 192.168.1.30 (a Windows 2000 server that used to be the domain controller, but was replaced and demoted about a month ago). My guess is that the server is using its secondary DNS server to resolve the name and that's why it works from there.
The problem started right about the time we switched servers. I think it's related because of the timing, but I can't be sure.
On the 2003 server, the DNS cache has an entry for goodtodo with two records:
(same as parent folder) Name Server(NS) ns1.wshost.net
(same as parent folder) Name Server(NS) ns2.wshost.net
If I Clear Cache on the 2003 server, the problem goes away for about 4-7 days, then comes back. Clearing the cache fixes it immediately every time.
On a client machine, nslookup goodtodo.com 8.8.8.8 resolves correctly using Google's name server. So the problem seems to be the DNS server on the 2003 box. Since clearing the cache fixes it temporarily, it seems that the cache gets polluted at times, but that's pure speculation as I don't really know what I'm talking about.
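The nslookup comparison above can also be scripted so it is easy to repeat when the problem returns. This is only a minimal sketch using the Python standard library: it sends a plain A-record query over UDP to whichever server you name and reports the response code and answer count. The function names (`build_query`, `response_summary`, `ask`) are made up for illustration, and the office DNS server's address is not given in the question, so it has to be supplied by whoever runs it.

```python
import socket
import struct
from typing import Optional, Tuple


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (type 1, class IN)."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)


def response_summary(response: bytes) -> Tuple[int, int]:
    """Return (rcode, answer_count) read from a DNS response header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return flags & 0x000F, ancount


def ask(server: str, name: str, timeout: float = 3.0) -> Optional[Tuple[int, int]]:
    """Query one DNS server over UDP; return None if it does not answer in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        try:
            data, _addr = sock.recvfrom(512)
        except socket.timeout:
            return None
    return response_summary(data)
```

Calling `ask("8.8.8.8", "goodtodo.com")` and then `ask` again with the office DNS server's address gives a side-by-side view: an rcode of 0 with at least one answer means the name resolved, an rcode of 2 (server failure) or 3 (name error) points at that server, and `None` means no reply arrived.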
There is a Sonicwall firewall between the server and the cable modem. I ruled that out as a problem since the google name server worked OK. Also, I don't see anything in the firewall settings that would affect this URL compared to any other normal HTTP request.
Am I missing something obvious or are there some other things I can do to narrow down where the problem is?
Additional Information
It came back a few hours after I posted the question - inexplicably to me. Today it went down again. I checked the cache and it was the same as I reported above.
Using google's name server still worked as before and I could ping the IP address of the site directly without error. nslookup goodtodo.com ns1.wshost.net times out on both the client and the server. ping ns1.wshost.net times out on the client, but works OK on the server.
I deleted the entry in the cache for goodtodo.com on the DNS server and everything started working again. I still can't ping ns1.wshost.net from the client. I looked at the cache after I visited the site and there were three records: the two I listed above plus an A record that references the IP address. The TTL on the name server records is 24 hours and on the A record is 14 minutes. The A record is now gone, but the name still resolves.
In any event, it seems that the DNS server is going to its cache, but is unable to resolve ns*.wshost.net. If I delete the records from the cache, I assume it's using the root hints to resolve the name similar to if I cleared the whole cache.
On the server OS, the DNS Client and DNS Server are two different services that don't share a cache or lookup settings. The DNS Server service does not use the DNS settings on the server's local NIC to forward requests. Look in the DNS MMC console, in the server properties under Forwarders and possibly Root Hints, to see how the DNS server looks up entries that are not in its cache. To see what's in its cache, make sure "Advanced" is selected under the View pull-down menu. One of a few things could be happening:
Number 2 is most likely.
My outside IT guy thinks that the TTL on ns1.wshost.net is significantly shorter than on my DNS server. I created this registry value on the machine that hosts my DNS server:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DNS\Parameters\MaxCacheTTL
and set the value to 300 seconds (5 minutes). No problems since I did that. I'm not confident that this fixes the root cause, but it worked.
According to http://support.microsoft.com/kb/968372, this is the correct setting but the wrong value. I guess that's OK since it's working :)
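To make the effect of the MaxCacheTTL change concrete, here is a minimal Python sketch of how a cache ceiling shortens record lifetimes. It assumes the DNS Server service simply caps each cached record's TTL at MaxCacheTTL; the helper name `effective_ttl` is made up for illustration, and the TTLs are the ones reported in the question.

```python
# Illustrative model only: assumes the server caps each cached record's
# lifetime at MaxCacheTTL. effective_ttl is a hypothetical helper name.

def effective_ttl(record_ttl_seconds: int, max_cache_ttl_seconds: int) -> int:
    """Return how long a record would stay cached under the ceiling."""
    return min(record_ttl_seconds, max_cache_ttl_seconds)


NS_TTL = 24 * 60 * 60   # NS records for goodtodo.com: 24 hours
A_TTL = 14 * 60         # A record: 14 minutes
MAX_CACHE_TTL = 300     # value set in the registry: 5 minutes

for label, ttl in (("NS", NS_TTL), ("A", A_TTL)):
    print(label, ttl, "->", effective_ttl(ttl, MAX_CACHE_TTL))
```

Under this model, both record types would be dropped after five minutes, so a cached name server entry that has stopped working could linger for minutes rather than a day. That is consistent with the symptom going away, though it does not explain why the cached name servers become unreachable in the first place.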