I've been having a problem that I can't explain with my limited understanding of "how things work". I connect to various servers for work (I'll use svn.mycompany.com as the example). We generally use openvpn to create a secure connection to those servers. The company has OpenDNS set up to provide ordinary resolution for the servers, and the resulting addresses of course only work over an established VPN tunnel.
After some supposedly innocuous reconfiguration of some of the servers (not done by me, and apparently working for everybody else), I noticed the following pattern. With a VPN tunnel up, I try an svn command:
svn update
I immediately get an error back saying that the host svn.mycompany.com can't be resolved. If I then do a host lookup:
host -a svn.mycompany.com
it responds with the correct IP address. If I then retry the svn command, it works, and svn keeps working for a while. After some period of time that I haven't measured, however, it stops working again and the cycle repeats.
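The cycle described above can be captured in a small script, which makes it easier to repeat. This is only a sketch under assumptions: check_cycle is a hypothetical helper name, and getent ahosts stands in for whatever lookup path svn actually uses (it goes through the system resolver, which is my guess at what svn does too).

```shell
# Hypothetical helper: check the system resolver, run "host -a", check again.
check_cycle() {
    name=$1

    # Lookup through the system resolver (assumed to match what svn does).
    if getent ahosts "$name" >/dev/null 2>&1; then
        echo "before: resolves"
    else
        echo "before: fails"
    fi

    # The lookup that seems to "fix" things; its output is not needed here.
    host -a "$name" >/dev/null 2>&1

    # Same system-resolver lookup again, to see whether anything changed.
    if getent ahosts "$name" >/dev/null 2>&1; then
        echo "after: resolves"
    else
        echo "after: fails"
    fi
}
```

Running check_cycle svn.mycompany.com once the svn error has come back should print "before: fails" followed by "after: resolves" if the pattern holds.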
The same pattern holds for other servers on the other side of the tunnel. I've seen this happen from different networks (e.g., at my house or out at a coffee shop).
I'm not looking for an overall solution. My real question is: how can simply running host -a at least temporarily "fix" a domain that isn't resolving? Does host do something special to bypass a local cache? (If so, I'm still confused, because the addresses of the servers don't change, or change only rarely.)
Edit: OK, more information. By turning up logging for systemd-resolved, I was able to use journalctl to track what my local machine is doing with DNS lookups. What I saw seems interesting, but I still don't know enough to understand what it means. DNS lookup requests of query type ANY seem to overflow the UDP packet size, so systemd-resolved falls back to making a TCP query. For a normal non-ANY lookup on these foo.mycompany.com names, I don't get the packet overflow, but systemd-resolved goes on to make a NODATA local cache entry.
When the ANY queries force the TCP fallback, systemd-resolved gets a useful result and makes a positive cache entry.
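For anyone who wants to watch the same thing, here is a rough sketch of how the tracing can be done. It assumes a systemd version recent enough to have resolvectl log-level; trace_lookup is a hypothetical helper name, and the commands may need root depending on the system's policy.

```shell
# One-time setup, as root (assumes resolvectl supports "log-level"):
#   resolvectl log-level debug

# Hypothetical helper: flush the cache, do one A lookup, show the new log lines.
trace_lookup() {
    name=$1
    # Remember when we started so only fresh journal entries are shown.
    start=$(date '+%Y-%m-%d %H:%M:%S')

    # Drop cached entries (including any NODATA entry) before looking up.
    resolvectl flush-caches

    # Ordinary non-ANY lookup through systemd-resolved.
    resolvectl query --type=A "$name" || echo "lookup failed"

    # Show what systemd-resolved logged since the start of this run.
    journalctl -u systemd-resolved --since "$start" --no-pager
}
```

Running trace_lookup svn.mycompany.com with the tunnel up should show the cache activity for that single lookup.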
To me this means that something weird is going on with the UDP responses, but I don't know what that implies about the root cause.
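One way to test that suspicion directly is to bypass systemd-resolved and ask the upstream resolver the same question over each transport. The sketch below assumes dig is installed; compare_transports is a hypothetical helper name, and the resolver address passed as the second argument is whatever DNS server is in use on the VPN.

```shell
# Hypothetical helper: query one resolver for the same A record over
# UDP only and over TCP only, then report whether the answers match.
compare_transports() {
    name=$1
    server=$2   # resolver IP to query directly (an assumption; use your own)

    # +notcp +ignore: stay on UDP and do not retry over TCP on truncation.
    udp=$(dig +notcp +ignore +short "@$server" "$name" A)

    # +tcp: use TCP from the start.
    tcp=$(dig +tcp +short "@$server" "$name" A)

    echo "udp: ${udp:-no answer}"
    echo "tcp: ${tcp:-no answer}"

    if [ "$udp" = "$tcp" ]; then
        echo "result: same"
    else
        echo "result: different"
    fi
}
```

If the UDP line shows no answer while the TCP line shows the expected address, that would match what the systemd-resolved logs suggest.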