I have a problem where the DNS entry for an external domain broke. The nature of the breakage at the time is unknown.
That domain was queried from a pod in a Google Kubernetes Engine (GKE) cluster while the entry was broken. The problem still persists when querying that domain from the cluster, even though the incident happened over 2 months ago.
The cluster DNS resolver uses metadata.google.internal for DNS resolution, and from inside the cluster, dig queries behave as follows:
dig problematic.external.domain @169.254.169.254
# does not resolve and takes over 2 seconds
dig problematic.external.domain @1.1.1.1
# resolves correctly under 200ms
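To make the comparison above repeatable, here is a minimal sketch that runs the same query against both resolvers and prints the response status and query time that dig reports. The helper names (dig_status, dig_query_time, check_resolver) are hypothetical, and the sketch assumes dig's usual header line ("status: ...") and ";; Query time: N msec" line; on a timeout both fields come back empty.

```shell
#!/bin/sh
# Hypothetical helpers for comparing resolvers; not part of any GKE tooling.

# Read dig output on stdin and print the response status
# (NOERROR, SERVFAIL, NXDOMAIN, ...) taken from the header line.
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
}

# Read dig output on stdin and print the reported query time in milliseconds.
dig_query_time() {
  sed -n 's/^;; Query time: \([0-9]*\) msec.*/\1/p'
}

# Query one name against one resolver and print a one-line summary.
# +tries=1 and +time=5 make dig give up after a single 5-second attempt.
check_resolver() {
  name=$1
  server=$2
  out=$(dig +tries=1 +time=5 "$name" "@$server")
  printf '%s status=%s time_ms=%s\n' "$server" \
    "$(printf '%s\n' "$out" | dig_status)" \
    "$(printf '%s\n' "$out" | dig_query_time)"
}
```

Run from a pod or node in the cluster, it could be used like this:

check_resolver problematic.external.domain 169.254.169.254
check_resolver problematic.external.domain 1.1.1.1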
A newly created VM in the same project and zone resolves the problematic domain correctly. This affects only the metadata server DNS resolver as seen from the active cluster.
Is there a way to flush the DNS caches, or are there any other suggestions?
In general, I'm trying to avoid editing in-cluster DNS settings and would prefer some other means of fixing it.
Edit, more info:
NodeLocal DNSCache is already active in the cluster, and going by its documentation (https://cloud.google.com/kubernetes-engine/docs/how-to/nodelocal-dns-cache), the problem is the metadata DNS server.
This is the relevant excerpt from the benefits list in that documentation:
DNS queries for external URLs (URLs that don't refer to cluster resources) are forwarded directly to the local Cloud DNS metadata server, bypassing kube-dns.
That metadata server is the IP 169.254.169.254.