Following an upgrade to v1.19.7 with kubeadm, my pods are unable to reach the kube-dns service via the service's ClusterIP. When using a kube-dns pod IP address directly instead, DNS resolution works.
The kube-dns pods are up and running:
$ kubectl get pods -n kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE
coredns-7674cdb774-2m58h   1/1     Running   0          33m
coredns-7674cdb774-x44b9   1/1     Running   0          33m
Their logs are clean:
$ kubectl logs coredns-7674cdb774-2m58h -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 7442f38ca24670d4af368d447670ad91
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[INFO] 127.0.0.1:40705 - 31415 "HINFO IN 7224361654609676299.2243479664305694168. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.003954173s
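Both pods' logs can be tailed at once through the label selector (a convenience form of the command above; --tail keeps the output short):
$ kubectl logs -n kube-system -l k8s-app=kube-dns --tail=20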
The kube-dns service is exposed:
$ kubectl get svc -n kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   301d
Endpoints are also configured:
$ kubectl describe endpoints kube-dns --namespace=kube-system
Name:         kube-dns
Namespace:    kube-system
Labels:       k8s-app=kube-dns
              kubernetes.io/cluster-service=true
              kubernetes.io/name=KubeDNS
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2021-01-19T14:23:13Z
Subsets:
  Addresses:          10.44.0.1,10.47.0.2
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    dns-tcp  53    TCP
    dns      53    UDP
    metrics  9153  TCP

Events:  <none>
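These endpoint addresses can be cross-checked against the CoreDNS pod IPs, and they line up (one of them, 10.44.0.1, is confirmed further down as the IP of coredns-7674cdb774-x44b9):
$ kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide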
Here is my coredns ConfigMap:
$ kubectl describe cm -n kube-system coredns
Name:         coredns
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Data
====
Corefile:
----
.:53 {
    log
    errors
    ready
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
       ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}

Events:  <none>
On the worker nodes, kube-proxy is running:
$ kubectl get pods -n kube-system -o wide --field-selector spec.nodeName=ccqserv202
NAME               READY   STATUS    RESTARTS   AGE    IP              NODE         NOMINATED NODE   READINESS GATES
kube-proxy-8r65s   1/1     Running   0          78m    10.158.37.202   ccqserv202   <none>           <none>
weave-net-kvnzg    2/2     Running   0          6h3m   10.158.37.202   ccqserv202   <none>           <none>
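kube-proxy's logs on that node can be inspected directly (the graceful-delete messages quoted further down come from exactly this):
$ kubectl logs -n kube-system kube-proxy-8r65s --tail=100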
Networking between pods is working, as I am able to communicate between pods running on separate nodes (here, dnsutils runs on node ccqserv202, while 10.44.0.1 is the pod IP address of coredns-7674cdb774-x44b9, running on node ccqserv223).
$ kubectl exec -i -t dnsutils -- ping 10.44.0.1
PING 10.44.0.1 (10.44.0.1): 56 data bytes
64 bytes from 10.44.0.1: seq=0 ttl=64 time=2.101 ms
64 bytes from 10.44.0.1: seq=1 ttl=64 time=1.184 ms
64 bytes from 10.44.0.1: seq=2 ttl=64 time=1.107 ms
^C
--- 10.44.0.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 1.107/1.464/2.101 ms
I am using "ipvs" as kube-proxy mode (although I can confirm the exact same behavior happens when using "iptables" or "userspace" modes).
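For reference, the mode is set in kubeadm's kube-proxy ConfigMap, so it can be double-checked with something like:
$ kubectl -n kube-system get cm kube-proxy -o yaml | grep 'mode:'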
Here is the output of ipvsadm -Ln on node ccqserv202:
$ ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.46.128.0:30040 rr
TCP 10.96.0.1:443 rr
-> 10.158.37.223:6443 Masq 1 0 0
-> 10.158.37.224:6443 Masq 1 0 0
-> 10.158.37.225:6443 Masq 1 1 0
TCP 10.96.0.10:53 rr
TCP 10.96.0.10:9153 rr
TCP 10.97.147.126:2746 rr
TCP 10.100.162.140:9000 rr
TCP 10.101.126.110:5432 rr
TCP 10.109.184.125:4040 rr
TCP 10.110.163.112:9090 rr
TCP 10.110.215.252:8443 rr
TCP 10.158.37.202:30040 rr
TCP 127.0.0.1:30040 rr
TCP 134.158.237.2:30040 rr
UDP 10.96.0.10:53 rr
As you can see, there are no real servers configured under the 10.96.0.10 virtual servers, but there are under 10.96.0.1 (which corresponds to the kubernetes API service).
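Given that 10.96.0.1:443 does have real servers (and is reachable, as shown next), IPVS itself appears functional; still, the required kernel modules can be sanity-checked on the node (the grep pattern below is just a convenience, not exhaustive):
$ lsmod | grep -e '^ip_vs' -e '^nf_conntrack'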
I am able to open a connection to 10.96.0.1 on port 443:
$ kubectl exec -i -t dnsutils -- nc -vz 10.96.0.1 443
10.96.0.1 (10.96.0.1:443) open
I am able to open a connection to 10.44.0.1 on port 53:
$ kubectl exec -i -t dnsutils -- nc -vz 10.44.0.1 53
10.44.0.1 (10.44.0.1:53) open
It even resolves!
$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default 10.44.0.1
Server: 10.44.0.1
Address: 10.44.0.1#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
But this does not work when I use the kube-dns ClusterIP 10.96.0.10:
$ kubectl exec -i -t dnsutils -- nc -vz 10.96.0.10 53
command terminated with exit code 1
$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default 10.96.0.10
;; connection timed out; no servers could be reached
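Note that nc -vz exercises the TCP path while nslookup uses UDP, so both protocols fail against the ClusterIP. If needed, dig can force either transport explicitly (dnsutils ships dig; +tcp and +time are standard flags):
$ kubectl exec -i -t dnsutils -- dig +time=2 kubernetes.default.svc.cluster.local @10.96.0.10
$ kubectl exec -i -t dnsutils -- dig +tcp +time=2 kubernetes.default.svc.cluster.local @10.96.0.10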
Here is the dnsutils pod's resolv.conf file:
$ kubectl exec -i -t dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local xxxxx.fr
nameserver 10.96.0.10
options ndots:5
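For symmetry, the second CoreDNS endpoint can be queried directly in the same way as the working 10.44.0.1 test above:
$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default 10.47.0.2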
Finally, when I try manually adding a real server to IPVS on the node,
$ ipvsadm -a -u 10.96.0.10:53 -r 10.44.0.1:53 -m
kube-proxy detects it and immediately removes it (expected, since kube-proxy continuously reconciles the IPVS table with its own view of Services and Endpoints):
I0119 16:17:27.062890 1 proxier.go:2076] Using graceful delete to delete: 10.96.0.10:53/UDP/10.44.0.1:53
I0119 16:17:27.062906 1 graceful_termination.go:159] Trying to delete rs: 10.96.0.10:53/UDP/10.44.0.1:53
I0119 16:17:27.062974 1 graceful_termination.go:173] Deleting rs: 10.96.0.10:53/UDP/10.44.0.1:53
Also, we can see with tcpdump that DNS requests from dnsutils to 10.96.0.10 are NOT rewritten to 10.44.0.1 or 10.47.0.2, as they should be with IPVS:
16:27:56.950950 IP (tos 0x0, ttl 64, id 45349, offset 0, flags [none], proto UDP (17), length 90)
    10.46.128.8.53140 > 10.96.0.10.domain: [bad udp cksum 0x94f7 -> 0x12c4!] 4628+ A? kubernetes.default.default.svc.cluster.local. (62)
16:27:56.951321 IP (tos 0x0, ttl 64, id 59811, offset 0, flags [DF], proto UDP (17), length 70)
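For reference, a capture like the one above can be taken on the node along these lines (capturing on all interfaces avoids having to guess the right Weave or bridge device):
$ tcpdump -i any -n -vv udp port 53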
A tcpdump on the kube-dns pods' side shows that these requests never arrive.
I've spent a full day trying to understand what is happening and how to fix it, and I am running out of ideas. Any help would be very much welcome.
Unfortunately, the steps at https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/ did not help.
Thank you!
tl;dr: DNS resolution in the Kubernetes cluster does not work when using the kube-dns service ClusterIP, although I am able to resolve names when using the kube-dns pod IP addresses directly. I suspect something is wrong with my kube-proxy configuration, but I can't find what.