I read this in the doc:
Every Pod gets its own IP address ... pods on a node can communicate with all pods on all nodes without NAT.
Should I read that as "every pod gets its own unique cluster-wide IP address"?
I assumed this was the case, but the reason I ask is that I noticed pods with the same IP addresses, just on different nodes, right after I initialized a new cluster following the instructions here. The cluster has 3 nodes, test-vm{4,5,6}, with test-vm4 as master, running on a local dummy network 10.1.4.0/16. I used flannel for the CNI and set it up like this:
kubectl patch node test-vm{4..6} -p '{ "spec": { "podCIDR": "10.244.0.0/16" } }' # Had to do this because I didn't set it on cluster init. See https://stackoverflow.com/a/60944959/2038383.
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
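For reference, you can check what podCIDR each node ended up with using something like this (custom-columns is just a kubectl output format; the column names are arbitrary):

kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR

After the patch above, all three nodes report 10.244.0.0/16, since that is exactly what the patch set.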
Notice that 3 IPs, 10.244.0.{2,3,4}, each appear twice, on 2 different pods:
$ kubectl get pods --all-namespaces -o wide -w
NAMESPACE    NAME                              READY  STATUS   RESTARTS  AGE    IP          NODE      NOMINATED NODE  READINESS GATES
default      curl                              1/1    Running  0         14m    10.244.0.4  test-vm6  <none>          <none>
default      my-nginx-cf54cdbf7-d6s9m          1/1    Running  0         17m    10.244.0.3  test-vm6  <none>          <none>
default      my-nginx-cf54cdbf7-twrvw          1/1    Running  0         17m    10.244.0.2  test-vm6  <none>          <none>
default      my-nginx-cf54cdbf7-xpff6          1/1    Running  0         17m    10.244.0.4  test-vm5  <none>          <none>
default      my-nginx-more-5f79688b9d-4c9jk    1/1    Running  0         3m10s  10.244.0.6  test-vm5  <none>          <none>
default      my-nginx-more-5f79688b9d-7htsn    1/1    Running  0         3m18s  10.244.0.5  test-vm5  <none>          <none>
default      my-nginx-more-5f79688b9d-gqz9b    1/1    Running  0         3m4s   10.244.0.7  test-vm5  <none>          <none>
default      nginx1                            1/1    Running  0         9s     10.244.0.8  test-vm5  <none>          <none>
kube-system  coredns-64897985d-kt82d           1/1    Running  0         41m    10.244.0.2  test-vm5  <none>          <none>
kube-system  coredns-64897985d-rd7gz           1/1    Running  0         41m    10.244.0.3  test-vm5  <none>          <none>
kube-system  etcd-test-vm4                     1/1    Running  0         41m    10.1.4.36   test-vm4  <none>          <none>
kube-system  kube-apiserver-test-vm4           1/1    Running  0         41m    10.1.4.36   test-vm4  <none>          <none>
kube-system  kube-controller-manager-test-vm4  1/1    Running  0         41m    10.1.4.36   test-vm4  <none>          <none>
kube-system  kube-flannel-ds-snkhk             1/1    Running  0         29m    10.1.4.38   test-vm6  <none>          <none>
kube-system  kube-flannel-ds-wtmqg             1/1    Running  0         29m    10.1.4.37   test-vm5  <none>          <none>
kube-system  kube-flannel-ds-x46xw             1/1    Running  0         29m    10.1.4.36   test-vm4  <none>          <none>
kube-system  kube-proxy-mjl69                  1/1    Running  0         41m    10.1.4.37   test-vm5  <none>          <none>
kube-system  kube-proxy-vz2p2                  1/1    Running  0         41m    10.1.4.36   test-vm4  <none>          <none>
kube-system  kube-proxy-xg4gg                  1/1    Running  0         41m    10.1.4.38   test-vm6  <none>          <none>
kube-system  kube-scheduler-test-vm4           1/1    Running  0         41m    10.1.4.36   test-vm4  <none>          <none>
Despite what the docs say, not all pods can communicate with each other. They can only reach pods on the same node, and it's causing errors. I'm wondering whether this is a red flag that something is wrong, and I'm looking for clarification on this one point about pod IP address uniqueness.
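To make the symptom concrete, a check along these lines fails for me, assuming the curl pod's image ships a curl binary (pod names and IPs are taken from the listing above; 10.244.0.6 is an nginx pod on test-vm5, while the curl pod runs on test-vm6):

kubectl exec curl -- curl -s --max-time 5 http://10.244.0.6   # cross-node: hangs and times out
kubectl exec curl -- curl -s --max-time 5 http://10.244.0.3   # same node (test-vm6): works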
I figured it out. Firstly, yes, pods are absolutely supposed to have a cluster-wide unique IP address. It's fundamental to the way k8s works. The linked k8s doc is crap and leaves the question a little open, but better-worded sources state it explicitly.
Now the question is: why are my pods being assigned the same IP addresses? Basically, doing this at flannel CNI init is wrong (I copied the suggestion from this SO answer):

kubectl patch node test-vm{4..6} -p '{ "spec": { "podCIDR": "10.244.0.0/16" } }'
The podCIDR has to be unique for each node. This is how k8s ensures each scheduled pod gets a unique IP address: each node hands out pod IPs from its own podCIDR range. See this great blog post explaining it.
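For contrast, when node CIDR allocation is working, each node ends up with its own distinct slice of the cluster's pod network, by default a /24 per node (the exact split depends on the controller manager's node CIDR mask size). With the node names from my cluster, kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR would show something like:

NAME       PODCIDR
test-vm4   10.244.0.0/24
test-vm5   10.244.1.0/24
test-vm6   10.244.2.0/24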
Patching podCIDR manually like that is not equivalent to setting --pod-network-cidr on kubeadm init like you are supposed to. The --pod-network-cidr command line option actually corresponds to the ClusterConfiguration networking.podSubnet setting. So if you need to set it after the fact, you remove flannel and then edit the cluster configuration (I haven't actually tested this approach; I just re-init'd with --pod-network-cidr set). Once set, the controller manager allocates each node its own podCIDR automatically.
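A sketch of what that after-the-fact edit would look like (untested, and per the update below it did not actually take effect for me): kubeadm stores the ClusterConfiguration in the kubeadm-config ConfigMap in kube-system, so the edit is roughly:

kubectl -n kube-system edit cm kubeadm-config
# then, inside the ClusterConfiguration document, add or adjust:
#   networking:
#     podSubnet: "10.244.0.0/16"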
If you are going to set each node's podCIDR manually, it must be unique for each node. You should avoid setting it manually if nodes are expected to come and go dynamically, which is the normal scenario.

UPDATE: The above method of setting the ClusterConfiguration networking.podSubnet after init does not actually work. It doesn't even work if you de-register and re-register all the worker nodes, which is annoying. AFAIK the only way to get automatic node podCIDR assignment working is to blow away your cluster and re-initialize, either with --pod-network-cidr set or with networking.podSubnet set in the initial config (see the --config option).
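For completeness, a sketch of that re-initialize route with the values from my setup (10.244.0.0/16 and flannel); the config file name and the kubeadm API version here are placeholders:

# on every node
kubeadm reset

# on the master, either the flag form...
kubeadm init --pod-network-cidr=10.244.0.0/16

# ...or the config-file form: kubeadm init --config kubeadm-config.yaml, where the file contains
#   apiVersion: kubeadm.k8s.io/v1beta3
#   kind: ClusterConfiguration
#   networking:
#     podSubnet: "10.244.0.0/16"

# then re-join the workers (kubeadm join ...) and re-apply the CNI
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml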