I'm quite new to Kubernetes, even though it doesn't feel like it after the dozens of hours I have spent trying to set up a working cluster.
The key parameters:
- 1 master and 3 nodes
- set up using kubeadm
- Kubernetes 1.12.1, Calico 3.2
- the primary IP addresses of the hosts are in the 192.168.1.x range (relevant because this collides with Calico's default pod subnet; because of this I set --pod-network-cidr=10.10.0.0/16)
Installation using kubeadm init and joining the nodes worked so far. All pods are running; only coredns keeps crashing, but that is not relevant here.
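For reference, the init call was essentially the following (a sketch: only --pod-network-cidr is confirmed above, everything else was left at its defaults):
sudo kubeadm init --pod-network-cidr=10.10.0.0/16
# on each worker: the kubeadm join command printed by kubeadm init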
Installation of Calico
Then I installed Calico, trying both the etcd datastore variant and the Kubernetes API datastore ("50 nodes or less") variant.
Etcd datastore:
kubectl apply -f https://docs.projectcalico.org/v3.2/getting-started/kubernetes/installation/rbac.yaml
curl https://docs.projectcalico.org/v3.2/getting-started/kubernetes/installation/hosted/calico.yaml -O
# modify calico.yaml  # Here I feel the documentation is lacking: which etcd is needed - the one of Kubernetes or a new one? See below.
kubectl apply -f calico.yaml
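For illustration, the part of calico.yaml I mean is the etcd_endpoints entry in the calico-config ConfigMap; roughly (placeholder value, the exact content of the shipped manifest may differ):
kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # which etcd should go here - the Kubernetes etcd or a dedicated one?
  etcd_endpoints: "http://<etcd-ip>:<port>"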
Kubernetes API datastore (50 nodes or less):
kubectl apply -f https://docs.projectcalico.org/v3.2/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
curl https://docs.projectcalico.org/v3.2/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml -O
# modify calico.yaml (here I have to change CALICO_IPV4POOL_CIDR to match the pod network CIDR)
sed -i 's/192.168.0.0/10.10.0.0/' calico.yaml
kubectl apply -f calico.yaml
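To double-check the two steps above, I verify the CIDR replacement and then wait for the calico-node pods, e.g.:
grep -n -A1 CALICO_IPV4POOL_CIDR calico.yaml          # should now show 10.10.0.0/16
kubectl -n kube-system get pods -l k8s-app=calico-node -o wide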
Test
Now, I use the following definition for testing:
apiVersion: v1
kind: Pod
metadata:
  name: www1
  labels:
    service: testwww
spec:
  containers:
  - name: meinserver
    image: erkules/nginxhostname
    ports:
    - containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: www2
  labels:
    service: testwww
spec:
  containers:
  - name: meinserver
    image: erkules/nginxhostname
---
kind: Service
apiVersion: v1
metadata:
  name: www-np
spec:
  type: NodePort
  selector:
    service: testwww
  ports:
  - name: http1
    protocol: TCP
    nodePort: 30333
    port: 8080
    targetPort: 80
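I apply this and check that the Service actually picks up both pods (file name assumed here):
kubectl apply -f test-www.yaml
kubectl get pods -o wide -l service=testwww
kubectl get endpoints www-np      # should list both pod IPs on port 80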
How I test:
curl http://192.168.1.211:30333 # master, no success
curl http://192.168.1.212:30333 # node, no success
curl http://192.168.1.213:30333 # node, only works 50%, with www1 (which is on this node)
curl http://192.168.1.214:30333 # node, only works 50%, with www2 (which is on this node)
The above commands only succeed when the (randomly chosen) pod happens to run on the node whose IP address I query. I expected a 100% success rate on all nodes.
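To narrow this down, I look at the Service endpoints and at the NodePort rules that kube-proxy programs on a failing node (a sketch of what I check):
kubectl get endpoints www-np                                          # both pod IPs should be listed
sudo iptables-save | grep 30333                                       # NodePort rules created by kube-proxy
kubectl get svc www-np -o jsonpath='{.spec.externalTrafficPolicy}'    # default is Cluster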
I had more success when using the etcd server of Kubernetes itself (pod/etcd-master1): in that case all of the above commands worked. But then pod/calico-kube-controllers did not start, because it was scheduled on a worker node and therefore had no access to etcd.
In the getting started guide, I found an instruction to install an extra etcd:
kubectl apply -f https://docs.projectcalico.org/v3.2/getting-started/kubernetes/installation/hosted/etcd.yaml
It's weird: this line appears only in the "getting started" guide, not under "installation". Yet the default calico.yaml already contains the correct clusterIP of exactly this etcd server (by the way, how is this IP static? Is it generated from a hash?). Anyway: with this extra etcd, all Calico pods came up without errors, but I still saw the behaviour described above where not all NodePorts work. I also dislike that this etcd is open to everyone, which is not what I want.
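Regarding the static IP, I assume the manifest simply pins it: a Service may set spec.clusterIP to a fixed address inside the service CIDR, roughly like this (illustrative snippet based on the values shown in the output below, not copied from etcd.yaml):
apiVersion: v1
kind: Service
metadata:
  name: calico-etcd
  namespace: kube-system
spec:
  clusterIP: 10.96.232.136   # fixed address inside the default 10.96.0.0/12 service range
  selector:
    k8s-app: calico-etcd
  ports:
  - port: 6666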
So, there are the main questions:
- What's the correct etcd server to use? A separate one or the one of Kubernetes?
- If it should be the one of Kubernetes, why isn't pod/calico-kube-controllers configured by default to run on the master where it has access to etcd?
- If I should run a separate etcd for Calico, why isn't that documented under "installation", and why do I have these NodePort problems?
Btw: I saw the answers that recommend changing the iptables default policy from DROP to ACCEPT. But that is an ugly hack and probably bypasses all of Calico's security features.
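For completeness, this is how I check the current default policy (newer Docker versions are known to set the FORWARD chain policy to DROP):
sudo iptables -S FORWARD | head -n1    # shows -P FORWARD DROP or -P FORWARD ACCEPT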
Requested details (Variant with extra etcd)
$ kubectl get all --all-namespaces=true -o wide; kubectl get nodes -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
default pod/www1 1/1 Running 0 8s 192.168.104.9 node2 <none>
default pod/www2 1/1 Running 0 8s 192.168.166.136 node1 <none>
kube-system pod/calico-etcd-46g2q 1/1 Running 0 22m 192.168.1.211 master1 <none>
kube-system pod/calico-kube-controllers-f4dcbf48b-88795 1/1 Running 10 23h 192.168.1.212 node0 <none>
kube-system pod/calico-node-956lj 2/2 Running 6 21h 192.168.1.213 node1 <none>
kube-system pod/calico-node-mhtvg 2/2 Running 5 21h 192.168.1.211 master1 <none>
kube-system pod/calico-node-s9njn 2/2 Running 6 21h 192.168.1.214 node2 <none>
kube-system pod/calico-node-wjqlk 2/2 Running 6 21h 192.168.1.212 node0 <none>
kube-system pod/coredns-576cbf47c7-4tcx6 0/1 CrashLoopBackOff 15 24h 192.168.137.86 master1 <none>
kube-system pod/coredns-576cbf47c7-hjpgv 0/1 CrashLoopBackOff 15 24h 192.168.137.85 master1 <none>
kube-system pod/etcd-master1 1/1 Running 17 24h 192.168.1.211 master1 <none>
kube-system pod/kube-apiserver-master1 1/1 Running 2 24h 192.168.1.211 master1 <none>
kube-system pod/kube-controller-manager-master1 1/1 Running 3 24h 192.168.1.211 master1 <none>
kube-system pod/kube-proxy-22mb9 1/1 Running 2 23h 192.168.1.212 node0 <none>
kube-system pod/kube-proxy-96tn7 1/1 Running 2 23h 192.168.1.213 node1 <none>
kube-system pod/kube-proxy-vb4pq 1/1 Running 2 24h 192.168.1.211 master1 <none>
kube-system pod/kube-proxy-vq7qj 1/1 Running 2 23h 192.168.1.214 node2 <none>
kube-system pod/kube-scheduler-master1 1/1 Running 2 24h 192.168.1.211 master1 <none>
kube-system pod/kubernetes-dashboard-77fd78f978-h8czs 1/1 Running 2 23h 192.168.180.9 node0 <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 24h <none>
default service/www-np NodePort 10.99.149.53 <none> 8080:30333/TCP 8s service=testwww
kube-system service/calico-etcd ClusterIP 10.96.232.136 <none> 6666/TCP 21h k8s-app=calico-etcd
kube-system service/calico-typha ClusterIP 10.105.199.162 <none> 5473/TCP 23h k8s-app=calico-typha
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 24h k8s-app=kube-dns
kube-system service/kubernetes-dashboard ClusterIP 10.96.235.235 <none> 443/TCP 23h k8s-app=kubernetes-dashboard
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-system daemonset.apps/calico-etcd 1 1 1 1 1 node-role.kubernetes.io/master= 21h calico-etcd quay.io/coreos/etcd:v3.3.9 k8s-app=calico-etcd
kube-system daemonset.apps/calico-node 4 4 4 4 4 beta.kubernetes.io/os=linux 23h calico-node,install-cni quay.io/calico/node:v3.2.3,quay.io/calico/cni:v3.2.3 k8s-app=calico-node
kube-system daemonset.apps/kube-proxy 4 4 4 4 4 <none> 24h kube-proxy k8s.gcr.io/kube-proxy:v1.12.1 k8s-app=kube-proxy
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
kube-system deployment.apps/calico-kube-controllers 1 1 1 1 23h calico-kube-controllers quay.io/calico/kube-controllers:v3.2.3 k8s-app=calico-kube-controllers
kube-system deployment.apps/calico-typha 0 0 0 0 23h calico-typha quay.io/calico/typha:v3.2.3 k8s-app=calico-typha
kube-system deployment.apps/coredns 2 2 2 0 24h coredns k8s.gcr.io/coredns:1.2.2 k8s-app=kube-dns
kube-system deployment.apps/kubernetes-dashboard 1 1 1 1 23h kubernetes-dashboard k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0 k8s-app=kubernetes-dashboard
NAMESPACE NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
kube-system replicaset.apps/calico-kube-controllers-f4dcbf48b 1 1 1 23h calico-kube-controllers quay.io/calico/kube-controllers:v3.2.3 k8s-app=calico-kube-controllers,pod-template-hash=f4dcbf48b
kube-system replicaset.apps/calico-typha-5f646c475c 0 0 0 23h calico-typha quay.io/calico/typha:v3.2.3 k8s-app=calico-typha,pod-template-hash=5f646c475c
kube-system replicaset.apps/coredns-576cbf47c7 2 2 0 24h coredns k8s.gcr.io/coredns:1.2.2 k8s-app=kube-dns,pod-template-hash=576cbf47c7
kube-system replicaset.apps/kubernetes-dashboard-77fd78f978 1 1 1 23h kubernetes-dashboard k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0 k8s-app=kubernetes-dashboard,pod-template-hash=77fd78f978
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master1 Ready master 24h v1.12.0 192.168.1.211 <none> Ubuntu 18.04 LTS 4.15.0-20-generic docker://17.12.1-ce
node0 Ready <none> 23h v1.12.0 192.168.1.212 <none> Ubuntu 18.04 LTS 4.15.0-20-generic docker://17.12.1-ce
node1 Ready <none> 23h v1.12.0 192.168.1.213 <none> Ubuntu 18.04 LTS 4.15.0-20-generic docker://17.12.1-ce
node2 Ready <none> 23h v1.12.0 192.168.1.214 <none> Ubuntu 18.04 LTS 4.15.0-20-generic docker://17.12.1-ce
$ for i in $(seq 20); do timeout 1 curl -so/dev/null http://192.168.1.214:30333 && echo -n x || echo -n - ;done
x---x-x-x--x-xx-x---
Requested details (Variant with existing etcd)
$ kubectl get all --all-namespaces=true -o wide; kubectl get nodes -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
default pod/www1 1/1 Running 0 9m27s 10.10.2.3 node1 <none>
default pod/www2 1/1 Running 0 9m27s 10.10.3.3 node2 <none>
kube-system pod/calico-kube-controllers-f4dcbf48b-qrqnc 0/1 CreateContainerConfigError 1 18m 192.168.1.212 node0 <none>
kube-system pod/calico-node-j8cwr 2/2 Running 2 17m 192.168.1.212 node0 <none>
kube-system pod/calico-node-qtq9m 2/2 Running 2 17m 192.168.1.214 node2 <none>
kube-system pod/calico-node-qvf6w 2/2 Running 2 17m 192.168.1.211 master1 <none>
kube-system pod/calico-node-rdt7k 2/2 Running 2 17m 192.168.1.213 node1 <none>
kube-system pod/coredns-576cbf47c7-6l9wz 1/1 Running 2 21m 10.10.0.11 master1 <none>
kube-system pod/coredns-576cbf47c7-86pxp 1/1 Running 2 21m 10.10.0.10 master1 <none>
kube-system pod/etcd-master1 1/1 Running 19 20m 192.168.1.211 master1 <none>
kube-system pod/kube-apiserver-master1 1/1 Running 2 20m 192.168.1.211 master1 <none>
kube-system pod/kube-controller-manager-master1 1/1 Running 1 20m 192.168.1.211 master1 <none>
kube-system pod/kube-proxy-28qct 1/1 Running 1 20m 192.168.1.212 node0 <none>
kube-system pod/kube-proxy-8ltpd 1/1 Running 1 21m 192.168.1.211 master1 <none>
kube-system pod/kube-proxy-g9wmn 1/1 Running 1 20m 192.168.1.213 node1 <none>
kube-system pod/kube-proxy-qlsxc 1/1 Running 1 20m 192.168.1.214 node2 <none>
kube-system pod/kube-scheduler-master1 1/1 Running 5 19m 192.168.1.211 master1 <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 21m <none>
default service/www-np NodePort 10.106.27.58 <none> 8080:30333/TCP 9m27s service=testwww
kube-system service/calico-typha ClusterIP 10.99.14.62 <none> 5473/TCP 17m k8s-app=calico-typha
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 21m k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-system daemonset.apps/calico-node 4 4 4 4 4 beta.kubernetes.io/os=linux 18m calico-node,install-cni quay.io/calico/node:v3.2.3,quay.io/calico/cni:v3.2.3 k8s-app=calico-node
kube-system daemonset.apps/kube-proxy 4 4 4 4 4 <none> 21m kube-proxy k8s.gcr.io/kube-proxy:v1.12.1 k8s-app=kube-proxy
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
kube-system deployment.apps/calico-kube-controllers 1 1 1 0 18m calico-kube-controllers quay.io/calico/kube-controllers:v3.2.3 k8s-app=calico-kube-controllers
kube-system deployment.apps/calico-typha 0 0 0 0 17m calico-typha quay.io/calico/typha:v3.2.3 k8s-app=calico-typha
kube-system deployment.apps/coredns 2 2 2 2 21m coredns k8s.gcr.io/coredns:1.2.2 k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
kube-system replicaset.apps/calico-kube-controllers-f4dcbf48b 1 1 0 18m calico-kube-controllers quay.io/calico/kube-controllers:v3.2.3 k8s-app=calico-kube-controllers,pod-template-hash=f4dcbf48b
kube-system replicaset.apps/calico-typha-5f646c475c 0 0 0 17m calico-typha quay.io/calico/typha:v3.2.3 k8s-app=calico-typha,pod-template-hash=5f646c475c
kube-system replicaset.apps/coredns-576cbf47c7 2 2 2 21m coredns k8s.gcr.io/coredns:1.2.2 k8s-app=kube-dns,pod-template-hash=576cbf47c7
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master1 Ready master 21m v1.12.0 192.168.1.211 <none> Ubuntu 18.04 LTS 4.15.0-20-generic docker://17.12.1-ce
node0 Ready <none> 20m v1.12.0 192.168.1.212 <none> Ubuntu 18.04 LTS 4.15.0-20-generic docker://17.12.1-ce
node1 Ready <none> 20m v1.12.0 192.168.1.213 <none> Ubuntu 18.04 LTS 4.15.0-20-generic docker://17.12.1-ce
node2 Ready <none> 20m v1.12.0 192.168.1.214 <none> Ubuntu 18.04 LTS 4.15.0-20-generic docker://17.12.1-ce
$ for i in $(seq 20); do timeout 1 curl -so/dev/null http://192.168.1.214:30333 && echo -n x || echo -n - ;done
xxxxxxxxxxxxxxxxxxxx
Update: Variant with flannel
I just tried flannel: the result is, surprisingly, the same as with the extra etcd (pods only answer if they run on the queried node). This brings me to the question: is there anything wrong with my OS? Ubuntu 18.04 with the latest updates, installed using debootstrap. No firewall...
How I installed it:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
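To confirm flannel itself is healthy, I check its DaemonSet pods and logs, e.g.:
kubectl -n kube-system get pods -l app=flannel -o wide
kubectl -n kube-system logs ds/kube-flannel-ds-amd64 --tail=20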
Result:
$ kubectl get all --all-namespaces=true -o wide; kubectl get nodes -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
default pod/www1 1/1 Running 0 3m40s 10.10.2.2 node1 <none>
default pod/www2 1/1 Running 0 3m40s 10.10.3.2 node2 <none>
kube-system pod/coredns-576cbf47c7-64wxp 1/1 Running 3 21m 10.10.1.3 node0 <none>
kube-system pod/coredns-576cbf47c7-7zvqs 1/1 Running 3 21m 10.10.1.2 node0 <none>
kube-system pod/etcd-master1 1/1 Running 0 21m 192.168.1.211 master1 <none>
kube-system pod/kube-apiserver-master1 1/1 Running 0 20m 192.168.1.211 master1 <none>
kube-system pod/kube-controller-manager-master1 1/1 Running 0 21m 192.168.1.211 master1 <none>
kube-system pod/kube-flannel-ds-amd64-brnmq 1/1 Running 0 8m22s 192.168.1.214 node2 <none>
kube-system pod/kube-flannel-ds-amd64-c6v67 1/1 Running 0 8m22s 192.168.1.213 node1 <none>
kube-system pod/kube-flannel-ds-amd64-gchmv 1/1 Running 0 8m22s 192.168.1.211 master1 <none>
kube-system pod/kube-flannel-ds-amd64-l9mpl 1/1 Running 0 8m22s 192.168.1.212 node0 <none>
kube-system pod/kube-proxy-5pmtc 1/1 Running 0 21m 192.168.1.213 node1 <none>
kube-system pod/kube-proxy-7ctp5 1/1 Running 0 21m 192.168.1.212 node0 <none>
kube-system pod/kube-proxy-9zfhl 1/1 Running 0 21m 192.168.1.214 node2 <none>
kube-system pod/kube-proxy-hcs4g 1/1 Running 0 21m 192.168.1.211 master1 <none>
kube-system pod/kube-scheduler-master1 1/1 Running 0 20m 192.168.1.211 master1 <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 22m <none>
default service/www-np NodePort 10.101.213.118 <none> 8080:30333/TCP 3m40s service=testwww
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 22m k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-system daemonset.apps/kube-flannel-ds-amd64 4 4 4 4 4 beta.kubernetes.io/arch=amd64 8m22s kube-flannel quay.io/coreos/flannel:v0.10.0-amd64 app=flannel,tier=node
kube-system daemonset.apps/kube-flannel-ds-arm 0 0 0 0 0 beta.kubernetes.io/arch=arm 8m22s kube-flannel quay.io/coreos/flannel:v0.10.0-arm app=flannel,tier=node
kube-system daemonset.apps/kube-flannel-ds-arm64 0 0 0 0 0 beta.kubernetes.io/arch=arm64 8m22s kube-flannel quay.io/coreos/flannel:v0.10.0-arm64 app=flannel,tier=node
kube-system daemonset.apps/kube-flannel-ds-ppc64le 0 0 0 0 0 beta.kubernetes.io/arch=ppc64le 8m21s kube-flannel quay.io/coreos/flannel:v0.10.0-ppc64le app=flannel,tier=node
kube-system daemonset.apps/kube-flannel-ds-s390x 0 0 0 0 0 beta.kubernetes.io/arch=s390x 8m21s kube-flannel quay.io/coreos/flannel:v0.10.0-s390x app=flannel,tier=node
kube-system daemonset.apps/kube-proxy 4 4 4 4 4 <none> 22m kube-proxy k8s.gcr.io/kube-proxy:v1.12.1 k8s-app=kube-proxy
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
kube-system deployment.apps/coredns 2 2 2 2 22m coredns k8s.gcr.io/coredns:1.2.2 k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
kube-system replicaset.apps/coredns-576cbf47c7 2 2 2 21m coredns k8s.gcr.io/coredns:1.2.2 k8s-app=kube-dns,pod-template-hash=576cbf47c7
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master1 Ready master 22m v1.12.1 192.168.1.211 <none> Ubuntu 18.04 LTS 4.15.0-20-generic docker://17.12.1-ce
node0 Ready <none> 21m v1.12.1 192.168.1.212 <none> Ubuntu 18.04 LTS 4.15.0-20-generic docker://17.12.1-ce
node1 Ready <none> 21m v1.12.1 192.168.1.213 <none> Ubuntu 18.04 LTS 4.15.0-20-generic docker://17.12.1-ce
node2 Ready <none> 21m v1.12.1 192.168.1.214 <none> Ubuntu 18.04 LTS 4.15.0-20-generic docker://17.12.1-ce
$ for i in $(seq 20); do timeout 1 curl -so/dev/null http://192.168.1.214:30333 && echo -n x || echo -n - ;done
-x--xxxxx-x-x---xxxx
So far, I found 3 problems:
docker version
In my first tries, I used docker.io from the default Ubuntu repositories (17.12.1-ce). In the tutorial https://computingforgeeks.com/how-to-setup-3-node-kubernetes-cluster-on-ubuntu-18-04-with-weave-net-cni/, I discovered that they recommend a different Docker package:
With that, Docker is at version 18.06.1 and no longer causes a warning in the kubeadm preflight check.
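What I run now, roughly (a sketch of the usual Docker CE installation from Docker's apt repository; the tutorial's exact commands may differ):
sudo apt-get remove docker.io
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update && sudo apt-get install docker-ce
docker version --format '{{.Server.Version}}'   # expect 18.06.x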
cleanup
I used
kubeadm reset
and deleted some directories when resetting my VMs to an unconfigured state. After reading some bug reports, I decided to extend the list of directories to remove. This is what I do now:
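Roughly this (a sketch; the exact list is assembled from various kubeadm issue reports and may need adjusting):
sudo kubeadm reset
sudo rm -rf /etc/kubernetes /var/lib/etcd /var/lib/cni /etc/cni/net.d $HOME/.kube/config
sudo ip link delete cni0 2>/dev/null; sudo ip link delete flannel.1 2>/dev/null
sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
sudo systemctl restart docker kubelet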
Calico setup
With the above changes, I was immediately able to init a fully working setup (all pods "Running" and curl working). I used the "variant with extra etcd".
All this worked until the first reboot, when the problem appeared again. Digging into it, I realized that I had run two sets of installation instructions in sequence (the etcd datastore and the Kubernetes API datastore variants above) that were meant as alternatives, of which only one should be applied.
Result
Could it be that you did not install the kubernetes-cni package? If no network provider works, this is very likely. AFAIK it is also not mentioned in the docs that you need to do this. It should also be visible in the kubelet service log.
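A quick way to check both points (package presence and kubelet complaints about missing CNI configuration) might be:
dpkg -l kubernetes-cni               # is the package installed at all?
ls /opt/cni/bin /etc/cni/net.d       # CNI plugins and network configuration
journalctl -u kubelet | grep -i cni  # kubelet errors such as "cni config uninitialized"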