After upgrading to the latest Docker (18.09.0) and Kubernetes (1.12.2), my Kubernetes node breaks when deploying security updates that restart containerd.
I have the following /etc/docker/daemon.json:
{
  "storage-driver": "overlay2",
  "live-restore": true
}
In the past this was sufficient to allow a Docker restart without restarting the pods.
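For reference, a quick way to check whether the daemon actually picked up the live-restore setting (assuming a Docker version that supports Go-template output for docker info):

# Should print "true" when live-restore is active
docker info --format '{{.LiveRestoreEnabled}}'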
Kubelet is started as:
/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --network-plugin=cni --fail-swap-on=false --feature-gates=PodPriority=true
Now, restarting containerd preserves the old pods, but also recreates them under the fresh containerd process.
Initial situation, before restart:
/usr/bin/containerd
/usr/bin/dockerd -H unix://
\_ containerd --config /var/run/docker/containerd/containerd.toml --log-level info
\_ containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1a36f40f3c3531d13b8bc493049a1900662822e01e2c911f8
| \_ /usr/bin/dumb-init /bin/bash /entrypoint.sh /nginx-ingress-controller --default-backend-service=infra/old-nginx-php --election-id=ingress-controller-leader
| \_ /bin/bash /entrypoint.sh /nginx-ingress-controller --default-backend-service=infra/old-nginx-php --election-id=ingress-controller-leader --ingress-class
| \_ /nginx-ingress-controller --default-backend-service=infra/old-nginx-php --election-id=ingress-controller-leader --ingress-class=nginx-php --configma
| \_ nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
| \_ nginx: worker process
| \_ nginx: worker process
\_ containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/c9a82204115c50788d132aa6c11735d90412dacb48a219d31
| \_ /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf
\_ containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/3004e3fa5f7e2b45865c6cc33abb884d9140af16f2594a11d
| \_ /sbin/runsvdir -P /etc/service/enabled
| \_ runsv bird
| | \_ bird -R -s /var/run/calico/bird.ctl -d -c /etc/calico/confd/config/bird.cfg
| \_ runsv bird6
| | \_ bird6 -R -s /var/run/calico/bird6.ctl -d -c /etc/calico/confd/config/bird6.cfg
| \_ runsv confd
| | \_ calico-node -confd
| \_ runsv felix
| \_ calico-node -felix
\_ containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1f3c48e28c7fde2f67c40d5641abfa9a29e3dfcbc436321f6
| \_ /bin/sh /install-cni.sh
| \_ sleep 10
\_ containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/8371571ce29be4959951cf8ad70e57aa1f4a146f5ca43435b
\_ /coredns -conf /etc/coredns/Corefile
After restarting containerd/docker, those old containers aren't found, and they are all recreated under the fresh containerd process. This gives duplicate processes for all pods!
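A rough way to see the duplication at the process level (the grep pattern here is just an example based on my workloads, adjust it to whatever runs on your node):

# Both the old (orphaned) and the freshly recreated copies show up, e.g. two
# nginx-ingress-controller trees and two kube-proxy processes on the same node
ps -eo pid,ppid,args | grep -E 'nginx-ingress-controller|kube-proxy' | grep -v grep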
It looks like containerd completely forgets about the old containers, because killall containerd-shim won't kill those old pods, but simply reparents their children under init:
/usr/bin/dumb-init /bin/bash /entrypoint.sh /nginx-ingress-controller --default-backend-service=infra/old-nginx-php --election-id=ingress-controller-leader --ingress-cl
\_ /bin/bash /entrypoint.sh /nginx-ingress-controller --default-backend-service=infra/old-nginx-php --election-id=ingress-controller-leader --ingress-class=nginx-php -
\_ /nginx-ingress-controller --default-backend-service=infra/old-nginx-php --election-id=ingress-controller-leader --ingress-class=nginx-php --configmap=infra/phpi
\_ nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
\_ nginx: worker process
\_ nginx: worker process
/usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf
/sbin/runsvdir -P /etc/service/enabled
\_ bird -R -s /var/run/calico/bird.ctl -d -c /etc/calico/confd/config/bird.cfg
\_ bird6 -R -s /var/run/calico/bird6.ctl -d -c /etc/calico/confd/config/bird6.cfg
/bin/sh /install-cni.sh
\_ sleep 10
Obviously, having the old calico and nginx processes still hanging around keeps their hostports in use, so the new pods don't start and the node becomes completely unusable. Manually killing all the old processes or rebooting seems to be the only option.
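As a rough cleanup sketch before resorting to a reboot (the process names are just the ones from my node, so treat them as assumptions and adapt them to your workloads; note that this also takes down the freshly recreated copies, which kubelet will then restart):

# See which leftover PIDs still hold the hostports the new pods need
ss -ltnp | grep -E ':(80|443)\s'
# Kill the orphaned workload trees by command line
pkill -f nginx-ingress-controller
pkill -f '/usr/local/bin/kube-proxy'
pkill -f 'runsvdir -P /etc/service/enabled'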
Is there some new setting required to make sure kubelet finds those old containerd instances? Or does this happen because there are both a global containerd and a version started by Docker?
I faced the same issue yesterday, and after the containerd restart I couldn't exec into my running pods either. The issue is with Docker itself.
Once containerd restarts, the Docker daemon still tries to process event streams against the old socket handles. The error handling when the client can't connect to containerd then leads to a CPU spike on the machine.
The only way to recover from this situation is to restart Docker (systemctl restart docker).
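Before the restart, the broken state is visible on the node itself; something along these lines (the exact log wording varies by Docker version, so this is only a sketch):

# dockerd spinning on CPU is the tell-tale symptom
top -b -n 1 | grep dockerd
# look for repeated containerd connection errors in the daemon logs
journalctl -u docker --since "10 min ago" | grep -i containerd
# recover with a full Docker restart
systemctl restart docker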
The issue is fixed with the following ticket:
https://github.com/moby/moby/pull/36173
Hope this helps.