I've deployed a pod with Apache httpd (official image, tag 2.4.41), serving plaintext HTTP/2 (h2c) on port 8082, and it's working "fine", but I see restarts every few hours (kubectl get pod/mypod shows 60 restarts in 5 days).
The logs always show "caught SIGWINCH, shutting down gracefully" (-p for the previous container instance):
$ kubectl logs -c httpdcontainer -p pod/mypod
[...]
127.0.0.1 - - [15/Jan/2020:11:12:27 +0000] "GET / HTTP/2.0" 200 9578
[Wed Jan 15 11:12:40.433582 2020] [mpm_event:notice] [pid 1:tid 139963289400448] AH00492: caught SIGWINCH, shutting down gracefully
127.0.0.1 - - [15/Jan/2020:11:12:37 +0000] "GET / HTTP/2.0" 200 9578
The SIGWINCH is most likely the container stop signal: the official httpd Dockerfile sets STOPSIGNAL SIGWINCH, so Docker sends it whenever the container is stopped, e.g. when Kubernetes kills the container after a failed liveness probe. The GET requests in the access log are the configured liveness probes on /, and as you can see Apache is returning 200 OK just fine.
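For context, the probes are configured along these lines (a simplified sketch; the timing values shown are the kubelet defaults, not necessarily what my manifest sets):

livenessProbe:
  httpGet:
    path: /
    port: 8082
  periodSeconds: 10      # kubelet default
  timeoutSeconds: 1      # kubelet default
  failureThreshold: 3    # kubelet default
readinessProbe:
  httpGet:
    path: /
    port: 8082
  timeoutSeconds: 1

Note that timeoutSeconds defaults to 1 second: any response that takes longer than that counts as a failed probe, even if Apache eventually answers 200.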
The kubelet seems to disagree with the 200 OK from Apache and is restarting the container on purpose:
$ kubectl describe pod/mypod
[...]
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 38m (x208 over 5d) kubelet, node01.kube.mydomain.tld Readiness probe failed: Get http://192.168.87.178:8082/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Normal Killing 38m (x60 over 4d23h) kubelet, node01.kube.mydomain.tld Container httpdcontainer failed liveness probe, will be restarted
Warning Unhealthy 38m (x221 over 5d) kubelet, node01.kube.mydomain.tld Liveness probe failed: Get http://192.168.87.178:8082/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Normal Pulled 38m (x60 over 4d23h) kubelet, node01.kube.mydomain.tld Container image "myregistry.mydomain.tld/foo/bar/myimage@sha256:<checksum>" already present on machine
Normal Created 38m (x61 over 5d19h) kubelet, node01.kube.mydomain.tld Created container httpdcontainer
Normal Started 38m (x61 over 5d19h) kubelet, node01.kube.mydomain.tld Started container httpdcontainer
Warning Unhealthy 13m (x1644 over 5d19h) kubelet, node01.kube.mydomain.tld Readiness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 3m16s (x1717 over 5d19h) kubelet, node01.kube.mydomain.tld Liveness probe failed: HTTP probe failed with statuscode: 503
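The events show two distinct failure modes: probes that time out before any headers arrive, and probes that get an explicit 503. The timeout case can be checked by hand from the node by timing the probe endpoint (a sketch; --max-time 1 mirrors the kubelet's 1-second default probe timeout):

$ curl -s -o /dev/null -w 'HTTP %{http_code} in %{time_total}s\n' --max-time 1 http://192.168.87.178:8082/

When Apache is healthy this prints something like HTTP 200 in 0.004s; when it is slow for any reason, the request exceeds the deadline and the kubelet counts the probe as failed even though Apache would eventually have answered.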
I have no clue how to tell which side is to blame here. I'm running Kubernetes 1.16.4, deployed with kubeadm, with Calico as the CNI.
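One place to start is the container's last termination state, which the API server keeps in the pod status (a sketch using jsonpath; the reason field reads e.g. OOMKilled when the kernel killed the process for exceeding the container's memory limit):

$ kubectl get pod mypod -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'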
After digging into this further, it appears that the Docker daemon was killing the container for exceeding its memory limit, as recorded in the node's system logs.
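On the node itself, such kills can be confirmed in the kernel log and, assuming Docker runs under systemd, in the Docker unit's journal (a sketch; run on node01.kube.mydomain.tld):

$ dmesg -T | grep -iE 'oom|killed process'
$ journalctl -u docker --since '1 hour ago' | grep -i oom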
Even if this is the cause, it is very bad UX that the Kubernetes admin has to dig through node-level system logs to hunt it down.
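If the memory limit is indeed the culprit, the immediate mitigation is to raise it in the container spec (a sketch; the values are illustrative and need to be sized against httpd's real footprint):

resources:
  requests:
    memory: "128Mi"
  limits:
    memory: "256Mi"

A higher limit at least stops the kill loop while the actual consumption is investigated, e.g. with kubectl top pod (if metrics-server is installed).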