I'm using kubeadm to try to set up a dev master. I'm running into an issue where the health check for the kubelet is failing, and I'm looking for direction on how to debug that. Running the command suggested for debugging (systemctl status kubelet), I don't see the cause of the error:
kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Thu 2017-10-05 15:04:23 CDT; 4s ago
Docs: http://kubernetes.io/docs/
Process: 4786 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CGROUP_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
Main PID: 4786 (code=exited, status=1/FAILURE)
Oct 05 15:04:23 master.domain.com systemd[1]: Unit kubelet.service entered failed state.
Oct 05 15:04:23 master.domain.com systemd[1]: kubelet.service failed.
Where can I find a specific error message to indicate why this isn't running?
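For reference, I assume the journal should contain the kubelet's own log lines with the actual failure reason, so I also looked at:
journalctl -u kubelet --no-pager | tail -n 50
but nothing in there jumped out at me.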
After running swapoff -a to disable swap, I'm still not able to provision Kubernetes.
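To confirm swap really was off, I checked with the standard tools (both should report no active swap):
swapon --show
free -h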
Here's the full output from kubeadm init:
$ kubeadm init
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.8.2
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks
[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.09.0-ce. Max validated version: 17.03
[preflight] Starting the kubelet service
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master.my-domain.com kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.xx.xx.xx 10.xx.xx.xx]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
(the same two connection-refused messages, for /healthz and /healthz/syncloop, repeat until the check times out)
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by that:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
- There is no internet connection; so the kubelet can't pull the following control plane images:
- gcr.io/google_containers/kube-apiserver-amd64:v1.8.2
- gcr.io/google_containers/kube-controller-manager-amd64:v1.8.2
- gcr.io/google_containers/kube-scheduler-amd64:v1.8.2
You can troubleshoot this for example with the following commands if you're on a systemd-powered system:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
couldn't initialize a Kubernetes cluster
I've also tried removing the Docker repository and installing Docker 1.12, which won't run at all: Error starting daemon: SELinux is not supported with the overlay graph driver on this kernel. Either boot into a newer kernel or disable selinux ...
Things were solved by setting --fail-swap-on=false in the systemd script. Just make the modification in the file /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, then run systemctl daemon-reload and then systemctl restart kubelet.
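For example, the drop-in edit can look like this (a minimal sketch; KUBELET_EXTRA_ARGS is the conventional hook in 10-kubeadm.conf, but check which environment variable your kubeadm version actually uses):
Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false"
followed by:
systemctl daemon-reload
systemctl restart kubelet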
Found an issue regarding this: https://github.com/kubernetes/kubernetes/issues/53333
Following the previous answer worked for me, but the resolution offered in the linked issue did not. So perhaps following their suggestion of editing 90-kubeadm.conf (in place of 10-kubeadm.conf) would work too.
I had the same issue, but on Fedora 30 with kubelet 1.15.3 and docker-ce 19.03.1, and the output of systemctl status kubelet contained the same as in your case. The steps to solve it were:
1. Check that the kubelet.service unit file and the 10-kubeadm.conf drop-in exist.
2. Delete the systemd unit for kubelet in /etc/systemd/system/.
3. Reload all systemd unit files and recreate the entire dependency tree:
systemctl daemon-reload
4. Restart kubelet:
systemctl restart kubelet
The output of kubelet status should then show the service running. Initialize a Kubernetes control-plane node with kubeadm init. Note: if it reports that the control-plane images are missing, try to pull them manually; you may also need to upgrade kubeadm, kubelet, and kubectl.
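For the manual image pulls, a hedged shortcut: kubeadm of this era (1.15) can fetch the control-plane images itself:
kubeadm config images pull
Pulling them one at a time with docker pull should work as well.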
This is all already covered in the issue that Atom posted, so I don't feel like I'm contributing an awful lot, but I can replicate your issue if swap is turned on. So for me the solution is to disable swap and retry the init:
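swapoff -a
kubeadm reset
kubeadm init
(kubeadm reset is there to clear the state left behind by the failed init before retrying.)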
The answer posted by dirtbag worked for me as well, but just to be safe, after systemctl daemon-reload I did a full kubeadm reset and kubeadm init, not just a systemctl restart kubelet.
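In other words, roughly this sequence:
systemctl daemon-reload
kubeadm reset
kubeadm init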
If this doesn't work for you, can you please paste the new output of kubeadm init after disabling swap?
Check the error from: journalctl -xeu kubelet
Note: Make sure that the cgroup driver used by the kubelet is the same as the one used by Docker. To ensure compatibility you can update Docker's cgroup driver, like so:
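(A minimal sketch, following the usual kubeadm guidance of switching Docker to the systemd cgroup driver via /etc/docker/daemon.json:)
cat <<EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl restart docker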
Also check the bridge networking sysctls:
sudo vim /etc/sysctl.d/k8s.conf
Make sure to have these settings there:
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
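Then reload sysctl so the settings take effect (they require the br_netfilter module to be loaded):
sudo modprobe br_netfilter
sudo sysctl --system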
First you need to create the kubelet drop-in file: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/kubelet-integration/#the-kubelet-drop-in-file-for-systemd
Then enable the kubelet.
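With standard systemd commands, that would be:
systemctl enable --now kubelet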
Then we need to run kubeadm init on the master, or kubeadm join on worker nodes, to generate the conf files.
Make sure swap space is off:
# swapoff -a
Manual way: comment out any line containing the word 'swap' in /etc/fstab, then run the swapoff -a command once again.
Then restart the kubelet service:
# systemctl restart kubelet
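If you'd rather script the /etc/fstab edit, a rough one-liner (it keeps a .bak backup; the pattern simply matches the word swap, so double-check the result):
sed -i.bak '/swap/ s/^/#/' /etc/fstab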