Ping a Specific Port

Question

Ben

Asked: 2017-10-06 12:33:23 +0800 CST2017-10-06 12:33:23 +0800 CST 2017-10-06 12:33:23 +0800 CST

debug kubelet not starting

772

I'm using kubeadm to try to setup a dev master. I'm running into an issue where the healthcheck for kubelet is failing. I'm looking for direction on how to debug that. Running the command that's suggested for debugging (systemctl status kubelet) don't see the cause of the error:

kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Thu 2017-10-05 15:04:23 CDT; 4s ago
     Docs: http://kubernetes.io/docs/
  Process: 4786 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CGROUP_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
 Main PID: 4786 (code=exited, status=1/FAILURE)

Oct 05 15:04:23 master.domain..com systemd[1]: Unit kubelet.service entered failed state.
Oct 05 15:04:23 master.domain.com systemd[1]: kubelet.service failed.

Where can I find a specific error message to indicate why this isn't running?

After running swapoff -a to disable swap, I'm still not able to provision Kubernetes.

Here's the full output from kubeadm init:

$ kubeadm init
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.8.2
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks
[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.09.0-ce. Max validated version: 17.03
[preflight] Starting the kubelet service
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master.my-domain.com kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.xx.xx.xx 10.xx.xx.xx]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.

Unfortunately, an error has occurred:
    timed out waiting for the condition

This error is likely caused by that:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
    - There is no internet connection; so the kubelet can't pull the following control plane images:
        - gcr.io/google_containers/kube-apiserver-amd64:v1.8.2
        - gcr.io/google_containers/kube-controller-manager-amd64:v1.8.2
        - gcr.io/google_containers/kube-scheduler-amd64:v1.8.2

You can troubleshoot this for example with the following commands if you're on a systemd-powered system:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'
couldn't initialize a Kubernetes cluster

I've also tried removing the docker repository and installing Docker 1.12 which isn't runnable - Error starting daemon: SELinux is not supported with the overlay graph driver on this kernel. Either boot into a newer kernel or disable selinux ...

8 Answers

Voted

dirtbag · Answer 1 · 2017-10-25T09:46:58+08:00

dirtbag

2017-10-25T09:46:58+08:002017-10-25T09:46:58+08:00

Things were solved by setting --fail-swap-on=false in the systemd script. Just make the modification on the file /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --fail-swap-on=false"

then run systemctl daemon-reload and then systemctl restart kubelet

4

Atom · Answer 2 · 2017-10-26T03:36:27+08:00

Atom

2017-10-26T03:36:27+08:002017-10-26T03:36:27+08:00

Found an issue regarding this: https://github.com/kubernetes/kubernetes/issues/53333

Following the previous answer worked for me, but not the resolution offered in the linked issue.

So perhaps, following their suggestion of editing 90-kubeadm.conf (in place of 10-kubeadm.conf) would work

4

Alexred · Answer 3 · 2019-08-27T20:42:44+08:00

I have had the same issue but on Fedora 30, kubelet 1.15.3, docker-ce 19.03.1 And the output of systemctl status kubelet was contained same as in your case:

Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf

Steps to solve were: 1. check if you have files kubelet.service and 10-kubeadm.conf on the next paths:

ls /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
ls /usr/lib/systemd/system/kubelet.service

10-kubeadm.conf:

more /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf 

# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

kubelet.service:

more /usr/lib/systemd/system/kubelet.service

[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
WorkingDirectory=/var/lib/kubelet
EnvironmentFile=-/etc/kubernetes/config
EnvironmentFile=-/etc/kubernetes/kubelet
ExecStart=/usr/bin/kubelet \
        $KUBE_LOGTOSTDERR \
        $KUBE_LOG_LEVEL \
        $KUBELET_API_SERVER \
        $KUBELET_ADDRESS \
        $KUBELET_PORT \
        $KUBELET_HOSTNAME \
        $KUBE_ALLOW_PRIV \
        $KUBELET_ARGS
Restart=on-failure
KillMode=process

[Install]
WantedBy=multi-user.target

Delete systemd unit for kubelet in /etc/systemd/system/

rm -R /etc/systemd/system/kubelet.service.d (confirm "y" for each file)
rm /etc/systemd/system/kubelet.service

Reload all systemd unit files, and recreate the entire dependency tree.

systemctl daemon-reload
restart kubelet

systemctl restart kubelet

The output of kubelet status then should contain:

  Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf

Initialize a Kubernetes control-plane node:

kubeadm reset
systemctl daemon-reload
kubeadm init --pod-network-cidr=10.244.0.0/16

Note: You have one or more issue:

`- There is no internet connection; so the kubelet can't pull the following control plane images:`

Try to pull them manually:

kubeadm config images pull

You may need to upgrade kubeadm, kubelet, kebectl

Pawilon · Answer 4 · 2017-10-26T12:45:11+08:00

Pawilon

2017-10-26T12:45:11+08:002017-10-26T12:45:11+08:00

This is all already covered in the issue that Atom posted, so I don't feel like I'm contributing an awful lot, but I can replicate your issue if swap is turned on. So for me the solution is to disable swap and retry the init:

sudo -i
swapoff -a
kubeadm reset
kubeadm init

The answer posted by dirtbag worked for me as well, but just to be safe after systemctl daemon-reload, I did a full kubeadm reset and kubeadm init, not just a systemctl restart kubelet.

If this doesn't work for you, can you please paste the new output of kubeadm init after disabling swap?

1

srikanth · Answer 5 · 2017-12-16T23:26:16+08:00

srikanth

2017-12-16T23:26:16+08:002017-12-16T23:26:16+08:00

check the error from: journalctl -xeu kubelet

Note: Make sure that the cgroup driver used by kubelet is the same as the one used by Docker. To ensure compatability you can either update Docker, like so:

cat << EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF

1

ant_dev · Answer 6 · 2022-02-04T06:25:52+08:00

ant_dev

2022-02-04T06:25:52+08:002022-02-04T06:25:52+08:00

sudo vim /etc/modules-load.d/k8s.conf
Make sure to have these settings there:

net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1

0

naresh bandari · Answer 7 · 2022-03-25T00:29:26+08:00

naresh bandari

2022-03-25T00:29:26+08:002022-03-25T00:29:26+08:00

first need to create conf file https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/kubelet-integration/#the-kubelet-drop-in-file-for-systemd

then enable kubelet

we need to run kubeadm init on master or kubeadm reset on worker nodes to generate conf files

0

umesh torawane · Answer 8 · 2022-04-30T07:16:24+08:00

umesh torawane

2022-04-30T07:16:24+08:002022-04-30T07:16:24+08:00

Make sure swap space is off.

#swapoff -a

manual way -

comment out any line with 'swap' word in /etc/fstab.

after doing it manually run swapoff -a command once again.

then restart kubelet service - #systemctl restart kubelet

0

debug kubelet not starting

Can you pass user/pass for HTTP Basic Authentication in URL parameters?

Ping a Specific Port

Check if port is open or closed on a Linux server?

How to automate SSH login with password?

How do I tell Git for Windows where to find my private RSA key?

What's the default superuser username/password for postgres after a new install?

What port does SFTP use?

Command line to list users in a Windows Active Directory group?

What is a Pem file and how does it differ from other OpenSSL Generated Key File Formats?

How to determine if a bash variable is empty?