Background
I created a fresh Kubernetes cluster using kubeadm init --config /home/kube/kubeadmn-config.yaml --upload-certs
and then joined the second control plane node by running the command below.
kubeadm join VIP:6443 --token <token> \
--discovery-token-ca-cert-hash sha256:<hash> \
--control-plane --certificate-key <key> \
--v=5
Question
Are etcdctl commands supposed to come back with a return value, either when using the command directly or when using the docker exec method shown below? I have these packages installed: kubeadm, kubectl, kubelet, and docker.
Kubectl version: 1.20.1 OS: Ubuntu 18.04
Commands from the first node
Command
etcdctl cluster-health
Response
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: connect: connection refused
; error #1: EOF
error #0: dial tcp 127.0.0.1:4001: connect: connection refused
error #1: EOF
Command
docker container ls | grep k8s_POD_etcd
Response
k8s_POD_etcd-<nodename>_kube-system_<docker container id>
Command
docker exec -it k8s_POD_etcd-<nodename>_kube-system_<docker container id> etcdctl --endpoints=https://<node ip>:2379 --key=/etc/kubernetes/pki/etcd/peer.key --cert=/etc/kubernetes/pki/etcd/peer.crt --cacert=/etc/kubernetes/pki/etcd/ca.crt member list
Response
OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: \"etcdctl\": executable file not found in $PATH": unknown
EDIT
Upgraded to v3.2 etcdctl API
Command
etcdctl endpoint status
Response
Failed to get the status of endpoint 127.0.0.1:2379 (context deadline exceeded)
The error mentioned by the OP is caused by the etcdctl executable not existing in the container.
Why? Because the wrong container was used. Look at the image column in the output of the command from the question (docker container ls | grep k8s_POD_etcd): the container's image is k8s.gcr.io/pause:3.2. It's not an etcd container.
But why? What is this pause container? I won't answer this question because somebody already answered it here: what-are-the-pause-containers.
I will try to answer a better question: Where is the actual etcd container?
Let's have a look at the output of the same command but with a slightly modified grep; let's grep for etcd. Now we have two lines of output: one is the previously found pause container, and the second one is our etcd container, with a name starting with k8s_etcd_etcd. Let's see if we can run docker exec on this container: yes, we can!
To summarize: it looks like you were looking at the wrong container from the very beginning.
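As a rough sketch of the lookup described above, the two containers can be told apart by their name prefix. The helper name find_etcd_container is hypothetical, and the sketch assumes the container naming shown in the question (k8s_POD_etcd-... for the pause container, k8s_etcd_etcd-... for etcd itself).

```shell
#!/bin/sh
# Hypothetical helper: read container names on stdin (one per line) and print
# the first one that belongs to the real etcd container. Names starting with
# k8s_POD_ are pause containers and are skipped by the anchored pattern.
find_etcd_container() {
    grep '^k8s_etcd_etcd' | head -n 1
}

# Possible usage on a control plane node (needs docker, so it is left commented out):
#   name=$(docker ps --format '{{.Names}}' | find_etcd_container)
#   docker exec "$name" etcdctl version
```

Filtering on the k8s_etcd_ prefix rather than on the bare word etcd avoids matching the pause container, whose name also contains etcd.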
The context deadline exceeded error is an unclear error returned by the gRPC client when it can't establish the connection. If you want to see the exact error message, you should set ETCDCTL_API=2 (more details on that can be found here).
Check the cert/key pairs in /etc/kubernetes/pki/etcd/ and make sure that you apply the right cert/key pair. Also, this guide can help you out.
Note that etcd takes several certificate related configuration options, either through command-line flags or environment variables. The basic setup for it can be found here.
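A minimal sketch of passing a matching cert/key pair to etcdctl follows. It reuses the --cacert, --cert and --key flags from the question. The helper name etcd_tls_flags is hypothetical, and the pair name healthcheck-client is an assumption based on the files kubeadm normally generates; verify which pairs actually exist on your node before using it.

```shell
#!/bin/sh
# Assumption: kubeadm's default location for the etcd PKI files.
PKI_DIR=/etc/kubernetes/pki/etcd

# Hypothetical helper: print the etcdctl TLS flags for one cert/key pair.
# $1 is the base name of the pair, e.g. "peer" (used in the question) or
# "healthcheck-client" (assumed to exist on a kubeadm-built cluster).
etcd_tls_flags() {
    pair=$1
    printf '%s %s %s\n' \
        "--cacert=$PKI_DIR/ca.crt" \
        "--cert=$PKI_DIR/$pair.crt" \
        "--key=$PKI_DIR/$pair.key"
}

# Possible usage (needs a running etcd, so it is left commented out).
# The unquoted substitution is intentional so the flags split into words;
# this only works because the paths contain no spaces.
#   ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
#       $(etcd_tls_flags healthcheck-client) endpoint status
```

Keeping the cert and key derived from the same base name makes it harder to mix a certificate from one pair with the key from another.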
If you are using Alpine, try /bin/sh, since Alpine images do not include bash by default.
It can also happen due to an ordering mistake. You might need to use /bin/bash or /bin/sh, depending on the shell in your container.
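One way to find out which shell a container offers is to probe the candidates in turn. This is only a sketch: the helper name pick_container_shell is hypothetical, and it assumes docker is on the PATH and the container is running.

```shell
#!/bin/sh
# Hypothetical helper: print the first shell path that can be started
# inside the given container.
# $1 is the container name or ID; the remaining arguments are candidate shells.
pick_container_shell() {
    container=$1
    shift
    for shell_path in "$@"; do
        # docker exec fails if the executable is missing from the container.
        if docker exec "$container" "$shell_path" -c 'exit 0' >/dev/null 2>&1; then
            printf '%s\n' "$shell_path"
            return 0
        fi
    done
    return 1
}

# Possible usage (left commented out because it needs a running container):
#   shell=$(pick_container_shell <container> /bin/bash /bin/sh) \
#       && docker exec -it <container> "$shell"
```

If the helper prints nothing and returns a non-zero status, none of the candidate shells could be started, and docker exec has to call the target binary directly.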
The reason is documented in the ReleaseNotes file of Git and it is well explained here - Bash in Git for Windows: Weirdness
Some more solutions: