I have a bare-metal (kubeadm) Kubernetes cluster that is very unstable, and I have traced the problem back to an etcd issue.
From the etcd pod's description I get:
Image: k8s.gcr.io/etcd:3.4.13-0
Liveness: ... #success=1 #failure=8
Startup: ... #success=1 #failure=24
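(For reference, I'm pulling that from the static pod with kubectl describe; etcd-<node-name> is just a placeholder for the pod name on my control-plane node.)

    # Probe configuration, restart count and recent events for the etcd static pod
    kubectl -n kube-system describe pod etcd-<node-name>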
In the logs, the startup sequence looks fine (compared with another cluster), but then I get a lot of warnings like:
etcdserver: [...] request ... took too long to execute
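This is how I'm collecting the logs, including the previous instance after a restart (again, etcd-<node-name> is a placeholder):

    # Logs of the currently running etcd container
    kubectl -n kube-system logs etcd-<node-name>

    # Logs of the instance that was killed before the last restart
    kubectl -n kube-system logs etcd-<node-name> --previous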
I don't think it is hardware (disk) related, though: the 99th percentile of etcd_disk_backend_commit_duration_seconds is around 16 ms, which is within the range the etcd FAQ considers healthy.
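In case it matters, this is roughly how I'm reading that metric, run directly on the control-plane node. I'm assuming the kubeadm default of exposing etcd metrics on http://127.0.0.1:2381; adjust the port if your --listen-metrics-urls differs:

    # Backend commit latency histogram (the metric quoted above)
    curl -s http://127.0.0.1:2381/metrics | grep etcd_disk_backend_commit_duration_seconds

    # WAL fsync latency, for comparison (the other disk metric the FAQ mentions)
    curl -s http://127.0.0.1:2381/metrics | grep etcd_disk_wal_fsync_duration_seconds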
Anyway, this goes on for a few minutes, and then I assume this is what causes the restart:
etcdserver/api/etcdhttp: /health error; QGET failed etcdserver: request timed out (status code 503)
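I can also query the member directly with etcdctl from inside the pod, which at least shows whether the endpoint itself reports as slow or unhealthy (the certificate paths are the kubeadm defaults on my node, so treat them as an example):

    # Exec into the etcd static pod and check endpoint status
    kubectl -n kube-system exec etcd-<node-name> -- etcdctl \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key \
      endpoint status -w table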
Any ideas on what further steps I can take to diagnose the issue and fix etcd?