I have a bare-metal (kubeadm) Kubernetes cluster that's really unstable, and I've traced the instability back to an etcd issue.
From the etcd pod's description I get:
Image: k8s.gcr.io/etcd:3.4.13-0
Liveness: ... #success=1 #failure=8
Startup: ... #success=1 #failure=24
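(For context, that output comes from something like the command below; etcd runs as a static pod, and <node-name> is a placeholder for my control-plane node's hostname:)

    # Inspect the kubeadm-managed etcd static pod, including its probe configuration
    kubectl -n kube-system describe pod etcd-<node-name>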
In the logs, the startup sequence looks fine (compared to another, healthy cluster), but then I get a lot of warnings:
etcdserver: [...] request ... took too long to execute
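I'm pulling those logs roughly like this (again, <node-name> is a placeholder); --previous shows the output of the container instance that got restarted:

    # Logs from the currently running etcd container
    kubectl -n kube-system logs etcd-<node-name>
    # Logs from the container instance that was restarted
    kubectl -n kube-system logs etcd-<node-name> --previous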
I don't think it's hardware related, though, because the 99th percentile of etcd_disk_backend_commit_duration_seconds sits around 16 ms, which is fine according to the etcd FAQ.
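For reference, the raw histogram behind that percentile can be scraped directly on the control-plane node, assuming the kubeadm default of --listen-metrics-urls=http://127.0.0.1:2381:

    # Backend commit latency histogram; bucket counts show where the tail latency lands
    curl -s http://127.0.0.1:2381/metrics | grep etcd_disk_backend_commit_duration_seconds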
Anyway, this goes on for a few minutes, and then I assume the following is what triggers the restart:
etcdserver/api/etcdhttp: /health error; QGET failed etcdserver: request timed out (status code 503)
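For completeness, I can also query the member directly from inside the pod; the certificate paths below are the kubeadm defaults, so adjust if yours differ:

    # Ask the member for its own status (etcd 3.4, so ETCDCTL_API=3 is the default)
    kubectl -n kube-system exec etcd-<node-name> -- etcdctl \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
      --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
      endpoint status -w table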
Any idea what further steps I can take to diagnose the issue and fix etcd?