I have an EKS cluster (AWS) named `cluster-main` running on:
- Kubernetes version: 1.16
- Platform version: eks.4
- CNI version v1.6.1
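For reference, the Kubernetes and platform versions above can be confirmed with the AWS CLI (the `--query` expression below is just one way to trim the output; the CNI version comes from the `aws-node` daemonset, checked further down):

```sh
# Show the Kubernetes and EKS platform versions of the cluster
aws eks describe-cluster \
  --name cluster-main \
  --query 'cluster.{version: version, platformVersion: platformVersion}'
```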
There are two node groups in the cluster:
Node Group Name | Instance Type | AMI Type |
---|---|---|
generic-node-group | t3a.medium | AL2_x86_64 |
memory-node-group | r5a.large | AL2_x86_64 |
The nodes in these groups work fine.
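These node groups can be listed and inspected the same way (a sketch, using the names from the table above):

```sh
# List the managed node groups and inspect one of them
aws eks list-nodegroups --cluster-name cluster-main
aws eks describe-nodegroup \
  --cluster-name cluster-main \
  --nodegroup-name generic-node-group
```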
I am trying to add a new node group that consists of ARM instances:
Node Group Name | Instance Type | AMI Type |
---|---|---|
cpu-node-group | c6g.xlarge | AL2_ARM_64 |
However, the nodes of this group are stuck in `NotReady` status, and the node group fails to be created due to the issue below:
Conditions:
Type | Status | LastHeartbeatTime | LastTransitionTime | Reason | Message |
---|---|---|---|---|---|
Ready | False | Mon, 31 May 2021 08:40:22 -0400 | Mon, 31 May 2021 08:38:21 -0400 | KubeletNotReady | runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized |
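That condition block comes from describing the node; a quick way to reproduce it (the node name below is a placeholder for the failing `arm64` node):

```sh
# Readiness at a glance, then the full conditions of the failing node
kubectl get nodes -o wide
kubectl describe node <arm-node-name>   # placeholder: substitute the NotReady node
```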
- All node groups have a Node IAM Role ARN attached.
- All node groups are AWS-managed node groups.
- All node groups are deployed in the same two private subnets.
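The node group configuration (IAM role, subnets, AMI type) can be verified like this (a sketch using the new group's name):

```sh
# Verify the node IAM role, subnets and AMI type of the new node group
aws eks describe-nodegroup \
  --cluster-name cluster-main \
  --nodegroup-name cpu-node-group \
  --query 'nodegroup.{nodeRole: nodeRole, subnets: subnets, amiType: amiType}'
```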
When I SSH into the EC2 instance, I see the following logs under `/var/log/messages`:
1430 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
1430 kubelet.go:2193] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
I've confirmed that the `/etc/cni/net.d` directory is indeed empty.
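Something like the following can be used to verify this (a sketch; the second command just shows whether an `aws-node` pod was scheduled onto the new node at all):

```sh
# On the node: the VPC CNI writes its config here once aws-node starts
ls -la /etc/cni/net.d

# From a workstation: which nodes the aws-node (VPC CNI) pods landed on
kubectl get pods -n kube-system -o wide | grep aws-node
```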
I have another EKS cluster with similar characteristics where the ARM node group is initialized without any issue. However, I have found two differences. The test cluster uses:
- Platform version: eks.5
- CNI version v1.7.5
  - `amazon-k8s-cni-init:v1.7.5-eksbuild.1`
  - `amazon-k8s-cni:v1.7.5-eksbuild.1`
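A quick way to compare which CNI image each cluster is actually running (run against each cluster's kubectl context):

```sh
# Show the CNI images used by the aws-node daemonset
kubectl describe daemonset aws-node -n kube-system | grep Image
```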
Any ideas?
OK - as @thomas suggested, the issue was related to the EKS add-ons.
For context, and as I said in my comment, the cluster was initially created at version 1.14 and was later upgraded to 1.16.
However, the `aws-node`, `kube-proxy`, and `coredns` add-ons were never upgraded. I followed the instructions here, but the issue remained. What I did notice, though, was that `aws-node` was still using the same CNI image (v1.6.3). After further investigation, I had to manually upgrade the CNI version by following the instructions here, as sketched below.
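As a rough sketch of that manual upgrade (the manifest file name is a placeholder for the region-adjusted manifest downloaded per the linked instructions):

```sh
# Apply the downloaded, region-adjusted VPC CNI manifest (placeholder name)
kubectl apply -f aws-k8s-cni.yaml

# Wait for the aws-node daemonset to roll out the new image
kubectl rollout status daemonset aws-node -n kube-system

# Confirm the pods now report the v1.7.x image
kubectl get daemonset aws-node -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[*].image}'
```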
Lastly, I noticed that an `aws-node` pod was now created for my `arm64` node, which had not happened before. However, the liveness probe for that pod was failing and the node was still stuck in `NotReady` status, so I had to edit the configuration of the `kube-proxy` daemonset as described in step (3) of this guide (see the sketch below).
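For completeness, a hedged sketch of that `kube-proxy` change; the registry account, region, and image tag below are assumptions, so substitute the values the guide lists for your region and Kubernetes version:

```sh
# Point kube-proxy at an image tag that can run on the arm64 node
# (registry/region/tag below are assumptions, not taken from the guide itself)
kubectl set image daemonset/kube-proxy -n kube-system \
  kube-proxy=602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/kube-proxy:v1.16.15-eksbuild.1

# Verify kube-proxy and aws-node pods are now running on the arm64 node
kubectl get pods -n kube-system -o wide
```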