I have an EKS cluster (AWS) named `cluster-main` running on:
- Kubernetes version: 1.16
- Platform version: eks.4
- CNI version v1.6.1
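For reference, the Kubernetes and platform versions above can be confirmed with the AWS CLI (the `--query` expression below is just one way to trim the output; the CNI version comes from the `aws-node` daemonset, checked further down):

```sh
# Show the Kubernetes and EKS platform versions of the cluster
aws eks describe-cluster \
  --name cluster-main \
  --query 'cluster.{version: version, platformVersion: platformVersion}'
```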
There are two node groups in the cluster:
Node Group Name | Instance Type | AMI Type |
---|---|---|
generic-node-group | t3a.medium | AL2_x86_64 |
memory-node-group | r5a.large | AL2_x86_64 |
The nodes in these groups work fine.
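These node groups can be listed and inspected the same way (a sketch, using the names from the table above):

```sh
# List the managed node groups and inspect one of them
aws eks list-nodegroups --cluster-name cluster-main
aws eks describe-nodegroup \
  --cluster-name cluster-main \
  --nodegroup-name generic-node-group
```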
I am trying to add a new node group that consists of ARM instances:
Node Group Name | Instance Type | AMI Type |
---|---|---|
cpu-node-group | c6g.xlarge | AL2_ARM_64 |
However, the nodes of this group are stuck in `NotReady` status, and the node group fails to be created due to the issue below:
Conditions:
Type | Status | LastHeartbeatTime | LastTransitionTime | Reason | Message |
---|---|---|---|---|---|
Ready | False | Mon, 31 May 2021 08:40:22 -0400 | Mon, 31 May 2021 08:38:21 -0400 | KubeletNotReady | runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized |
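That condition block comes from describing the node; a quick way to reproduce it (the node name below is a placeholder for the failing `arm64` node):

```sh
# Readiness at a glance, then the full conditions of the failing node
kubectl get nodes -o wide
kubectl describe node <arm-node-name>   # placeholder: substitute the NotReady node
```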
- All node groups have a Node IAM Role ARN attached.
- All node groups are AWS-managed node groups.
- All node groups are deployed in the same two private subnets.
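The node group configuration (IAM role, subnets, AMI type) can be verified like this (a sketch using the new group's name):

```sh
# Verify the node IAM role, subnets and AMI type of the new node group
aws eks describe-nodegroup \
  --cluster-name cluster-main \
  --nodegroup-name cpu-node-group \
  --query 'nodegroup.{nodeRole: nodeRole, subnets: subnets, amiType: amiType}'
```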
When I SSH into the EC2 instance, I see the following logs under `/var/log/messages`:
1430 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
1430 kubelet.go:2193] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
I've confirmed that the `/etc/cni/net.d` directory is indeed empty.
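Something like the following can be used to verify this (a sketch; the second command just shows whether an `aws-node` pod was scheduled onto the new node at all):

```sh
# On the node: the VPC CNI writes its config here once aws-node starts
ls -la /etc/cni/net.d

# From a workstation: which nodes the aws-node (VPC CNI) pods landed on
kubectl get pods -n kube-system -o wide | grep aws-node
```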
I have another EKS cluster with similar characteristics where the ARM node group is initialized without any issue. However, I have found two differences. The test cluster uses:
- Platform version: eks.5
- CNI version v1.7.5
  - `amazon-k8s-cni-init:v1.7.5-eksbuild.1`
  - `amazon-k8s-cni:v1.7.5-eksbuild.1`
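A quick way to compare which CNI image each cluster is actually running (run against each cluster's kubectl context):

```sh
# Show the CNI images used by the aws-node daemonset
kubectl describe daemonset aws-node -n kube-system | grep Image
```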
Any ideas?
OK - as @thomas suggested, the issue was related to the EKS add-ons.
For context, and as I said in my comment, the cluster was initially created at version 1.14 and was later upgraded to 1.16.
However, the `aws-node`, `kube-proxy`, and `coredns` add-ons were never upgraded. I followed the instructions here, but the issue remained. What I did notice, though, was that `aws-node` was still using the same CNI image (v1.6.3). After further investigation, I had to manually upgrade the CNI version by following the instructions here, as sketched below.
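As a rough sketch of that manual upgrade (the manifest file name is a placeholder for the region-adjusted manifest downloaded per the linked instructions):

```sh
# Apply the downloaded, region-adjusted VPC CNI manifest (placeholder name)
kubectl apply -f aws-k8s-cni.yaml

# Wait for the aws-node daemonset to roll out the new image
kubectl rollout status daemonset aws-node -n kube-system

# Confirm the pods now report the v1.7.x image
kubectl get daemonset aws-node -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[*].image}'
```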
Lastly, I noticed that an `aws-node` pod was now created for my `arm64` node, which had not happened before. However, the liveness probe for that pod was failing and the node was still stuck in `NotReady` status, so I had to edit the configuration of the `kube-proxy` daemonset as described in step (3) of this guide (see the sketch below).
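For completeness, a hedged sketch of that `kube-proxy` change; the registry account, region, and image tag below are assumptions, so substitute the values the guide lists for your region and Kubernetes version:

```sh
# Point kube-proxy at an image tag that can run on the arm64 node
# (registry/region/tag below are assumptions, not taken from the guide itself)
kubectl set image daemonset/kube-proxy -n kube-system \
  kube-proxy=602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/kube-proxy:v1.16.15-eksbuild.1

# Verify kube-proxy and aws-node pods are now running on the arm64 node
kubectl get pods -n kube-system -o wide
```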