EKS Node NotReady: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized


We have been running our site with no problems for a while. On July 12, with no changes made to any AWS configuration, our site went down.

We were in the process of deploying an update, which had never been a problem before. During the Kubernetes deployment step in CircleCI, the following error appeared:

Waiting for deployment  rollout to finish: 1 old replicas are pending termination...
error: deployment exceeded its progress deadline

Looking for problems in EKS, we discovered that the kube-system pods were all failing or stuck pending:

kubectl get pods -n kube-system
NAME                                                  READY   STATUS                   RESTARTS   AGE
aws-node-xxxxx                                        0/1     Init:ImageInspectError   0          3d
coredns-xxxxx-drb7p                                   0/1     Pending                  0          3d
coredns-xxxxx-s58zq                                   0/1     Pending                  0          3d
kube-proxy-xxxxx                                      0/1     ImageInspectError        0          3d
spotinst-kubernetes-cluster-controller-xxxxxx-h2mn5   0/1     Pending                  0          3d

Describing one of the pending pods shows the following event:

Warning  FailedScheduling  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..

Indeed, the nodes show up as NotReady, and their Ready condition indicates:

Ready: False
runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
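One thing worth checking at this point is version skew: compare the kubelet version each node reports (the VERSION column of `kubectl get nodes -o wide`) against the control-plane version (`aws eks describe-cluster --name <cluster> --query cluster.version --output text`). The snippet below is a minimal sketch of that comparison; the two version strings are hard-coded examples standing in for the outputs of those commands.

```shell
# Example values standing in for the outputs of `kubectl get nodes -o wide`
# (kubelet version) and `aws eks describe-cluster` (control-plane version).
node_ver="1.18"
cp_ver="1.26"

# sort -V orders version strings numerically; if the node version is not
# the newest of the two, the kubelet is behind the API server.
newest=$(printf '%s\n%s\n' "$node_ver" "$cp_ver" | sort -V | tail -n1)
if [ "$node_ver" != "$newest" ]; then
  echo "version skew: node $node_ver is behind control plane $cp_ver"
fi
```

A kubelet more than a couple of minor versions behind the API server is outside Kubernetes' supported skew and can fail to become Ready.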

Looking at the EC2 instances shows that they are running fine (Instance state: Running).

We found a few articles describing similar problems, but following their suggestions did not provide a solution. The most puzzling piece is that we have changed nothing in AWS. Everything was working fine last week, but now our site is down and has been down for 3 days.

We tried to sign up for AWS Business Support, but approval is still pending after 3 days, and we are at a loss.

1 Answer
Accepted Answer

SOLUTION:

The EKS nodes were being created at an older Kubernetes version (1.18) while the control plane had been upgraded to 1.24 (and later to 1.26). Kubelets that far behind the API server are outside the supported version skew, so the nodes could never register as Ready and the CNI configuration was never initialized. The solution was to create a new node group at the current version.
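For reference, creating a replacement managed node group with eksctl looks roughly like this. The cluster, node group, and region names are placeholders for illustration; a managed node group created this way defaults to the cluster's current Kubernetes version.

```shell
# Placeholders: my-cluster / ng-replacement / us-east-1 are illustrative only.
# A managed node group defaults to the cluster's current Kubernetes version.
eksctl create nodegroup \
  --cluster my-cluster \
  --name ng-replacement \
  --region us-east-1 \
  --nodes 2

# Once the new nodes are Ready, remove the old group
# (eksctl drains its nodes before deleting them).
eksctl delete nodegroup \
  --cluster my-cluster \
  --name old-ng \
  --region us-east-1
```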

answered 3 months ago
