AWS EKS - NodeStatusUnknown


Hi Team, suddenly all three nodes of our AWS EKS cluster show NodeStatusUnknown with the message "Kubelet stopped posting node status" (LastTransitionTime: Fri, 29 Sep 2023 21:03:21 +0530). I checked the EC2 health dashboard, and all nodes have enough memory/CPU. We updated the node group to 1.25 on September 22, 2023, 22:10 (UTC+05:30), and everything was fine after the update with both EKS and the node group on v1.25. I tried rebooting one of the hosts from the Management Console with no luck. kubectl seems to be working fine against all the nodes, and I checked the kubelet on each node via sudo systemctl status kubelet

asked 8 months ago · 259 views
1 Answer

Hello.

The NodeStatusUnknown condition typically indicates that the kubelet on a node has stopped posting status updates to the Kubernetes control plane. This can happen for a variety of reasons. Here are some steps you can take to diagnose and potentially resolve the issue:

Check Kubelet Logs: Log in to one of the affected nodes and check the kubelet logs to see if there are any error messages or issues reported by the kubelet. You can do this using the following command:

journalctl -u kubelet

Look for any error messages or issues that might give you more information about what's going wrong with the kubelet.
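Kubelet log lines follow the klog format, where error-level entries start with `E` followed by the month and day. As a rough sketch of how to pull out just the error locations (the sample line below is illustrative; in practice you would pipe `journalctl -u kubelet --no-pager` into the filter):

```shell
# klog error lines begin with "E" + MMDD, e.g. "E0925 06:05:51.170822 ...".
# Sample line stands in for real journalctl output.
sample='Sep 25 06:05:51 ip-XX kubelet[2939]: E0925 06:05:51.170822 2939 pod_workers.go:965] "Error syncing pod"'
echo "$sample" | grep -oE 'E[0-9]{4} [0-9:.]+ [0-9]+ [a-zA-Z_]+\.go:[0-9]+'
# → E0925 06:05:51.170822 2939 pod_workers.go:965
```

The extracted source file and line number (`pod_workers.go:965` here) is usually the fastest way to search for the underlying failure.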

Check Kubernetes Control Plane: Ensure that the control plane components (e.g., API server, controller manager, scheduler) are running and healthy in your EKS cluster. On EKS the control plane itself is managed by AWS, but you can still inspect the system pods and (on older clusters) the component statuses:

kubectl get pods -n kube-system
kubectl get componentstatuses

Note that componentstatuses has been deprecated since Kubernetes 1.19, so treat its output as informational only.

Check Node Networking: Verify that the nodes can communicate with the control plane and that there are no networking issues. Ensure that security groups, VPC configurations, and network routes are set up correctly.
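As a minimal connectivity probe you can run from an affected node (the endpoint URL below is a placeholder; retrieve the real one with `aws eks describe-cluster --name <cluster> --query cluster.endpoint`):

```shell
# ENDPOINT is a placeholder; substitute your cluster's API server endpoint.
ENDPOINT="https://EXAMPLE1234.gr7.us-west-2.eks.amazonaws.com"
# /healthz answers even for unauthenticated clients; -k skips cert
# verification since this is only a reachability probe.
if curl -sk --max-time 5 "$ENDPOINT/healthz" >/dev/null; then
  echo "API endpoint reachable"
else
  echo "API endpoint unreachable - check security groups, routes, and DNS"
fi
```

If the endpoint is unreachable from the node but reachable from your workstation, the problem is usually a node security group or VPC route change rather than the kubelet itself.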

Check EKS Version Compatibility: Confirm that the EKS version you are using is compatible with the NodeGroup version. Sometimes, an upgrade can introduce compatibility issues. Ensure that you are using a supported combination of EKS and NodeGroup versions.
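Kubernetes allows the kubelet to be at most three minor versions behind the API server, and never ahead of it. A small sketch of that skew check, with placeholder version strings you would normally read from `aws eks describe-cluster` and `kubectl get nodes`:

```shell
# Placeholder versions; read the real ones from the cluster and its nodes.
control_plane="1.25"
kubelet="1.25"
cp_minor="${control_plane#*.}"   # strip "major." to get the minor version
kl_minor="${kubelet#*.}"
skew=$((cp_minor - kl_minor))
if [ "$skew" -ge 0 ] && [ "$skew" -le 3 ]; then
  echo "version skew OK (kubelet is $skew minor version(s) behind)"
else
  echo "version skew violation - align the node group with the control plane"
fi
```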

Best regards, Andrii

answered 8 months ago
reviewed 10 days ago
  • Thanks Andrii. Appreciate your help. Here is the output of the above command you shared.

    journalctl -u kubelet

    -- Logs begin at Mon 2023-09-25 06:05:51 UTC, end at Mon 2023-10-02 13:49:02 UTC. --
    Sep 25 06:05:51 ip-XX-XX-XX-XX.us-west-2.compute.internal kubelet[2939]: E0925 06:05:51.170822 2939 pod_workers.go:965] "Error syncing pod, skippi

    kubectl get componentstatuses
    controller-manager   Healthy   ok
    scheduler            Healthy   ok
    etcd-0               Healthy
    etcd-1               Healthy
    etcd-2               Healthy

    kubectl get pods -n kube-system
    NAME                                  READY   STATUS        RESTARTS        AGE
    aws-node-c8hvw                        1/1     Running       0               9d
    aws-node-k4l9l                        1/1     Running       0               9d
    aws-node-vkmr9                        1/1     Running       0               9d
    coredns-57ff979f67-6gvxd              1/1     Running       0               2d22h
    coredns-57ff979f67-fhwzd              1/1     Running       0               9d
    coredns-57ff979f67-tgvt7              1/1     Terminating   0               9d
    ebs-csi-controller-649df4c499-j6g49   2/6     Terminating   0               2d23h
    ebs-csi-controller-649df4c499-t8qng   6/6     Terminating   1 (4d ago)      9d
    ebs-csi-controller-649df4c499-tsgnp   6/6     Running       3 (3d12h ago)   9d
    ebs-csi-controller-649df4c499-zftsn   6/6     Running       0               2d22h
    ebs-csi-node-cxn62                    3/3     Termi
