How do I troubleshoot custom nodepool and nodeclass provisioning issues in Amazon EKS Auto Mode?
I'm facing issues with custom nodepool and nodeclass creation while using EKS Auto Mode.
Short description
When you use Amazon Elastic Kubernetes Service (Amazon EKS) Auto Mode, custom NodePools or NodeClasses might get stuck in an "Unknown", "NotReady", or empty state. This problem usually stems from configuration errors in the NodePool or the NodeClass. This article covers common causes of these provisioning failures and provides troubleshooting steps to resolve them.
Issue - Custom NodePool is not ready.
# kubectl get nodepool
NAME              NODECLASS   NODES   READY   AGE
custom-nodepool   default1    0       False   39s
general-purpose   default     0       True    15m
system            default     1       True    15m
Causes -
1. A NodePool can stay in a not ready state when the NodeClass referenced in the NodePool configuration doesn't exist or its name is misspelled.
Troubleshooting - To troubleshoot the issue, describe the NodePool and check the events:
# kubectl describe nodepool {nodepool-name}
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ValidationSucceeded 2m27s karpenter Status condition transitioned, Type: ValidationSucceeded, Status: Unknown -> True, Reason: ValidationSucceeded
Normal NodeClassReady 2m27s karpenter Status condition transitioned, Type: NodeClassReady, Status: Unknown -> False, Reason: NodeClassNotFound, Message: NodeClass not found on cluster
Normal Ready 2m27s karpenter Status condition transitioned, Type: Ready, Status: Unknown -> False, Reason: UnhealthyDependents, Message: NodeClassReady=False
Warning 20s (x2 over 2m23s) karpenter Failed resolving NodeClass
Resolution - Update the NodeClass reference either in the NodePool directly using the command below, or modify the manifest and reapply the changes.
To modify the configuration directly in the cluster, run the following command:
# kubectl edit nodepool {nodepool-name}
Update the NodeClass reference:
nodeClassRef:
  group: eks.amazonaws.com
  kind: NodeClass
  name: {nodeclass-name} --> Update the name here
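For example, a corrected NodePool manifest would reference an existing NodeClass by its exact name. In the following sketch, custom-nodepool and my-custom-nodeclass are placeholder names:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: custom-nodepool            # placeholder NodePool name
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com   # Auto Mode API group
        kind: NodeClass
        name: my-custom-nodeclass  # must exactly match an existing NodeClass on the cluster
```

You can confirm the target NodeClass exists first with kubectl get nodeclass.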
2. EKS Auto Mode uses different labels than Karpenter. In Auto Mode, labels related to EC2 managed instances start with eks.amazonaws.com. If an unsupported label is present in the Auto Mode NodePool configuration, the NodePool can enter a not ready state.
Troubleshooting - To troubleshoot the issue, describe the NodePool and check the events and errors (sample below):
# kubectl describe nodepool {nodepool-name}
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ValidationSucceeded 9m48s karpenter Status condition transitioned, Type: ValidationSucceeded, Status: Unknown -> False, Reason: NodePoolValidationFailed,
Message: invalid value: label is restricted; specify a well known label or a custom label that does not use a restricted domain (label=karpenter.k8s.aws/instance-category,
restricted-labels=[k8s.io karpenter.k8s.aws karpenter.sh kubernetes.io], well-known-labels=[app.kubernetes.io/managed-by eks.amazonaws.com/capacity-reservation-id
eks.amazonaws.com/capacity-reservation-type eks.amazonaws.com/compute-type eks.amazonaws.com/instance-accelerator-count eks.amazonaws.com/instance-accelerator-manufacturer
eks.amazonaws.com/instance-accelerator-name eks.amazonaws.com/instance-category eks.amazonaws.com/instance-cpu eks.amazonaws.com/instance-cpu-manufacturer
eks.amazonaws.com/instance-cpu-sustained-clock-speed-mhz eks.amazonaws.com/instance-ebs-bandwidth eks.amazonaws.com/instance-encryption-in-transit-supported eks.amazonaws.com/instance-family
eks.amazonaws.com/instance-generation eks.amazonaws.com/instance-gpu-count eks.amazonaws.com/instance-gpu-manufacturer eks.amazonaws.com/instance-gpu-memory
eks.amazonaws.com/instance-gpu-name eks.amazonaws.com/instance-hypervisor eks.amazonaws.com/instance-local-nvme eks.amazonaws.com/instance-memory
eks.amazonaws.com/instance-network-bandwidth eks.amazonaws.com/instance-size karpenter.sh/capacity-type karpenter.sh/nodepool kubernetes.io/arch kubernetes.io/os
node.kubernetes.io/instance-type node.kubernetes.io/windows-build topology.k8s.aws/zone-id topology.kubernetes.io/region topology.kubernetes.io/zone])
in requirements, restricted
Resolution - EKS Auto Mode uses different labels than Karpenter; labels related to EC2 managed instances start with eks.amazonaws.com.
To fix the issue, update the NodePool configuration with supported labels, either imperatively with the kubectl edit command, or by correcting the manifest file and reapplying it.
The list of labels that EKS Auto Mode supports also appears in the well-known-labels section of the error message above.
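As a sketch, an Auto Mode NodePool requirement that selects instance categories uses the eks.amazonaws.com label domain instead of karpenter.k8s.aws (custom-nodepool is a placeholder name, and the label keys below come from the well-known-labels list in the error message above):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: custom-nodepool            # placeholder NodePool name
spec:
  template:
    spec:
      requirements:
        # Use the eks.amazonaws.com domain, not karpenter.k8s.aws
        - key: eks.amazonaws.com/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default
```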
Issue - Custom NodePool status is empty.
# kubectl get nodepool
NAME              NODECLASS   NODES   READY   AGE
eks-auto-mode     default                     8s
general-purpose   default     0       True    25m
Troubleshooting - To troubleshoot the issue, describe the NodePool and check the Group and Kind in the NodeClass reference:
# kubectl describe nodepool {nodepool-name}
Node Class Ref:
  Group:  karpenter.k8s.aws
  Kind:   EC2NodeClass
  Name:   default
Cause - Incorrect Group or Kind in the NodeClass reference.
Resolution - In EKS Auto Mode, the NodeClassRef group must be "eks.amazonaws.com" and the Kind must be "NodeClass".
To fix the issue, update the Group and Kind in the manifest file and re-create the NodePool, or update it directly on the cluster with the following steps.
- Run the following command to open the NodePool in edit mode:
# kubectl edit nodepool eks-auto-mode
- Update the group name in the NodeClassRef section and save the file. kubectl stores a copy of your changes in a temporary file (sample output below).
→ A copy of your changes has been stored to
/var/folders/7z/3l7pdd_96td5hyx68fwk6qmh0000gr/T/kubectl-edit-25******89.yaml
- Update the NodePool configuration with the --force option, passing the temporary file created in the previous step:
# kubectl apply -f {file name from output of step 2} --force
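After the steps above, the NodeClassRef in the applied manifest should point at the Auto Mode API group and kind. A corrected fragment looks like this sketch:

```yaml
nodeClassRef:
  group: eks.amazonaws.com   # not karpenter.k8s.aws
  kind: NodeClass            # not EC2NodeClass
  name: default
```

Note that kubectl apply --force deletes and re-creates the NodePool object, which is why the change can't simply be saved from the editor.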
Issue - Custom NodeClass is in NotReady State.
# kubectl get nodeclass
NAME          ROLE                    READY   AGE
customclass   AutoModeCustomRole      False   6m29s
default       AmazonEKSAutoNodeRole   True    41h
Cause - A NotReady status for a custom NodeClass can be caused by missing Kubernetes permissions for the Auto Mode node IAM role.
Troubleshooting -
- To troubleshoot the issue, describe the NodeClass and look for an Unauthorized error in the output:
# kubectl describe nodeclass customclass
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SubnetsReady 50m karpenter Status condition transitioned, Type: SubnetsReady, Status: Unknown -> True, Reason: SubnetsReady
Normal SecurityGroupsReady 50m karpenter Status condition transitioned, Type: SecurityGroupsReady, Status: Unknown -> True, Reason: SecurityGroupsReady
Normal InstanceProfileReady 50m karpenter Status condition transitioned, Type: InstanceProfileReady, Status: Unknown -> False, Reason: UnauthorizedNodeRole, Message: Role AutoModeCustomRole is unauthorized to join nodes to the cluster
Normal ValidationSucceeded 50m karpenter Status condition transitioned, Type: ValidationSucceeded, Status: Unknown -> False, Reason: DependenciesNotReady, Message: Awaiting Instance Profile, Security Group, and Subnet resolution
Normal Ready 50m karpenter Status condition transitioned, Type: Ready, Status: Unknown -> False, Reason: UnhealthyDependents, Message: ValidationSucceeded=False, InstanceProfileReady=False
The following error in the output confirms a permissions issue:
Type: InstanceProfileReady, Status: Unknown → False, Reason: UnauthorizedNodeRole,
Message: Role AutoModeCustomRole is unauthorized to join nodes to the cluster
- Check the CloudTrail event history:
Filter the CloudTrail logs to show events only from eks.amazonaws.com, and examine these records for Unauthorized errors.
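The same filter can be applied from the AWS CLI. This is a sketch that depends on your account and credentials; adjust --max-results to the window you want to inspect:

```shell
# List recent CloudTrail events emitted by the EKS service
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventSource,AttributeValue=eks.amazonaws.com \
  --max-results 20 \
  --query 'Events[].{time:EventTime,name:EventName}' \
  --output table
```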
Resolution - The node IAM role must have the minimum required IAM permissions and the AmazonEKSAutoNodePolicy EKS access policy.
EKS automatically creates an access entry for the default node IAM role during cluster creation, or when you create the built-in NodeClass and NodePools in scenarios such as migration to Auto Mode. If you change the node IAM role associated with a NodeClass, make sure that the role has the minimum required IAM permissions, and create a new access entry for it. For more information on EKS access entries, see Grant IAM users access to Kubernetes with EKS access entries. For details on the IAM permissions that the node role needs, see Amazon EKS Auto Mode node IAM role.
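As a sketch, creating the access entry from the AWS CLI might look like the following. Here my-cluster and the role ARN arn:aws:iam::111122223333:role/AutoModeCustomRole are placeholders for your cluster name and custom node role:

```shell
# Create an access entry of type EC2 for the custom Auto Mode node role
aws eks create-access-entry \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::111122223333:role/AutoModeCustomRole \
  --type EC2

# Associate the AmazonEKSAutoNodePolicy access policy with that entry
aws eks associate-access-policy \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::111122223333:role/AutoModeCustomRole \
  --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSAutoNodePolicy \
  --access-scope type=cluster
```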
For more EKS Auto Mode troubleshooting scenarios, see the following links:
How to debug Auto-Mode custom NodePool
How do I troubleshoot EKS Auto Mode built-in node pools with Unknown Status
