How do I troubleshoot managed node group AMI update failures for Amazon EKS?
My Amazon Elastic Kubernetes Service (Amazon EKS) managed node group update to the newest Amazon Machine Image (AMI) version fails.
Resolution
When you initiate a managed node group update, Amazon EKS automatically updates your nodes. If you use an Amazon EKS optimized AMI, then Amazon EKS automatically applies the latest security patches and operating system (OS) updates to your nodes. During the node update process, you might receive one of the following errors. Choose a resolution based on the type of error that you receive.
Troubleshoot PodEvictionFailure errors
You receive the following error when a PodEvictionFailure blocks an upgrade:
"Reached max retries while trying to evict pods from nodes in node group"
If the Pods don't leave the node within 15 minutes and there's no --force flag, then the upgrade phase fails with a PodEvictionFailure error.
PodEvictionFailure errors occur when your Pod disruption budget (PDB) prevents Pod eviction, or your deployment tolerates all taints and keeps the node non-empty.
Choose a resolution based on the cause of the error.
Your PDB prevents Pod eviction
A PDB specifies the number of disruptions that a class of Pods can tolerate at a given time. A PDB configuration is restrictive when it allows zero disruptions, for example when minAvailable equals the replica count or when multiple PDBs point to the same Pod. When a voluntary disruption causes Pods for a service to drop below the budget, the operation pauses until it can maintain the budget. The node drain event temporarily halts until more Pods become available. For more information, see Disruptions on the Kubernetes website.
To update the managed node group, either remove the PDBs temporarily or use the --force option. The --force option doesn't consider PDBs. Instead, it forces the nodes to restart and implement the updates.
Note: If the Pods protected by the PDB run a quorum-based application, the --force option might cause the application to become temporarily unavailable.
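As an illustration, you can start the forced update from the AWS CLI. The following is a minimal sketch that wraps the call in a shell function. The function name force_nodegroup_update and the cluster and node group names that you pass to it are placeholders.

```shell
# Start a managed node group version update that doesn't wait on PDBs.
# Arguments: <cluster-name> <nodegroup-name>
# Prints the ID of the update that Amazon EKS starts.
force_nodegroup_update() {
  aws eks update-nodegroup-version \
    --cluster-name "$1" \
    --nodegroup-name "$2" \
    --force \
    --query 'update.id' \
    --output text
}
```

For example, run force_nodegroup_update my-cluster my-nodegroup, and keep the printed update ID so that you can check the result later.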
To confirm that you have PDBs configured in your cluster, run the following command:
kubectl get pdb --all-namespaces
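To narrow the list to the PDBs that currently block eviction, you can filter on the disruptionsAllowed value in each PDB's status. The following sketch assumes that kubectl and awk are available. The function name blocking_pdbs is hypothetical.

```shell
# Print namespace/name for every PDB whose status reports zero allowed disruptions.
blocking_pdbs() {
  kubectl get pdb --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.disruptionsAllowed}{"\n"}{end}' |
    awk -F '\t' '$3 == 0 { print $1 "/" $2 }'
}
```

A PDB that appears in the output can't tolerate any voluntary disruption at the moment, so a node drain waits on the Pods that it protects.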
Or, if you turned on audit logging in the Amazon EKS console, then complete the following steps:
- Open the Amazon EKS console.
- In the navigation pane, choose Clusters, and then choose your cluster.
- Choose the Observability tab, and then choose Audit. The Amazon CloudWatch console opens.
- In the CloudWatch console, choose Logs, and then choose Logs Insights.
- In the Logs Insights Query editor, enter the following query:
fields @timestamp, @message
| filter @logStream like "kube-apiserver-audit"
| filter ispresent(requestURI)
| filter objectRef.subresource = "eviction"
| display @logStream, requestURI, responseObject.message
| stats count(*) as retry by requestURI, responseObject.message
- Choose Run query.
- Choose Custom to identify the date for the update. If there's a node group update failure because of a restrictive PDB, then responseObject.message describes the reason for the Pod eviction failure.
If your PDB caused the failure, then modify the PDB so that it allows at least one disruption. Then, update the managed node group again.
To edit the PDB, run the following command:
kubectl edit pdb PDB-NAME
Note: Replace PDB-NAME with the name of your PDB.
In the PDB spec, take one of the following actions:
- Lower minAvailable to a value less than your total replica count.
- Set maxUnavailable to at least 1.
If multiple PDBs target the same Pod, then remove or consolidate the duplicates so that only one PDB applies to each Pod.
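The following arithmetic sketch shows why a PDB whose minAvailable equals the replica count blocks a drain. It assumes an integer minAvailable, not a percentage. The function name allowed_disruptions is hypothetical.

```shell
# Allowed disruptions for an integer minAvailable:
# healthy Pods minus minAvailable, never below 0.
# Arguments: <healthy-pod-count> <minAvailable>
allowed_disruptions() {
  allowed=$(( $1 - $2 ))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}
```

With 3 healthy Pods, allowed_disruptions 3 3 prints 0, so no Pod can be evicted. After you lower minAvailable to 2, allowed_disruptions 3 2 prints 1 and the drain can evict one Pod at a time.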
Your deployment tolerates all taints
Amazon EKS taints the node during the update, so the node normally becomes empty after the node drain process evicts all Pods. If a deployment tolerates every taint, then its Pods can still run on the tainted node. The node remains non-empty and you see a Pod eviction failure.
To resolve this issue, review your deployment tolerations and remove unnecessary wildcard tolerations. Then, update the managed node group again. For more information, see Taints and tolerations on the Kubernetes website.
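To review the tolerations of one deployment, you can print them from the Pod template. The following is a minimal sketch. The function name deployment_tolerations and the namespace and deployment names that you pass to it are placeholders.

```shell
# Print the tolerations from a deployment's Pod template.
# Arguments: <namespace> <deployment-name>
deployment_tolerations() {
  kubectl get deployment "$2" --namespace "$1" \
    -o jsonpath='{.spec.template.spec.tolerations}'
}
```

In the output, a toleration that sets operator to Exists and has no key matches every taint. That's the wildcard toleration to look for.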
Troubleshoot launch template configuration issues
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
When a managed node group update fails, the failure reason is returned as an error code through the AWS Management Console or the describe-update AWS CLI command.
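The following sketch shows one way to retrieve the error code with the AWS CLI. The function names nodegroup_update_ids and nodegroup_update_errors are hypothetical, and the cluster name, node group name, and update ID are placeholders that you supply.

```shell
# List the update IDs that Amazon EKS recorded for a node group.
# Arguments: <cluster-name> <nodegroup-name>
nodegroup_update_ids() {
  aws eks list-updates --name "$1" --nodegroup-name "$2" \
    --query 'updateIds' --output text
}

# Print the error code and message for each error on one update.
# Arguments: <cluster-name> <nodegroup-name> <update-id>
nodegroup_update_errors() {
  aws eks describe-update \
    --name "$1" \
    --nodegroup-name "$2" \
    --update-id "$3" \
    --query 'update.errors[].[errorCode, errorMessage]' \
    --output text
}
```

Run nodegroup_update_ids first, and then pass the ID of the failed update to nodegroup_update_errors.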
Choose a resolution based on the error that you receive.
Launch template doesn't match expected configuration
If you modify or delete the launch template associated with the node group, then the update can fail with errors related to the launch template configuration. These errors occur if you manually changed the launch template or if the launch template version referenced by the node group no longer exists.
To resolve this issue, verify that the Auto Scaling group uses the launch template version that was originally associated with the node group. Also, roll back unintended changes to the launch template.
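As a starting point for this check, you can read the launch template that the node group records and confirm that the version still exists. The following is a sketch with hypothetical function names. All identifiers that you pass in are placeholders.

```shell
# Print the launch template ID and version that the node group references.
# Arguments: <cluster-name> <nodegroup-name>
nodegroup_launch_template() {
  aws eks describe-nodegroup --cluster-name "$1" --nodegroup-name "$2" \
    --query 'nodegroup.launchTemplate.[id, version]' --output text
}

# Confirm that a launch template version still exists.
# Arguments: <launch-template-id> <version>
launch_template_version_exists() {
  aws ec2 describe-launch-template-versions \
    --launch-template-id "$1" --versions "$2" \
    --query 'LaunchTemplateVersions[].VersionNumber' --output text
}
```

If the second command returns an error or no version number, then the version that the node group references is missing.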
Amazon EKS can't find the launch template
If you delete the launch template referenced by the node group, then the update fails with an error that shows the launch template doesn't exist. The missing launch template prevents the Auto Scaling group from launching replacement nodes during the update process.
To resolve this issue, restore or recreate the launch template and associate the correct version with the node group. Then, update the managed node group again.
Launch template configuration isn't valid
If the launch template contains parameters that aren't valid, then the Auto Scaling group can't launch new Amazon Elastic Compute Cloud (Amazon EC2) instances. For example, parameters that aren't valid include incorrect user data, unsupported block device mappings, or incompatible configuration settings.
To resolve this issue, review the launch template configuration, and then correct settings that aren't valid. Then, update the managed node group again.
Troubleshoot AMI configuration issues
Choose a resolution based on the error that you receive.
Amazon EC2 can't find the AMI
If you delete the AMI specified in the launch template, then the Auto Scaling group can't launch new nodes during the update. You also can't launch a new node group if the AMI is in a different Region.
To resolve this issue, update the launch template to reference a valid AMI, and then update the managed node group again.
Incorrect AMI ID
If the AMI ID specified in the launch template is incorrect, then Amazon EC2 can't launch instances. An incorrect AMI ID prevents the update process from creating new worker nodes.
To resolve this issue, verify that the AMI ID is valid, and then update the launch template with a correct image. Then, update the managed node group again.
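To check an AMI ID in the Region where the node group runs, you can query its state. The following is a minimal sketch. The function name ami_state is hypothetical, and the AMI ID and Region are placeholders.

```shell
# Print the state of an AMI in a specific Region.
# Arguments: <ami-id> <region>
ami_state() {
  aws ec2 describe-images --image-ids "$1" --region "$2" \
    --query 'Images[0].State' --output text
}
```

A state of available means that Amazon EC2 can launch instances from the image. An error from the command typically means that the ID is wrong or that the AMI doesn't exist in that Region.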
Troubleshoot node launch failures
Choose a resolution based on the error that you receive.
Specified instance type doesn't exist
If the launch template references an EC2 instance type that's not valid or supported, then instance launch requests fail during the update. Instance launch requests fail when the instance type is misspelled or not supported in the selected Region.
To resolve this issue, update the launch template to use a valid instance type, and then update the managed node group again.
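To confirm where an instance type is offered, you can list the Availability Zones that support it in a Region. The following is a sketch. The function name instance_type_zones is hypothetical, and the instance type and Region are placeholders.

```shell
# List the Availability Zones in a Region that offer an instance type.
# Arguments: <instance-type> <region>
instance_type_zones() {
  aws ec2 describe-instance-type-offerings \
    --location-type availability-zone \
    --filters "Name=instance-type,Values=$1" \
    --region "$2" \
    --query 'InstanceTypeOfferings[].Location' \
    --output text
}
```

Empty output suggests that the instance type is misspelled or isn't offered in that Region.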
Ec2InsufficientCapacity
If the requested instance type is temporarily unavailable in the selected Availability Zone, then the update fails with an insufficient capacity error.
To resolve this issue, use additional Availability Zones or different instance types, and then update the managed node group again.
InstanceLimitExceeded error
If your AWS account reaches its Amazon EC2 instance quota, then the Auto Scaling group can't launch replacement nodes during the update.
To resolve this issue, request a quota increase or reduce the number of running instances. Then, update the managed node group again.
AsgInstanceLaunchFailures error
If you delete or modify the Auto Scaling group that backs the node group outside of Amazon EKS, then the update fails because Amazon EKS can't manage node replacements.
To resolve this issue, validate your node group configuration and recreate the node group if the underlying Auto Scaling group no longer exists.
Auto Scaling group configuration isn't valid
If the Auto Scaling group configuration becomes inconsistent with the node group settings, then the update can fail. Manual modifications to the launch template, subnets, or scaling parameters outside of Amazon EKS can cause this inconsistency.
To resolve this issue, restore the Auto Scaling group configuration to match the expected node group configuration. Then, update the managed node group again.
Troubleshoot network configuration errors
Choose a resolution based on the error that you receive.
Subnet configuration isn't valid
If the subnets associated with the node group aren't configured correctly or can't route traffic properly, then new nodes can't join the cluster. Private subnets that lack a NAT gateway or missing routing rules can cause this failure.
To resolve this issue, make sure that your subnet routing and connectivity allow nodes to communicate with the cluster control plane.
Specified security group doesn't exist
If you delete a security group referenced by the launch template or node group configuration, then you can't launch new nodes.
To resolve this issue, update the configuration to reference a valid security group, and then update the managed node group again.
Security group configuration isn't valid
If the associated security groups contain rules that are in conflict or aren't valid, then new nodes fail to launch or connect to the cluster.
To resolve this issue, verify that the security groups allow the required inbound and outbound communication between worker nodes and the cluster control plane.
InsufficientFreeAddresses error
If the subnet associated with the node group runs out of available IP addresses, then the Auto Scaling group can't launch additional nodes during the update.
To resolve this issue, expand the subnet CIDR range or add additional subnets to provide more available IP addresses.
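To see how many addresses remain, you can read the available IP address count for each subnet. The following sketch prints only the subnets below a threshold that you choose. The function name low_ip_subnets is hypothetical, and the threshold and subnet IDs are placeholders.

```shell
# Print subnets that have fewer free IP addresses than a threshold.
# Arguments: <threshold> <subnet-id> [<subnet-id> ...]
low_ip_subnets() {
  threshold=$1
  shift
  aws ec2 describe-subnets --subnet-ids "$@" \
    --query 'Subnets[].[SubnetId, AvailableIpAddressCount]' \
    --output text |
    awk -v min="$threshold" '$2 < min { print $1, $2 }'
}
```

Choose a threshold that covers the nodes and Pods that the update launches at the same time.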
Troubleshoot node registration failures
Choose a resolution based on the error that you receive.
New nodes can't join the cluster
If you successfully launch instances but the instances fail to register with the Kubernetes cluster, then the update fails with a node creation error. The failed update occurs when the node AWS Identity and Access Management (IAM) role lacks required permissions, the bootstrap script fails, or the aws-auth ConfigMap doesn't include the node role.
To resolve this issue, validate your node's IAM policies, bootstrap configuration, and cluster authentication settings. If your cluster uses access entries instead of the aws-auth ConfigMap, then make sure that an access entry exists for the node role and that its type is EC2_LINUX or EC2_WINDOWS. For more information, see Amazon EKS node IAM role.
Access denied because of insufficient IAM permissions
If the node IAM role doesn't include the required permissions for worker nodes, then the nodes can't communicate with required AWS services.
To resolve this issue, make sure that the node IAM role includes the required IAM policies, such as AmazonEKSWorkerNodePolicy and AmazonEC2ContainerRegistryReadOnly.
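The following sketch checks a node role for the two managed policies named above. The function name missing_node_policies is hypothetical, and the role name is a placeholder. The check covers only those two policy names, so treat it as a starting point.

```shell
# Print each policy named in this article that isn't attached to the role.
# Arguments: <role-name>
missing_node_policies() {
  attached=$(aws iam list-attached-role-policies --role-name "$1" \
    --query 'AttachedPolicies[].PolicyName' --output text)
  for policy in AmazonEKSWorkerNodePolicy AmazonEC2ContainerRegistryReadOnly; do
    # Unquoted echo turns the tab-separated names into a space-separated list.
    case " $(echo $attached) " in
      *" $policy "*) ;;
      *) echo "$policy" ;;
    esac
  done
}
```

No output means that both policies are attached to the role.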
Troubleshoot cluster unreachable errors
Choose a resolution based on the error that you receive.
Worker nodes can't reach the cluster endpoint
If the worker nodes can't connect to the Kubernetes API server, then the update fails with a cluster unreachable error. Security group restrictions, missing NAT gateway access for private nodes, or an incorrect endpoint configuration can prevent worker nodes from connecting to the Kubernetes API server.
To resolve this issue, verify network connectivity between worker nodes and the cluster endpoint.
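One input to this check is whether the cluster endpoint allows public access, private access, or both. The following is a minimal sketch. The function name cluster_endpoint_access is hypothetical, and the cluster name is a placeholder.

```shell
# Print whether public and private endpoint access are turned on.
# Arguments: <cluster-name>
# Output columns: endpointPublicAccess, endpointPrivateAccess
cluster_endpoint_access() {
  aws eks describe-cluster --name "$1" \
    --query 'cluster.resourcesVpcConfig.[endpointPublicAccess, endpointPrivateAccess]' \
    --output text
}
```

If only public access is turned on, then nodes in private subnets need an outbound path, such as a NAT gateway, to reach the endpoint.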
Resolve update state errors
Another update is already in progress error
If an update operation is already running on the node group, then Amazon EKS rejects additional update requests until the current operation completes.
To resolve this issue, wait for the existing update to complete, and then update the managed node group again.
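Instead of polling manually, you can use the AWS CLI waiter for node groups. The following is a sketch. The function name wait_for_nodegroup is hypothetical, and the cluster and node group names are placeholders.

```shell
# Block until the node group is ACTIVE again, then print its status.
# Arguments: <cluster-name> <nodegroup-name>
wait_for_nodegroup() {
  aws eks wait nodegroup-active \
    --cluster-name "$1" --nodegroup-name "$2" &&
    aws eks describe-nodegroup \
      --cluster-name "$1" --nodegroup-name "$2" \
      --query 'nodegroup.status' --output text
}
```

When the function prints ACTIVE, no update is running on the node group and you can submit the next update request.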
Request parameters aren't valid
If the update request contains parameters that aren't valid, such as an unsupported version or incorrect configuration values, then the update fails. You receive an error message for parameters that aren't valid.
To resolve this issue, review the update request and make sure that the node group configuration supports all parameters. Then, update the managed node group again.
Amazon EKS can't find the node group
If the node group name provided in the update request doesn't exist, then the update operation fails.
To resolve this issue, make sure that you specify the correct cluster name and node group name. Then, update the managed node group again.
UnsupportedKubernetesVersion error
If you update the node group to a Kubernetes version that the cluster control plane doesn't support, then the update fails.
To resolve this issue, make sure that the node group version is compatible with the current cluster. Then, update the managed node group again.
KubernetesVersionMismatch error
Amazon EKS doesn't allow node groups to run a Kubernetes version that's newer than the control plane. If an update tries to upgrade the node group beyond the cluster version, then the operation fails.
To resolve this issue, upgrade the control plane, and then update the node group.
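The version rule above can be sketched as a small comparison. The following example assumes that both versions use the major.minor form, such as 1.30, and that you already retrieved them, for example with describe-cluster and describe-nodegroup. The function name version_allowed is hypothetical.

```shell
# Succeed when the target node group version isn't newer than the control plane.
# Arguments: <control-plane-version> <target-nodegroup-version>
version_allowed() {
  cp_minor=${1#*.}   # text after the first dot, for example 30
  ng_minor=${2#*.}
  [ "${1%%.*}" -eq "${2%%.*}" ] && [ "$ng_minor" -le "$cp_minor" ]
}
```

For example, version_allowed 1.31 1.30 succeeds, and version_allowed 1.30 1.31 fails because the node group would run a newer version than the control plane.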
Node group became unhealthy during the update
If repeated instance launch failures or node readiness issues occur during the rolling update process, then Amazon EKS marks the node group as unhealthy. Then, Amazon EKS stops the update process.
To resolve this issue, investigate instance launch errors, node initialization logs, and network connectivity. Then, update the managed node group again.
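A possible first step in that investigation is to read the health issues that Amazon EKS records on the node group. The following is a sketch. The function name nodegroup_health_issues is hypothetical, and the cluster and node group names are placeholders.

```shell
# Print the code and message for each health issue on a node group.
# Arguments: <cluster-name> <nodegroup-name>
nodegroup_health_issues() {
  aws eks describe-nodegroup \
    --cluster-name "$1" --nodegroup-name "$2" \
    --query 'nodegroup.health.issues[].[code, message]' \
    --output text
}
```

Each output line pairs an issue code with a message that you can match against the sections above.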
Any guideline for below error messages?
NodeCreationFailure
Couldn't proceed with upgrade process as new nodes are not joining node group
Thank you for your comment. We'll review and update the Knowledge Center article as needed.
This article was reviewed and updated on 2026-05-21.