Why is my Amazon EKS Pod stuck in the ContainerCreating state with the "Failed to create pod sandbox" error?
My Amazon Elastic Kubernetes Service (Amazon EKS) Pod is stuck in the ContainerCreating state with the "failed to create pod sandbox" error.
Resolution
Prerequisites
Determine which Pods have an issue. Complete the following steps:
- Run the following command to list the Pods in your cluster and identify Pods in the ContainerCreating state:
kubectl get pods --all-namespaces -o wide
- Run the following command to retrieve the details about each Pod in the ContainerCreating state:
kubectl describe pod pod-name -n pod-namespace
Note: Replace pod-name with the name of your Pod and pod-namespace with the namespace where your Pod is located.
- Review the Events section of the output to identify the error message, and then use the following sections to troubleshoot your issue.
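If many Pods are affected, then you can filter the Pod list instead of scanning it by eye. The following is a minimal sketch that assumes the default column layout of kubectl get pods --all-namespaces --no-headers (NAMESPACE, NAME, READY, STATUS, and so on). The list_creating_pods function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "namespace name" for each Pod whose STATUS is ContainerCreating.
# Assumes input from: kubectl get pods --all-namespaces --no-headers
# (column 1 = NAMESPACE, column 2 = NAME, column 4 = STATUS).
list_creating_pods() {
  awk '$4 == "ContainerCreating" { print $1, $2 }'
}
```

For example, to describe every affected Pod in one pass: kubectl get pods --all-namespaces --no-headers | list_creating_pods | while read -r ns name; do kubectl describe pod "$name" -n "$ns"; done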
"Resource temporarily unavailable" error
If the node reaches the operating system (OS) limit for process IDs (PIDs) or open files, then you get an error message that's similar to the following:
"kubelet, ip-192-168-0-1.us-east-1.compute.internal Failed to create Pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for Pod "example_pod": Error response from daemon: failed to start shim: fork/exec /usr/bin/containerd-shim: resource temporarily unavailable: unknown"
To temporarily resolve the issue, restart the node.
To troubleshoot the issue, complete the following steps:
- Gather the node logs for containerd and the kubelet.
For Windows, connect to your instance. Open a PowerShell command prompt, and then use the Windows EKS log collector script to collect the worker node logs. For more information, see EKS Logs Collector (Windows) on the GitHub website. Run the following commands:
Invoke-WebRequest -OutFile eks-log-collector.ps1 https://raw.githubusercontent.com/awslabs/amazon-eks-ami/main/log-collector-script/windows/eks-log-collector.ps1
.\eks-log-collector.ps1
For Linux, connect to your instance. Then, use the Linux EKS log collector script to collect the worker node logs. For more information, see EKS Logs Collector (Linux) on the GitHub website. Run the following command to download the log collector script:
curl -O https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/log-collector-script/linux/eks-log-collector.sh
Then, run the downloaded script:
sudo bash eks-log-collector.sh
- Review the kubelet log for the following error responses:
"kubelet[5267]: runtime: failed to create new OS thread (have 2 already; errno=11)"
"kubelet[5267]: runtime: may need to increase max user processes (ulimit -u)"
- Identify the zombie processes, and then stop the ones that aren't necessary.
For Windows, open Task Manager, and then choose the Details tab. Check for processes that show the Not responding status to identify the zombie processes.
For Linux, run the following ps command to check for zombie processes that are listed in the Z state:
ps aux | grep -E "Z|defunct"
On Linux, a zombie process is removed only when its parent process reaps it or exits, so stop or restart the parent process. For more information, see How to kill Zombie processes on Linux on the Linux Journal website.
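The Linux checks in the preceding steps can be scripted. The following is a minimal sketch with two hypothetical helper functions: count_thread_errors counts the kubelet thread-creation errors in log text (errno=11 is EAGAIN on Linux), and list_zombies lists zombie processes with their parent PIDs by matching the STAT column, which avoids the false matches that a plain grep for the letter Z can produce.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the "Resource temporarily unavailable" checks.

# Count log lines where the kubelet failed to create an OS thread (errno=11, EAGAIN)
# or where the Go runtime suggests raising max user processes. Reads log text on stdin.
count_thread_errors() {
  grep -c -E 'failed to create new OS thread|may need to increase max user processes'
}

# Print "pid ppid command" for each zombie process (STAT column starts with Z).
# Expects input from: ps -eo stat=,pid=,ppid=,comm=
list_zombies() {
  awk '$1 ~ /^Z/ { print $2, $3, $4 }'
}
```

On a Linux worker node, for example, run journalctl -u kubelet --no-pager | count_thread_errors and ps -eo stat=,pid=,ppid=,comm= | list_zombies. The PPID column shows which parent process to investigate.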
"Network plugin cni failed to set up pod network" error
If the Container Network Interface (CNI) can't assign an IP address to your newly created Pod, you might receive the following error message:
"Network plugin cni failed to set up pod network: add cmd: failed to assign an IP address to container"
This error can occur for several reasons, primarily falling into three categories:
Resource limitations:
- Exhausted subnet IPs
- Maximum ENI attachment limit reached
Configuration issues:
- Missing IAM CNI policy on worker nodes
- VPC CNI (aws-node) Pod not in Running status
- Outdated versions of VPC CNI, kube-proxy, or CoreDNS
- Misconfigured security groups or access control lists
Architecture-specific challenges:
- VPC CNI configuration misalignment with your use case
- Missing OpenID Connect (OIDC) configuration
- Self-managed add-ons instead of EKS-managed add-ons
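To check the first resource limitation (exhausted subnet IPs), you can list each subnet's free address count with the AWS CLI and flag the subnets that are running low. This is a sketch: the low_ip_subnets function name and its default threshold of 16 addresses are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the IDs of subnets whose free IPv4 count is below a threshold.
# Expects whitespace-separated "subnet-id available-count" lines on stdin, for example from:
#   aws ec2 describe-subnets --filters Name=vpc-id,Values=<vpc-id> \
#     --query 'Subnets[].[SubnetId,AvailableIpAddressCount]' --output text
# $1 = threshold (defaults to 16, an illustrative value).
low_ip_subnets() {
  local threshold="${1:-16}"
  awk -v t="$threshold" '$2 + 0 < t + 0 { print $1 }'
}
```

Filtering on your cluster's VPC ID keeps unrelated subnets out of the report.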
For detailed troubleshooting steps, see How do I resolve kubelet or CNI plugin issues for Amazon EKS?
It's a best practice to do the following:
- Use custom networking for your VPC CNI so that you can deploy Pods on alternate subnets.
- Turn on prefix delegation mode. For more information, see Prefix mode for Windows.
- Use a security group for Pods to assign custom security groups to multiple Pods through branch ENIs on a worker node.
Note: If you make any changes to your CNI configuration, you must restart the affected nodes for the changes to take effect.
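The ENI attachment limit translates directly into a per-node Pod limit. In the default secondary IP mode, the maximum number of Pods is ENIs * (IPv4 addresses per ENI - 1) + 2: each ENI's primary address isn't available to Pods, and the extra two cover host-network Pods such as aws-node and kube-proxy. The following sketch computes that value. The ENI and address counts are inputs that you look up for your instance type (for example, from NetworkInfo.MaximumNetworkInterfaces and NetworkInfo.Ipv4AddressesPerInterface in aws ec2 describe-instance-types), and the max_pods function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: maximum Pods per node in secondary IP mode.
#   max_pods = ENIs * (IPv4 addresses per ENI - 1) + 2
# $1 = maximum ENIs for the instance type, $2 = IPv4 addresses per ENI.
max_pods() {
  local enis="$1" ips_per_eni="$2"
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}
```

For example, an m5.large supports 3 ENIs with 10 IPv4 addresses each, so max_pods 3 10 prints 29. With prefix delegation turned on, each secondary address slot holds a /28 prefix (16 addresses) instead of a single address, which is why that mode raises the ceiling. Amazon EKS still applies a recommended cap rather than the full 16x value.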
"Error while dialing" error
If the CNI plugin can't communicate with the IP Address Management Daemon (IPAMD) because the aws-node Pod isn't running on the node, then you get an error similar to the following:
"Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused"
The error occurs in these scenarios:
VPC CNI in Pending state
Liveness and readiness probe failures can occur because of security group rule configurations or application errors. Resource exhaustion can also cause delays. Typically, these failures occur when DISABLE_TCP_EARLY_DEMUX is set to false while POD_SECURITY_GROUP_ENFORCING_MODE is set to strict.
If you use security groups for Pods with liveness or readiness probes in strict mode, then set DISABLE_TCP_EARLY_DEMUX to true. This setting allows the kubelet to use TCP to connect to Pods on branch network interfaces.
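Both settings are environment variables on the aws-node DaemonSet: POD_SECURITY_GROUP_ENFORCING_MODE belongs to the aws-node container, and DISABLE_TCP_EARLY_DEMUX belongs to the aws-vpc-cni-init init container. The following sketch only builds a strategic merge patch for the init container so that you can review it before you apply it. The early_demux_patch function name is hypothetical, and you should confirm the init container name in your own DaemonSet first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a strategic merge patch that sets DISABLE_TCP_EARLY_DEMUX
# on the aws-vpc-cni-init init container of the aws-node DaemonSet.
# Containers and env entries are merged by "name", so other settings are left as-is.
# $1 = "true" or "false" (defaults to "true").
early_demux_patch() {
  local value="${1:-true}"
  printf '{"spec":{"template":{"spec":{"initContainers":[{"name":"aws-vpc-cni-init","env":[{"name":"DISABLE_TCP_EARLY_DEMUX","value":"%s"}]}]}}}}\n' "$value"
}
```

To apply the patch, run kubectl patch daemonset aws-node -n kube-system -p "$(early_demux_patch true)".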
CNI managed plugin issues
The aws-node Pod fails its probes when you add the VPC CNI as a managed add-on in the AWS Management Console. This happens because the managed add-on overwrites the aws-node service account.
To resolve this issue, do one of the following:
- Remove the managed add-on from the AWS Management Console. Then, recreate it with the correct IAM role that provides the required AmazonEKS_CNI_Policy IAM policy.
- Edit the existing aws-node service account to associate it with the correct IAM role that has the required CNI permissions.
- If you use Pod Identity on the cluster, create the necessary Pod identity association that binds the aws-node service account with the IAM role to be used by the VPC CNI.
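To confirm that the service account option took effect, you can read the IAM role annotation on the aws-node service account and check that it looks like an IAM role ARN. This is a minimal sketch that assumes IAM roles for service accounts (IRSA). The is_role_arn function name and the ARN pattern, which is an approximation, are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the first line on stdin is an IAM role ARN such as
# arn:aws:iam::111122223333:role/AmazonEKSVPCCNIRole (the pattern is an approximation).
is_role_arn() {
  local arn re='^arn:aws(-[a-z]+)*:iam::[0-9]{12}:role/.+$'
  # read also fills $arn when the input has no trailing newline (jsonpath output).
  IFS= read -r arn
  [[ "$arn" =~ $re ]]
}
```

For example: kubectl get serviceaccount aws-node -n kube-system -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}' | is_role_arn && echo "aws-node has an IAM role". If you use EKS Pod Identity instead, then aws eks list-pod-identity-associations --cluster-name <cluster-name> --namespace kube-system --service-account aws-node shows the association.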
Additional recommendations:
- Use the latest versions of VPC CNI (aws-node), kube-proxy, and CoreDNS.
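To check whether a self-managed VPC CNI is out of date, you can compare its image tag with the version that you want to run. The following sketch relies on GNU sort -V for version ordering, which is an assumption about the tools on your workstation. The version_older_than function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: return 0 (true) if version $1 is strictly older than version $2.
# A leading "v" and any suffix such as "-eksbuild.1" are stripped before comparing.
version_older_than() {
  local a="${1#v}" b="${2#v}"
  a="${a%%-*}"
  b="${b%%-*}"
  [ "$a" != "$b" ] && [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n1)" = "$a" ]
}
```

For example, current=$(kubectl describe daemonset aws-node -n kube-system | grep amazon-k8s-cni: | cut -d : -f 3) reads the running tag, and version_older_than "$current" v1.18.0 && echo "upgrade the VPC CNI" compares it with an illustrative target version. Choose the target from the VPC CNI versions that support your cluster's Kubernetes version.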
"Failed to setup network for sandbox" error
If you turn on prefix delegation (ENABLE_PREFIX_DELEGATION) for your VPC CNI (aws-node) Pods, then you might receive the following error message:
"Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox"
To get more details about this issue, check the IP Address Management Daemon (IPAMD) logs at /var/log/aws-routed-eni/ipamd.log in your worker node.
Even if your subnet has available IP addresses, an error occurs when the subnet has no contiguous /28 blocks of IP addresses available. This happens when existing secondary IP addresses are fragmented across the subnet. The error appears in the Amazon VPC CNI plugin for Kubernetes logs or in the CloudTrail events for the affected worker node. You might receive the following error message:
"InsufficientCidrBlocks: The specified subnet does not have enough free cidr blocks to satisfy the request"
To resolve this error, complete one of the following options:
Create a new subnet, and then launch the Pods there.
-or-
Use an Amazon EC2 subnet CIDR reservation to reserve space within a subnet for use with a prefix assignment.
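When you size a prefix reservation, each /28 prefix holds 16 addresses, so a reservation with prefix length n contains 2^(28 - n) prefixes. The following sketch does that arithmetic for the subnet sizes that a VPC allows (/16 through /28). The prefixes_in_cidr function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: number of /28 prefixes (16 addresses each) in a CIDR of length $1.
# Valid for prefix lengths 16 through 28.
prefixes_in_cidr() {
  local len="$1"
  if (( len < 16 || len > 28 )); then
    echo "prefix length must be between 16 and 28" >&2
    return 1
  fi
  echo $(( 1 << (28 - len) ))
}
```

For example, prefixes_in_cidr 24 prints 16, so reserving a /24 with aws ec2 create-subnet-cidr-reservation --subnet-id <subnet-id> --cidr <cidr> --reservation-type prefix sets aside 16 contiguous /28 blocks for prefix assignment.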
If you transition from IP address assignment to IP prefix assignment, then it's a best practice to create new node groups to increase the number of available IP addresses, rather than do a rolling replacement of your existing nodes.
If you run Pods on a node that has both IP addresses and prefixes assigned, then it can cause inconsistency in the advertised IP address capacity. This scenario can impact the future workloads on the node. To solve this issue, follow the steps in Assign more IP addresses to Amazon EKS nodes with prefixes.
"Container image authentication" error
If the container runtime doesn't have the necessary permissions to pull an image, then you might get an error message that's similar to the following:
"pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credentials"
To resolve this error, check the following:
- Review the IAM role that's attached to the EKS worker node. If the role is missing the AmazonEC2ContainerRegistryReadOnly IAM policy, then attach the policy.
- If you pull images from an Amazon ECR repository in another account, then see How do I allow a secondary account to push or pull images in my Amazon ECR image repositories? Make sure that the repository-level policy is set with the proper permissions.
- If you use Amazon ECR VPC endpoints to pull images, then make sure that the VPC endpoint security group allows inbound HTTPS on port 443 from the EKS node security group. For more information, see Amazon ECR interface VPC endpoints (AWS PrivateLink).
- For containerd, the pause image is pulled only during bootstrap, so the image must stay on the instance after that. Don't use custom cleanup scripts or manual deletion commands such as crictl rmi --prune to remove the pause images. Let the kubelet handle garbage collection (GC), which runs automatically every 1-2 minutes. If you remove the containerd pause container images, then new Pods fail to start. Existing Pods keep running on the nodes, and the kubelet logs show the "Failed to create pod sandbox" error.
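To check the first item, you can list the managed policies that are attached to the node IAM role and look for the ECR read-only policy. This sketch checks only attached managed policies, not inline policies. The has_ecr_read_policy function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the policy names on stdin include
# AmazonEC2ContainerRegistryReadOnly. Expects whitespace-separated names, for example from:
#   aws iam list-attached-role-policies --role-name <node-role-name> \
#     --query 'AttachedPolicies[].PolicyName' --output text
has_ecr_read_policy() {
  # Put one policy name per line, then require an exact whole-line match.
  tr -s '[:space:]' '\n' | grep -qx 'AmazonEC2ContainerRegistryReadOnly'
}
```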
If you deleted the pause container image from your EKS worker node, then complete the following steps:
- To check whether your EKS worker node has the pause image, run the following command:
sudo ctr -n k8s.io images ls | grep -o "602401143452.dkr.ecr.YOUR-REGION.amazonaws.com/eks/pause:3.10"
Note: Replace YOUR-REGION with your AWS Region.
If the EKS worker node has the pause image, then the output looks similar to the following:
602401143452.dkr.ecr.YOUR-REGION.amazonaws.com/eks/pause:3.10
602401143452.dkr.ecr.YOUR-REGION.amazonaws.com/eks/pause:3.10
To check whether the localhost pause image was deleted from your EKS worker node, run the following command:
sudo ctr -n k8s.io images ls | grep -o "localhost/kubernetes/pause:latest"
If the pause images are deleted, then the following command returns no output:
sudo ctr -n k8s.io images list | grep -i pause
- To get the Amazon ECR authentication token for the image pull, run the following commands:
ECR_REGION=YOUR-REGION
ECR_PASSWORD=$(aws ecr get-login-password --region $ECR_REGION)
Note: Replace YOUR-REGION with your AWS Region.
- Pull the pause image from Amazon ECR. Make sure that you use the correct pause image version. If you aren't sure of the image version, then log in to any of your worker nodes and check the pause image version there.
For example:
sudo ctr -n k8s.io image pull --user "AWS:$ECR_PASSWORD" 602401143452.dkr.ecr.$ECR_REGION.amazonaws.com/eks/pause:<ImageVersion>
Note: Replace <ImageVersion> with the pause image version. You can select the latest image version from Releases on the GitHub website.
- Tag the image as localhost/kubernetes/pause:latest for EKS.
For example:
sudo ctr -n k8s.io image tag 602401143452.dkr.ecr.$ECR_REGION.amazonaws.com/eks/pause:<ImageVersion> localhost/kubernetes/pause:latest
After you restore the pause image, new Pods start and run without issues.
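The pull and tag steps for the pause image differ only in the Region and the image version, so you can build the image reference once. This sketch assumes the 602401143452 registry account that this article's examples use; some AWS Regions use a different account ID for Amazon EKS images, so confirm yours. The pause_image_uri function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the Amazon EKS pause image reference for a Region and version.
# Assumes the 602401143452 registry account used in this article's examples.
pause_image_uri() {
  local region="$1" version="$2"
  if [ -z "$region" ] || [ -z "$version" ]; then
    echo "usage: pause_image_uri <region> <version>" >&2
    return 1
  fi
  printf '602401143452.dkr.ecr.%s.amazonaws.com/eks/pause:%s\n' "$region" "$version"
}
```

For example: img=$(pause_image_uri "$ECR_REGION" 3.10), then sudo ctr -n k8s.io image pull --user "AWS:$ECR_PASSWORD" "$img" and sudo ctr -n k8s.io image tag "$img" localhost/kubernetes/pause:latest.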
"Pod does not have label" error on Windows nodes
If a Pod doesn't have a nodeSelector that schedules it on a Windows node, then you might get an error message that's similar to the following:
"Failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address" or "Pod does not have label vpc.amazonaws.com/PrivateIPv4Address"
To resolve the issue, make sure to include the following labels in the PodSpec for the nodeSelector parameter:
nodeSelector:
  kubernetes.io/os: windows
  kubernetes.io/arch: amd64
Confirm that you set the enable-windows-ipam parameter to true in your amazon-vpc-cni ConfigMap.
If you don't have an amazon-vpc-cni ConfigMap, then use the following template and apply it to your cluster:
apiVersion: v1
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
  namespace: kube-system
data:
  enable-windows-ipam: "true"
Restart the aws-node and node-windows Pods.
For more information about how to deploy Windows nodes on Amazon EKS clusters, see Deploy Windows nodes on EKS clusters.
Security group error
If you have a security group issue, then you get an error similar to the following:
"Plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
Vpc-resource-controller failed to allocate branch ENI to pod: creating network interface, NoCredentialProviders: no valid providers in chain. Deprecated."
This error response can indicate an issue with the health of the Kubernetes control plane. To resolve this issue, contact AWS Support.
Related information
How do I resolve kubelet or CNI plugin issues for Amazon EKS?
How do I troubleshoot an OIDC provider and IRSA in Amazon EKS?
How do I troubleshoot IRSA errors in Amazon EKS?