The issue you're experiencing on EKS 1.31, where pods are stuck in Pending with a "CreateContainerError: failed to reserve container name" error, is not a commonly reported problem, but it does point to a container runtime issue, particularly around init containers.
To investigate and potentially resolve this issue, you can try the following approaches:
1. Check the container runtime logs: Look for entries in the container runtime logs (containerd on EKS 1.24 and later) that might provide more insight into the name reservation conflict. You can use 'journalctl' or your AMI's logging mechanism to access these logs (see the command sketch after this list, which also covers steps 2, 5 and 6).
2. Examine node resources: Ensure that your nodes have sufficient resources (CPU, memory, storage) to handle the container creation process. Resource constraints can sometimes lead to unexpected behavior in container orchestration.
3. Review your init container configurations: Since the issue seems to affect pods with init containers, carefully review the specifications of your init containers. Ensure there are no conflicts in naming or resource allocation.
4. Update EKS and node components: If possible, try updating to the latest EKS patch version within the 1.31 release. Also, ensure that your node's kubelet, container runtime, and other components are up to date.
5. Analyze kubelet logs: Check the kubelet logs on the affected nodes for any error messages or warnings related to container creation or name reservation.
6. Monitor cluster events: Use 'kubectl get events' to watch for any cluster-wide events that might be related to this issue.
7. Restart the container runtime and kubelet: EKS 1.24 and later support only containerd (dockershim was removed from Kubernetes in 1.24), so switching to Docker is not an option on 1.31. Instead, try restarting containerd and the kubelet on the affected node, or replace the node entirely, to clear any stale name reservations.
8. Reach out to AWS support: If the issue continues after trying these steps, it may be worth contacting AWS support. They might have more specific insights or be aware of any known issues with EKS 1.31 related to this problem.
Remember to test any changes in a non-production environment first. Careful monitoring and logging will be key to understanding and resolving this issue.
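As a rough starting point for steps 1, 2, 5 and 6 above, here is a minimal command sketch; the node name is a placeholder, and the journalctl commands need to be run on the affected node (for example over SSH or SSM Session Manager):
journalctl -u containerd --since "1 hour ago" --no-pager # Step 1: container runtime logs (containerd on EKS 1.24+)
journalctl -u kubelet --since "1 hour ago" --no-pager # Step 5: kubelet logs on the same node
kubectl describe node <node-name> # Step 2: allocatable vs. requested CPU/memory on the node
kubectl get events -A --sort-by=.lastTimestamp # Step 6: recent cluster events, newest last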
Sources
failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config | AWS re:Post
EksAttemptContainerDetail - AWS Batch
Troubleshooting AWS Deep Learning Containers on EKS - AWS Deep Learning Containers
Since you didn't share any logs, here are some general troubleshooting steps:
Check Pod Events and Logs:
Run the following command to check for events related to the pending pods:
kubectl describe pod <pod-name> -n <namespace>
Look for any relevant warnings or error messages in the output.
Check Init Container Logs:
If the init container was able to start but is causing the issue during its shutdown, you can check its logs:
kubectl logs <pod-name> -c <init-container-name> -n <namespace>
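If the init container never actually started, the reason is often visible in the pod status rather than the logs; a small sketch (pod, container and namespace names are placeholders):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.initContainerStatuses[*].state}' # State of each init container
kubectl logs <pod-name> -c <init-container-name> -n <namespace> --previous # Logs from the previous attempt, if it restarted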
Inspect Node Status:
Use the following command to check the status and resource utilization of your nodes:
kubectl get nodes -o wide
Ensure there are sufficient resources available on the nodes.
Review Container Runtime Logs:
Depending on your setup (Amazon Linux 2023 or Bottlerocket), check the container runtime logs to gather insights into why the container could not be created. For example, if you're using containerd, check its logs:
journalctl -u containerd
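For this particular error, filtering the runtime and kubelet logs on the node for the name reservation message can narrow things down, for example:
journalctl -u containerd --no-pager | grep -i "failed to reserve container name"
journalctl -u kubelet --no-pager | grep -i "reserve container name"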
Check for Orphaned Containers:
Investigate whether there are orphaned containers that haven't been cleaned up and might still be reserving names. You can check this on the node itself (see the sketch after the tool list below):
crictl ps -a # If using containerd (the default on EKS 1.24 and later)
docker ps -a # Only if using an older, Docker-based node
Remove any orphaned or unused containers that might be causing conflicts.
Resource Quotas and Limits:
Ensure that your Kubernetes resource quotas and limits are appropriately configured. Misconfigurations can lead to resource contention, especially with init containers (a quick check is included in the sketch after the tool list below).
Tools to Help Investigate:
Kubernetes Dashboard: A UI tool to visualize pod statuses and events.
Prometheus and Grafana: For monitoring resource utilization and performance metrics.
CloudWatch Logs: If you have logging enabled for your EKS cluster, check CloudWatch for logs related to kubelet and container runtime errors.
kubectl-debug: A useful tool to debug containers directly within your Kubernetes cluster.
Fluentd or Fluent Bit: If you're using log aggregation, you can analyze logs from all your pods in a centralized location.
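Two quick sketches for the orphaned-container and quota checks above. The namespace, pod name and IDs are placeholders; the crictl commands assume a containerd-based node where crictl is available (as it is on the EKS optimized AMIs), and you should confirm what each container ID belongs to before removing anything:
sudo crictl ps -a | grep -i exited # Exited containers that may still hold a name reservation
sudo crictl rm <container-id> # Remove a leftover container by ID
sudo crictl rmp <pod-sandbox-id> # Remove a stale pod sandbox, if needed
kubectl describe resourcequota -n <namespace> # Quotas applied in the pod's namespace
kubectl describe limitrange -n <namespace> # Default requests/limits in the namespace
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.initContainers[*].resources}' # Requests/limits set on the init containers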
If you've gone through the investigation steps and still cannot resolve the issue, consider reaching out to AWS Support or the Kubernetes community for further assistance. It may also be beneficial to check for any updates or known issues with EKS 1.31 related to container initialization and naming.