1 Answer
The issue you're describing on EKS 1.31, where pods are stuck in Pending with a "CreateContainerError: failed to reserve container name" error, is not commonly reported, but it points to a container runtime problem: the runtime believes the container name is already reserved, often by an earlier creation attempt for the same init container that never completed or was never cleaned up.
To investigate and potentially resolve this issue, you can try the following approaches:
1. Check the container runtime logs: EKS worker nodes on 1.24 and later use containerd, so look in the containerd logs on the affected node (e.g. 'journalctl -u containerd') for entries that identify the name-reservation conflict.
2. Examine node resources: Ensure that your nodes have sufficient resources (CPU, memory, storage) to handle container creation. Resource pressure, disk pressure in particular, can cause container creation to fail in unexpected ways.
3. Review your init container configurations: Since the issue seems to affect pods with init containers, carefully review the specifications of your init containers. Ensure there are no conflicts in naming or resource allocation.
4. Update EKS and node components: If possible, try updating to the latest EKS patch version within the 1.31 release. Also, ensure that your node's kubelet, container runtime, and other components are up to date.
5. Analyze kubelet logs: Check the kubelet logs on the affected nodes for any error messages or warnings related to container creation or name reservation.
6. Monitor cluster events: Use 'kubectl get events' to watch for any cluster-wide events that might be related to this issue.
7. Restart the container runtime on the affected node: switching runtimes isn't an option here, since containerd is the only supported runtime on EKS 1.24 and later, but restarting containerd and the kubelet on the affected node, or replacing the node entirely, can clear a stale name reservation.
8. Reach out to AWS support: If the issue continues after trying these steps, it may be worth contacting AWS support. They might have more specific insights or be aware of any known issues with EKS 1.31 related to this problem.
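A rough triage script tying the log-checking steps above together (a sketch, not EKS-specific tooling; pod, node, and unit names are placeholders, and the cluster/node commands are shown as comments to run where appropriate). The parsing at the end assumes containerd's CRI naming convention, which is the form the reserved name in the error message takes:

```shell
#!/bin/sh
# Cluster-side triage (run where kubectl is configured; names are examples):
#   kubectl get events -A --sort-by=.lastTimestamp | tail -n 30
#   kubectl describe pod <stuck-pod> -n <namespace>
#
# Node-side triage (e.g. over SSM Session Manager on the affected node):
#   journalctl -u containerd --since "1 hour ago" | grep -i "reserve container name"
#   journalctl -u kubelet    --since "1 hour ago" | grep -i "CreateContainerError"
#   crictl ps -a    # look for a duplicate or stuck entry for the init container
#
# The name in the error follows containerd's CRI convention
# <container>_<pod>_<namespace>_<pod-uid>_<attempt>, so the owning pod can be
# recovered from the message itself:
reserved='init-db_mypod_default_1f2e3d4c-0000-4000-8000-000000000000_0'
container=$(printf '%s' "$reserved" | cut -d_ -f1)
pod=$(printf '%s' "$reserved" | cut -d_ -f2)
ns=$(printf '%s' "$reserved" | cut -d_ -f3)
printf 'container %s belongs to pod %s/%s\n' "$container" "$ns" "$pod"
# → container init-db belongs to pod default/mypod
```

Splitting on '_' is safe here because Kubernetes container and pod names are DNS labels and cannot themselves contain underscores.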
Remember to test any changes in a non-production environment first. Careful monitoring and logging will be key to understanding and resolving this issue.
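If the logs confirm a stale name reservation, restarting the runtime on the affected node (or recycling the node) is the usual way to clear it. A minimal sketch, assuming a systemd-based node you can reach; the node name is a placeholder, and the script defaults to printing each command rather than running it:

```shell
#!/bin/sh
# recycle-runtime.sh <node-name> — drain the node, restart containerd and the
# kubelet, then uncordon. DRY_RUN=1 (the default here) only prints each
# command; set DRY_RUN=0 to execute for real.
NODE="${1:-ip-10-0-1-23.ec2.internal}"   # placeholder node name
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Move workloads off the node first so nothing is disrupted mid-restart:
run kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data
# Restarting containerd typically clears stale name reservations; restart the
# kubelet as well so it resyncs pod state with the runtime:
run sudo systemctl restart containerd kubelet
run kubectl uncordon "$NODE"
```

The dry-run default is deliberate: draining a node is disruptive, so it is worth reviewing the printed commands before re-running with DRY_RUN=0.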
Sources
failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config | AWS re:Post
EksAttemptContainerDetail - AWS Batch
Troubleshooting AWS Deep Learning Containers on EKS - AWS Deep Learning Containers
answered a year ago
