AWS Batch Fargate jobs stuck in RUNNABLE state


I have an AWS Batch job queue on a FARGATE_SPOT compute environment that I'm running jobs on. It had been working without issue until this morning; now, the jobs are all stuck in the RUNNABLE status. I have tried removing all pending jobs and submitting a single job with minimal resource requirements, but it still stalls in RUNNABLE.

I have confirmed:

  • I am not currently exhausting my Fargate vCPU quotas.
  • I have tried running the job with fewer resources, e.g. 2 vCPUs and 8 GB of memory.
  • The compute environment the jobs run on has a VALID status and its state is ENABLED.
  • The compute environment's "Maximum vCPUs" is not maxed out.
  • There are no error logs in CloudWatch.
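As a side note, the quota check in the first bullet can be scripted. Below is a minimal sketch using boto3's Service Quotas API; the keyword matching on quota names is an assumption (the Fargate quotas are labeled along the lines of "Fargate On-Demand vCPU resource count" and "Fargate Spot vCPU resource count") and may need adjusting for your account:

```python
def pick_quota(quotas, keyword):
    """Return the value of the first quota whose name contains keyword,
    or None if no quota name matches."""
    for q in quotas:
        if keyword in q["QuotaName"]:
            return q["Value"]
    return None

def fetch_fargate_quotas(region="us-east-2"):
    """Pull all Fargate service quotas for the account in one region."""
    import boto3  # deferred import; requires configured AWS credentials
    sq = boto3.client("service-quotas", region_name=region)
    quotas = []
    paginator = sq.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="fargate"):
        quotas.extend(page["Quotas"])
    return quotas

# Example (needs AWS credentials):
#   quotas = fetch_fargate_quotas()
#   print("Spot vCPU quota:", pick_quota(quotas, "Spot"))
```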

The AWS region is us-east-2 (Ohio).

What could be causing this issue? Is it possible all Fargate spot instances are currently in use? And if so, is there any way to detect this?

Thanks

Adam
Asked 4 months ago · 186 views
1 Answer
Accepted Answer

A few possible causes and checks:

1. Although AWS Fargate Spot capacity is generally available, there are times when Spot capacity is limited or unavailable in a specific region.
2. Try running a small job on Fargate On-Demand to see whether it executes as expected; if it does, Spot capacity is the likely culprit.
3. Double-check your service limits, not just for vCPUs but also for other resources and quotas related to AWS Batch and ECS.
4. Verify that the necessary permissions and IAM roles are correctly configured.
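To gather more evidence for points 1 and 4, you can pull the job's status fields directly with `batch.describe_jobs`. A minimal boto3 sketch (the summary helper is illustrative; Batch may not populate `statusReason` for every RUNNABLE job):

```python
def summarize(job):
    """One-line summary of a describe_jobs entry. statusReason may be
    absent while a job sits in RUNNABLE, so fall back to a placeholder."""
    reason = job.get("statusReason", "no statusReason reported")
    return "{}: {} ({})".format(job["jobId"], job["status"], reason)

def inspect_job(job_id, region="us-east-2"):
    """Print a summary line for one AWS Batch job."""
    import boto3  # deferred import; requires configured AWS credentials
    batch = boto3.client("batch", region_name=region)
    for job in batch.describe_jobs(jobs=[job_id])["jobs"]:
        print(summarize(job))
```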

Jagan
Answered 4 months ago
  • This was the issue. I ended up creating two compute environments: the first uses Spot provisioning, and the second uses On-Demand. That way, when Spot capacity is not available, the queue falls back to On-Demand.
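For anyone else landing here, the fallback setup described above can be sketched with the AWS CLI. A job queue attempts its compute environments in the listed order, so placing the Spot environment first gives Spot-with-On-Demand-fallback behavior. The queue and compute environment names below are placeholders:

```shell
# Config sketch: queue that tries the Spot CE first, then On-Demand.
aws batch create-job-queue \
  --job-queue-name my-fallback-queue \
  --state ENABLED \
  --priority 1 \
  --compute-environment-order \
      order=1,computeEnvironment=my-spot-ce \
      order=2,computeEnvironment=my-ondemand-ce
```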
