AWS Batch Fargate jobs stuck in RUNNABLE state


I have an AWS Batch job queue on a FARGATE_SPOT compute environment that I'm running jobs on. It had been working without issue until this morning; now, the jobs are all stuck in the RUNNABLE status. I have tried removing all pending jobs and submitting a single job with minimal resource requirements, but it still stalls in RUNNABLE.

I have confirmed:

  • I am not currently exhausting my Fargate vCPU quotas.
  • I have tried running the job with fewer resources, e.g. 2 vCPUs and 8 GB of memory.
  • The compute environment the jobs run on has a VALID status and its state is ENABLED.
  • The compute environment's "Maximum vCPUs" is not maxed out.
  • There are no error logs in CloudWatch.
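As a side note, the quota check in the first bullet can be scripted. Below is a minimal sketch using boto3's Service Quotas API; the keyword matching on quota names is an assumption (the Fargate quotas are labeled along the lines of "Fargate On-Demand vCPU resource count" and "Fargate Spot vCPU resource count") and may need adjusting for your account:

```python
def pick_quota(quotas, keyword):
    """Return the value of the first quota whose name contains keyword,
    or None if no quota name matches."""
    for q in quotas:
        if keyword in q["QuotaName"]:
            return q["Value"]
    return None

def fetch_fargate_quotas(region="us-east-2"):
    """Pull all Fargate service quotas for the account in one region."""
    import boto3  # deferred import; requires configured AWS credentials
    sq = boto3.client("service-quotas", region_name=region)
    quotas = []
    paginator = sq.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="fargate"):
        quotas.extend(page["Quotas"])
    return quotas

# Example (needs AWS credentials):
#   quotas = fetch_fargate_quotas()
#   print("Spot vCPU quota:", pick_quota(quotas, "Spot"))
```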

The AWS region is us-east-2 (Ohio).

What could be causing this issue? Is it possible all Fargate spot instances are currently in use? And if so, is there any way to detect this?

Thanks

Adam
Asked 4 months ago · 186 views
1 Answer
Accepted Answer

A few possible causes and checks:

1. Although AWS Fargate Spot capacity is generally available, there are times when Spot capacity is limited or unavailable in a specific region.
2. Try running a small job on Fargate On-Demand to see whether it executes as expected; if it does, Spot capacity is the likely culprit.
3. Double-check your service limits, not just for vCPUs but also for other resources and quotas related to AWS Batch and ECS.
4. Verify that the necessary permissions and IAM roles are correctly configured.
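To gather more evidence for points 1 and 4, you can pull the job's status fields directly with `batch.describe_jobs`. A minimal boto3 sketch (the summary helper is illustrative; Batch may not populate `statusReason` for every RUNNABLE job):

```python
def summarize(job):
    """One-line summary of a describe_jobs entry. statusReason may be
    absent while a job sits in RUNNABLE, so fall back to a placeholder."""
    reason = job.get("statusReason", "no statusReason reported")
    return "{}: {} ({})".format(job["jobId"], job["status"], reason)

def inspect_job(job_id, region="us-east-2"):
    """Print a summary line for one AWS Batch job."""
    import boto3  # deferred import; requires configured AWS credentials
    batch = boto3.client("batch", region_name=region)
    for job in batch.describe_jobs(jobs=[job_id])["jobs"]:
        print(summarize(job))
```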

Jagan
Answered 4 months ago
  • This was the issue. I ended up creating two compute environments: the first uses Spot provisioning, and the second uses On-Demand. That way, when Spot capacity is not available, the queue falls back to On-Demand.
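For anyone else landing here, the fallback setup described above can be sketched with the AWS CLI. A job queue attempts its compute environments in the listed order, so placing the Spot environment first gives Spot-with-On-Demand-fallback behavior. The queue and compute environment names below are placeholders:

```shell
# Config sketch: queue that tries the Spot CE first, then On-Demand.
aws batch create-job-queue \
  --job-queue-name my-fallback-queue \
  --state ENABLED \
  --priority 1 \
  --compute-environment-order \
      order=1,computeEnvironment=my-spot-ce \
      order=2,computeEnvironment=my-ondemand-ce
```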
