AWS Batch is spinning up EC2 Spot instances but not running tasks on them (jobs remain in RUNNABLE state)

Hello,

We are seeing jobs stuck in the RUNNABLE state in AWS Batch for several days. After checking the AWS console, these are my observations:

  1. The compute environment is in the INVALID state with the reason: "CLIENT_ERROR - The instance IDs 'i-019be140af144fccf, i-01bd39c87ac0f87b7, i-034e8d61e0b7d5419, i-0ac46775085e4f9c4, i-0e129e96b5112c045' do not exist". Of these 5 instance IDs, 2 actually DO exist in the EC2 console while the other 3 do not (perhaps they were reclaimed, since this is a managed Spot environment?).

  2. On the ECS cluster page, the running EC2 Spot instances are registered and visible under "Container Instances", and there are no alerts related to the agent version, etc. (the agent version is 1.65.0).

  3. In the EC2 console, I can see that several Spot instances have been created, and many of them have been running for 3+ days now.
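
For anyone who wants to reproduce the check in observation 1 outside the console, here is a minimal Python sketch, not a definitive diagnostic. It assumes boto3 is installed with credentials and a region configured, and the compute environment name `my-spot-env` is a hypothetical placeholder. It pulls the instance IDs out of the `statusReason` text and asks EC2 which of them it still knows about. Note that recently terminated instances can remain visible in EC2 for a short while, so "existing" here does not necessarily mean "running".

```python
import re

# Matches EC2 instance IDs such as i-019be140af144fccf (17 or 8 hex digits).
INSTANCE_ID_RE = re.compile(r"\bi-(?:[0-9a-f]{17}|[0-9a-f]{8})\b")


def extract_instance_ids(status_reason):
    """Return the instance IDs mentioned in a statusReason string, in order."""
    return INSTANCE_ID_RE.findall(status_reason)


def split_existing(ids, ec2_client):
    """Split ids into (existing, missing) according to the given EC2 client.

    Filtering on instance-id (rather than passing InstanceIds) avoids an
    InvalidInstanceID.NotFound error when some of the IDs are gone.
    """
    found = set()
    paginator = ec2_client.get_paginator("describe_instances")
    pages = paginator.paginate(
        Filters=[{"Name": "instance-id", "Values": list(ids)}]
    )
    for page in pages:
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                found.add(instance["InstanceId"])
    existing = [i for i in ids if i in found]
    missing = [i for i in ids if i not in found]
    return existing, missing


if __name__ == "__main__":
    import boto3  # needs AWS credentials and a configured region

    batch = boto3.client("batch")
    ec2 = boto3.client("ec2")

    # "my-spot-env" is a hypothetical name; substitute your own environment.
    envs = batch.describe_compute_environments(
        computeEnvironments=["my-spot-env"]
    )["computeEnvironments"]
    for env in envs:
        reason = env.get("statusReason", "")
        print(env["status"], reason)
        existing, missing = split_existing(extract_instance_ids(reason), ec2)
        print("still known to EC2:", existing)
        print("not found in EC2:", missing)
```

The lookup is kept separate from the parsing so the same functions can be pointed at any status message or at a stubbed client.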

My questions:

  1. Can you please help me understand why this has happened? We have made ZERO changes to the compute environment, and it was working correctly until Feb 1st.

  2. What can be done to fix this?

  3. Will I be charged for the instances that were created by AWS Batch and have been sitting IDLE for over 3 days? A total of 16 instances were created by the Batch compute environment. I can share the instance IDs if needed.

asked a year ago, 91 views
No Answers