AWS Batch spinning up EC2 Spot Instances but not running tasks on them (tasks remain in RUNNABLE state)

Hello,

We are seeing jobs stuck in the RUNNABLE state in AWS Batch for several days. After checking the AWS console, these are my observations:

  1. The compute environment is in the INVALID state with the reason: "CLIENT_ERROR - The instance IDs 'i-019be140af144fccf, i-01bd39c87ac0f87b7, i-034e8d61e0b7d5419, i-0ac46775085e4f9c4, i-0e129e96b5112c045' do not exist". Of these 5 instance IDs, 2 actually DO exist in the EC2 console while the other 3 do not (perhaps they were reclaimed, since this is a managed SPOT environment?)

  2. On the ECS cluster page, the running EC2 Spot Instances are registered and visible under "Container Instances", and there are no alerts related to the agent version, etc. (the agent version is 1.65.0)

  3. In the EC2 console I can see that several spot instances have been created and many of them have been running for 3+ days now.
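To make the cross-check in observation 1 repeatable, here is a minimal Python sketch that pulls the instance IDs out of the compute environment's status reason and splits them into those that still exist and those that do not. The helper names (`parse_instance_ids`, `split_by_existence`, `find_existing_instances`) are hypothetical and not part of any AWS tooling. The last helper assumes `boto3` is installed and credentials are configured, and it does not paginate, which should be fine for a handful of IDs.

```python
import re

# EC2 instance IDs are "i-" followed by 8 or 17 hex characters.
INSTANCE_ID_RE = re.compile(r"\bi-(?:[0-9a-f]{17}|[0-9a-f]{8})\b")


def parse_instance_ids(status_reason):
    """Extract every instance ID mentioned in a status reason string."""
    return INSTANCE_ID_RE.findall(status_reason)


def split_by_existence(reported_ids, existing_ids):
    """Split reported IDs into (present, missing), preserving order."""
    existing = set(existing_ids)
    present = [i for i in reported_ids if i in existing]
    missing = [i for i in reported_ids if i not in existing]
    return present, missing


def find_existing_instances(instance_ids, region_name=None):
    """Return the subset of instance_ids that EC2 still knows about.

    Hypothetical helper; boto3 is imported lazily so the parsing
    functions above work without it.
    """
    import boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    # Using a filter instead of the InstanceIds parameter avoids an
    # InvalidInstanceID.NotFound error for IDs that no longer exist.
    response = ec2.describe_instances(
        Filters=[{"Name": "instance-id", "Values": list(instance_ids)}]
    )
    return {
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    }
```

Usage would look roughly like `ids = parse_instance_ids(reason)` followed by `present, missing = split_by_existence(ids, find_existing_instances(ids))`, where `reason` is the status reason text copied from the Batch console.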

My questions:

  1. Can you please help me understand why this happened? We have made ZERO changes to the compute environment, and it was working correctly until Feb 1st.

  2. What can be done to fix this?

  3. Will I be charged for the instances that were created by Batch and have been sitting IDLE for over 3 days? A total of 16 instances were created by the Batch compute environment. I can share the instance IDs if needed.

Asked 1 year ago · Viewed 96 times
No answers
