How do I fix a compute environment that's not valid in AWS Batch?

My compute environment in AWS Batch is in the INVALID state, and I receive a "CLIENT_ERROR" error message.

Short description

In AWS Batch, you might receive a "CLIENT_ERROR" error message. When this occurs, AWS Batch moves your compute environment into the INVALID state.

Note: For AWS Batch on Amazon Elastic Kubernetes Service (Amazon EKS) compute environments, see INVALID compute environment.

Resolution

If your compute environment is in the INVALID state, then take the following troubleshooting actions based on the "CLIENT_ERROR" error message that you receive.
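To see which error message applies, you can also read the status reason through the API instead of the console. The following is a minimal sketch, assuming a boto3 AWS Batch client is passed in as batch_client. The helper name find_invalid_environments is hypothetical and not part of any AWS SDK.

```python
def find_invalid_environments(batch_client, names=None):
    """Return (name, state, statusReason) for each INVALID compute environment.

    batch_client is assumed to behave like boto3.client("batch").
    names optionally restricts the lookup to specific environments.
    """
    kwargs = {"computeEnvironments": list(names)} if names else {}
    invalid = []
    while True:
        page = batch_client.describe_compute_environments(**kwargs)
        for env in page.get("computeEnvironments", []):
            if env.get("status") == "INVALID":
                invalid.append(
                    (
                        env["computeEnvironmentName"],
                        env.get("state"),
                        # statusReason carries the "CLIENT_ERROR - ..." text.
                        env.get("statusReason", ""),
                    )
                )
        token = page.get("nextToken")
        if not token:
            return invalid
        # Follow pagination until every environment has been inspected.
        kwargs["nextToken"] = token
```

For example, find_invalid_environments(boto3.client("batch")) returns one tuple per INVALID environment, and the third element tells you which of the following sections to use.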

"CLIENT_ERROR - Not authorized to perform sts:AssumeRole" error

To resolve the "CLIENT_ERROR - Not authorized to perform sts:AssumeRole" error, fix the service role that's not valid. Complete the following steps:

  1. Open the AWS Batch console.
  2. In the navigation pane, choose Compute environments.
  3. Select the compute environment that's in the INVALID state.
    Note: If your compute environment is in the DISABLED state, then choose Enable to activate your compute environment.
  4. Choose Edit.
  5. For Service role, select a service role with permissions for AWS Batch to make calls to other AWS services.
    Note: Your service role manages the resources that you use with the service. Before you use the service, you must have an AWS Identity and Access Management (IAM) policy and role that provides the required permissions. If you don't have an IAM role with the required permissions, then create one.
  6. Choose Save.
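Before you select a service role, it can help to confirm that its trust policy lets AWS Batch assume it. A missing trust relationship is one common cause of this error, although that is an assumption here and not something the steps above state. The following sketch checks a trust policy document that is already loaded as a Python dictionary. The helper name allows_assume_role is hypothetical, and the check deliberately ignores Condition blocks, wildcards, and Deny statements.

```python
def _as_list(value):
    """IAM policy fields may be a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def allows_assume_role(trust_policy, service="batch.amazonaws.com"):
    """Return True if an Allow statement grants sts:AssumeRole to the service.

    Simplified on purpose: Condition keys, wildcards, and Deny statements
    are not evaluated, so treat a True result as a hint, not a guarantee.
    """
    for statement in _as_list(trust_policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        services = _as_list(principal.get("Service")) if isinstance(principal, dict) else []
        if "sts:AssumeRole" in _as_list(statement.get("Action")) and service in services:
            return True
    return False
```

With boto3, the document is typically available as iam.get_role(RoleName=...)["Role"]["AssumeRolePolicyDocument"]. If the function returns False for the role that you selected, then review the role's trust relationship in the IAM console.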

"CLIENT_ERROR - Parameter: SpotFleetRequestConfig.IamFleetRole is invalid" error

If you use Amazon Elastic Compute Cloud (Amazon EC2) Spot Fleet instances, then you might receive the "CLIENT_ERROR - Parameter: SpotFleetRequestConfig.IamFleetRole is invalid" error message.

For managed compute environments that use Spot Fleet instances, create a role that grants the required permissions to use Spot Fleets. For general use, add the AmazonEC2SpotFleetTaggingRole AWS managed policy to the role.

Note: Use your new Spot Fleet role to create new compute environments. You can't change the Spot Fleet role of an existing compute environment. To remove an existing environment that you no longer need, deactivate the environment, and then delete it. For instructions, proceed to the next section.
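The role creation described above can be sketched as follows, assuming a boto3 IAM client is passed in as iam_client. The function names and the default role name are illustrative choices, not values that the article prescribes. The trust policy assumes that the Spot Fleet service principal is spotfleet.amazonaws.com.

```python
import json

# AWS managed policy named in the text above.
SPOT_FLEET_TAGGING_POLICY_ARN = (
    "arn:aws:iam::aws:policy/service-role/AmazonEC2SpotFleetTaggingRole"
)


def spot_fleet_trust_policy():
    """Trust policy that lets the Spot Fleet service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "spotfleet.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_spot_fleet_role(iam_client, role_name="AmazonEC2SpotFleetRole"):
    """Create the role, attach the managed policy, and return the role ARN.

    iam_client is assumed to behave like boto3.client("iam").
    role_name is an illustrative default; any valid role name works.
    """
    role = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(spot_fleet_trust_policy()),
    )
    iam_client.attach_role_policy(
        RoleName=role_name,
        PolicyArn=SPOT_FLEET_TAGGING_POLICY_ARN,
    )
    return role["Role"]["Arn"]
```

The returned ARN is the value to supply as the Spot Fleet role when you create the new compute environment.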

"CLIENT_ERROR - The specified launch template...does not exist" error

If the launch template that you associated with your compute environment doesn't exist, then you receive the following error message:

"CLIENT_ERROR - The specified launch template, with template ID [###], does not exist"

To troubleshoot this issue, complete the following steps to deactivate and delete your compute environment:

  1. Open the AWS Batch console.
  2. In the navigation pane, choose Compute environments.
  3. Select the compute environment that's in the INVALID state. Then, choose Disable.
  4. Choose Delete.
  5. Create a new compute environment.
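The deactivate-and-delete steps above can be sketched with the AWS Batch API, assuming a boto3 Batch client is passed in as batch_client. The helper name disable_and_delete and the polling defaults are hypothetical. The sketch also assumes that the environment is no longer associated with any job queue, because AWS Batch rejects the delete request otherwise.

```python
import time


def disable_and_delete(batch_client, name, poll_seconds=5, max_polls=60):
    """Disable a compute environment, wait for the update, then delete it.

    Returns True if a delete request was sent, or False if the environment
    no longer exists. batch_client is assumed to behave like
    boto3.client("batch").
    """
    batch_client.update_compute_environment(computeEnvironment=name, state="DISABLED")

    for _ in range(max_polls):
        response = batch_client.describe_compute_environments(computeEnvironments=[name])
        environments = response["computeEnvironments"]
        if not environments:
            return False  # Nothing left to delete.
        if environments[0]["status"] != "UPDATING":
            break  # The state change has been applied.
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"{name} is still UPDATING after {max_polls} polls")

    batch_client.delete_compute_environment(computeEnvironment=name)
    return True
```

After the delete completes, create the replacement compute environment with a launch template that exists.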

"CLIENT_ERROR - Your compute environment has been INVALIDATED and scaled down" error

When AWS Batch scales out a compute environment, the container instances join the Amazon Elastic Container Service (Amazon ECS) cluster. If the instances can't join the cluster, then AWS Batch marks the compute environment as INVALID. This occurs when the Amazon ECS agent on the instance doesn't call the RegisterContainerInstance API within an established time frame. In response, AWS Batch terminates the instance, and you receive the following error:

"CLIENT_ERROR - Your compute environment has been INVALIDATED and scaled down because none of the instances joined the underlying ECS Cluster."

Your instances can't join an Amazon ECS cluster in the following scenarios:

  • Amazon Virtual Private Cloud (Amazon VPC) subnet configuration settings prevent successful communication with Amazon ECS.
  • An incorrect instance profile policy setting prevents authorization to Amazon ECS.
  • Custom Amazon Machine Images (AMIs) or launch template configurations affect the Amazon ECS agent.

To troubleshoot this issue, complete the following steps:

  1. Open the AWS Batch console.
  2. In the navigation pane, choose Compute environments.
  3. Select the compute environment that's in the INVALID state. Then, choose Disable.
  4. Select the same compute environment, and then choose Enable.

The preceding steps reactivate the compute environment and cause AWS Batch to mark the environment as VALID. To launch one instance and force a scale out, set the minimum vCPUs for the compute environment (minvCpus in the AWS Batch API) to 1. Then, use the running instance to troubleshoot the reason that instances can't join the Amazon ECS cluster.
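The disable, enable, and scale-out sequence above can be sketched with the AWS Batch API, assuming a boto3 Batch client is passed in as batch_client. The helper names and polling defaults are hypothetical. In the API, the minimum vCPUs setting is spelled minvCpus and is nested inside computeResources. The sketch waits between calls on the assumption that AWS Batch rejects a new update while the environment is still UPDATING.

```python
import time


def _wait_until_settled(batch_client, name, poll_seconds, max_polls):
    """Poll until the environment leaves the UPDATING status, then return it."""
    for _ in range(max_polls):
        response = batch_client.describe_compute_environments(computeEnvironments=[name])
        environment = response["computeEnvironments"][0]
        if environment["status"] != "UPDATING":
            return environment
        time.sleep(poll_seconds)
    raise TimeoutError(f"{name} is still UPDATING after {max_polls} polls")


def reset_and_scale_out(batch_client, name, poll_seconds=5, max_polls=60):
    """Disable and re-enable the environment, then request one vCPU.

    batch_client is assumed to behave like boto3.client("batch").
    Returns the last environment description that was observed.
    """
    for state in ("DISABLED", "ENABLED"):
        batch_client.update_compute_environment(computeEnvironment=name, state=state)
        _wait_until_settled(batch_client, name, poll_seconds, max_polls)

    # A minimum of 1 vCPU forces AWS Batch to launch an instance to inspect.
    batch_client.update_compute_environment(
        computeEnvironment=name,
        computeResources={"minvCpus": 1},
    )
    return _wait_until_settled(batch_client, name, poll_seconds, max_polls)
```

When you finish troubleshooting, consider setting minvCpus back to its previous value so that the instance doesn't keep running.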

"CLIENT_ERROR - Access denied" error

To resolve the "CLIENT_ERROR - Access denied" error, create a service role with the correct permissions. Or, choose an existing service role with the correct permissions.

"CLIENT_ERROR - Internal error" error

To resolve the "CLIENT_ERROR - Internal error" error, deactivate your compute environment, and then reactivate it. For steps, see the "CLIENT_ERROR - Your compute environment has been INVALIDATED and scaled down" error section.

"CLIENT_ERROR - The request uses the same client token... non-identical request" error

To resolve the "CLIENT_ERROR - The request uses the same client token as previous, but non-identical request" error, deactivate your compute environment, and then activate it. For steps, see the "CLIENT_ERROR - Your compute environment has been INVALIDATED and scaled down" error section.

"CLIENT_ERROR - You are not authorized to use launch template" error

To resolve the "CLIENT_ERROR - You are not authorized to use launch template" error, take the following actions:

  • Confirm that your service role has the required permissions. Then, complete the steps in the "CLIENT_ERROR - Not authorized to perform sts:AssumeRole" error section.
  • Check whether your AWS account is part of AWS Organizations. If it is, then make sure that service control policies (SCPs) don't block access to your Amazon EC2 permissions.

To further troubleshoot IAM policy issues, see How can I troubleshoot access denied or unauthorized operation errors with an IAM policy?

Related information

Troubleshooting AWS Batch

Why is my AWS Batch job stuck in RUNNABLE status?

AWS OFFICIAL Updated a year ago