How do I resolve insufficient capacity errors when I launch my SageMaker AI resources?


I get an "InsufficientCapacity" error when I try to launch an Amazon SageMaker AI training job, batch transform job, endpoint, notebook instance, or SageMaker Studio app.

Resolution

When AWS doesn't have enough on-demand capacity to complete your request, you might get an InsufficientCapacity error that's similar to the following error messages:

"Unable to provision requested ML compute capacity due to InsufficientInstanceCapacity error. Please retry using a different ML instance type or after some time."

"An error occurred (InsufficientInstanceCapacity) when calling the StartInstances operation (reached max retries: 4): Insufficient capacity."

Amazon Elastic Compute Cloud (Amazon EC2) instance capacity isn't static. Instance capacity depends on the workloads in a specific AWS Region or Availability Zone. Insufficient capacity errors aren't related to resource quotas that AWS applies to your AWS account.

Capacity issues are transient and might resolve when you try your request again. If you can delay your request, then try your request at a later time.
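
The following Python sketch shows one way to retry a request with an increasing delay between attempts. It's a minimal illustration and not an official AWS pattern. The names CapacityError, retry_on_capacity, and launch are hypothetical: launch is a function that you supply to start your resource, and it must raise CapacityError when your own code detects a capacity failure, for example from the error message or the resource's failure reason.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CapacityError(Exception):
    """Hypothetical marker exception. Raise it from `launch` when the
    failure that you detected is related to insufficient capacity."""


def retry_on_capacity(
    launch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `launch` until it succeeds or `max_attempts` is reached.

    The wait doubles after each capacity failure:
    base_delay, 2 * base_delay, 4 * base_delay, and so on.
    Errors other than CapacityError are not retried.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return launch()
        except CapacityError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the capacity error.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The default values for max_attempts and base_delay are assumptions. Choose values that match how long your workload can wait.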

To get immediate access to an instance, take one of the following actions:

  • Switch to a larger instance size in the same family, a different instance type, or a different instance family, based on your workload.
  • Launch the resource with the same instance type in a different Region or Availability Zone, because available capacity for each instance type varies by Region and Availability Zone. Verify which SageMaker instance types are available in each Region.
    Note: To view instance type availability, on the On-demand pricing page, choose the tab for your SageMaker capability, and then select your Region from the Region dropdown list.
  • Submit a new instance request with a reduced number of instances.
  • Submit a new request, but don't specify an Availability Zone.
  • To reserve instances for your mission-critical workloads, use On-Demand Capacity Reservations. To create a Capacity Reservation, see Create a Capacity Reservation.
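
The first and third actions in the preceding list can be combined into an ordered list of fallbacks. The following Python sketch tries each combination of instance type and instance count in turn and stops at the first one that launches. It's an illustration under assumptions: launch_first_available and launch are hypothetical names, launch is a function that you supply and that returns True when the resource is provisioned and False when it fails because of capacity, and the instance types shown are placeholders.

```python
from typing import Callable, Iterable, Optional, Tuple

# (instance type, instance count)
Candidate = Tuple[str, int]


def launch_first_available(
    candidates: Iterable[Candidate],
    launch: Callable[[str, int], bool],
) -> Optional[Candidate]:
    """Try each candidate in order and return the first one that launches.

    Returns None when every candidate fails because of capacity.
    """
    for instance_type, instance_count in candidates:
        if launch(instance_type, instance_count):
            return (instance_type, instance_count)
    return None


# Placeholder preference order: preferred type first, then a larger
# size in the same family, then the preferred type with fewer instances.
candidates = [
    ("ml.m5.xlarge", 4),
    ("ml.m5.2xlarge", 4),
    ("ml.m5.xlarge", 2),
]
```

Order the candidates from most preferred to least preferred so that the function falls back only as far as it needs to.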

If you launch a SageMaker Studio app, then it's a best practice to configure the app with subnets that span multiple Availability Zones to minimize capacity issues.

If you launch a notebook instance or training job, then try the same instance type with subnets that are in different Availability Zones.
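
To check whether a set of subnets spans more than one Availability Zone, you can group the subnets by zone. The following Python sketch works on a response in the shape that the Amazon EC2 DescribeSubnets API returns, where each entry in Subnets has a SubnetId and an AvailabilityZone. The function names subnets_by_zone and spans_multiple_zones are hypothetical helpers, not part of any AWS SDK.

```python
from typing import Any, Dict, List


def subnets_by_zone(describe_subnets_response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group subnet IDs by Availability Zone."""
    zones: Dict[str, List[str]] = {}
    for subnet in describe_subnets_response.get("Subnets", []):
        zones.setdefault(subnet["AvailabilityZone"], []).append(subnet["SubnetId"])
    return zones


def spans_multiple_zones(describe_subnets_response: Dict[str, Any]) -> bool:
    """Return True when the subnets cover two or more Availability Zones."""
    return len(subnets_by_zone(describe_subnets_response)) > 1
```

With the AWS SDK for Python (Boto3), you can get the response from boto3.client("ec2").describe_subnets(SubnetIds=[...]) and pass it to spans_multiple_zones before you configure the resource.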

Related information

Insufficient instance capacity

Supported Regions and quotas

AWS OFFICIAL Updated 2 months ago