I want to troubleshoot an "InsufficientCapacity" error that I receive when I try to launch one of the following Amazon SageMaker AI resources: training job, batch transform job, processing job, endpoint, notebook instance, or SageMaker Studio app.
Resolution
When AWS doesn't have enough on-demand capacity to complete your request, you might get an InsufficientCapacity error that's similar to the following error messages:
"Unable to provision requested ML compute capacity due to InsufficientInstanceCapacity error. Please retry using a different ML instance type or after some time."
"An error occurred (InsufficientInstanceCapacity) when calling the StartInstances operation (reached max retries: 4): Insufficient capacity."
Amazon Elastic Compute Cloud (Amazon EC2) instance capacity isn't static. Instance capacity depends on the workloads in a specific AWS Region or Availability Zone. Insufficient capacity errors aren't related to resource quotas that AWS applies to your AWS account.
Capacity issues are transient and might resolve when you try your request again. If you can delay your request, then try your request at a later time.
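The retry advice above can be sketched as a small backoff helper. This is a minimal sketch and not an official AWS pattern: `launch` is a hypothetical zero-argument callable that wraps your own launch request, and the check assumes that the capacity error surfaces as an exception whose message contains one of the error names shown earlier.

```python
import time

# Error names taken from the example messages in this article.
CAPACITY_MARKERS = ("InsufficientInstanceCapacity", "InsufficientCapacity")


def is_capacity_error(exc):
    """Return True if the exception text mentions a capacity error name."""
    return any(marker in str(exc) for marker in CAPACITY_MARKERS)


def retry_on_capacity(launch, max_attempts=5, base_delay=30.0, sleep=time.sleep):
    """Call launch() and retry with exponential backoff on capacity errors.

    launch is a hypothetical callable supplied by the caller (assumption).
    Errors that are not capacity errors are re-raised immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return launch()
        except Exception as exc:
            if not is_capacity_error(exc) or attempt == max_attempts:
                raise
            # Wait 30 s, 60 s, 120 s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

The delay values are arbitrary starting points. Because capacity can take minutes or longer to free up, longer delays might suit your workload better.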
To get immediate access to an instance, take one of the following actions:
- Switch to a larger instance size in the same family, or use a different instance type or instance family that suits your workload.
- Launch the resource in a different Region or Availability Zone for the same instance type because available capacity for each instance type varies by Region and Availability Zone. Verify which SageMaker AI instance types are available in each Region.
Note: To view instance type availability, on the On-demand pricing page, choose the tab for your SageMaker AI capability. Then, select your Region from the Region dropdown list.
- Submit a new instance request with a reduced number of instances.
- To reserve instances for your mission-critical workloads, use an On-Demand Capacity Reservation. To create a Capacity Reservation, contact your AWS account manager.
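The first and third options in the preceding list (a different instance type, or fewer instances) can be combined into an ordered fallback. The following is a rough sketch under stated assumptions: `launch` is a hypothetical callable that you supply, it accepts an instance type and an instance count, and it raises an exception whose message contains the capacity error name. The instance types in the usage comment are illustrative only.

```python
CAPACITY_MARKERS = ("InsufficientInstanceCapacity", "InsufficientCapacity")


def is_capacity_error(exc):
    """Return True if the exception text mentions a capacity error name."""
    return any(marker in str(exc) for marker in CAPACITY_MARKERS)


def launch_with_fallback(launch, candidates):
    """Try each (instance_type, instance_count) pair in order.

    launch is a hypothetical callable supplied by the caller (assumption).
    Returns the result of the first launch that succeeds. Errors that are
    not capacity errors are re-raised immediately.
    """
    last_error = None
    for instance_type, instance_count in candidates:
        try:
            return launch(instance_type, instance_count)
        except Exception as exc:
            if not is_capacity_error(exc):
                raise
            last_error = exc  # Remember the failure and try the next option.
    raise RuntimeError("All candidates hit capacity errors") from last_error


# Example ordering (illustrative values): preferred type first, then a
# larger size in the same family, then the preferred type with fewer
# instances.
# candidates = [("ml.g5.2xlarge", 4), ("ml.g5.4xlarge", 4), ("ml.g5.2xlarge", 2)]
```

Order the candidates by preference, because the helper stops at the first launch that succeeds.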
If you launch a SageMaker Studio app, then configure the app with subnets that span multiple Availability Zones to minimize capacity issues.
If you launch a notebook instance or training job, then select the same instance type with multiple subnets in different Availability Zones.
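For a training job that runs in a VPC, the subnet guidance can be sketched as a helper that builds the `VpcConfig` value of the request from subnets in distinct Availability Zones. This is an illustrative sketch: the `build_vpc_config` helper and the `subnet_zones` mapping (subnet ID to Availability Zone name) are hypothetical, and you would populate the mapping from your own VPC. The subnet and security group IDs in the usage comment are placeholders.

```python
def build_vpc_config(subnet_zones, security_group_ids, min_zones=2):
    """Build a VpcConfig dict with one subnet per Availability Zone.

    subnet_zones maps subnet ID -> Availability Zone name (hypothetical
    input that the caller prepares). Raises ValueError if the subnets
    cover fewer than min_zones distinct zones.
    """
    one_per_zone = {}
    for subnet_id, zone in sorted(subnet_zones.items()):
        # Keep the first subnet seen for each zone.
        one_per_zone.setdefault(zone, subnet_id)
    if len(one_per_zone) < min_zones:
        raise ValueError(
            f"Subnets span {len(one_per_zone)} Availability Zone(s); "
            f"at least {min_zones} are required."
        )
    return {
        "SecurityGroupIds": list(security_group_ids),
        "Subnets": [one_per_zone[zone] for zone in sorted(one_per_zone)],
    }


# Example with placeholder IDs:
# build_vpc_config(
#     {"subnet-aaa": "us-east-1a", "subnet-bbb": "us-east-1b"},
#     ["sg-123"],
# )
```

Failing early when all subnets are in one Availability Zone makes the configuration problem visible before the launch request is sent.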
Related information
Insufficient instance capacity
Supported Regions and quotas