How do I troubleshoot the "ResourceLimitExceeded" error in Amazon SageMaker?

I want to troubleshoot the "ResourceLimitExceeded" error in Amazon SageMaker.

Resolution

When you create a SageMaker resource, such as a processing job, training job, endpoint, or Studio app, you might get the ResourceLimitExceeded error. See the following example error message:

"The account-level service limit 'ml.m5.xlarge for endpoint usage' is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Please contact AWS support to request an increase for this limit."

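This error usually occurs when a request would exceed an account-level service quota for a SageMaker resource. In the preceding example, the quota for ml.m5.xlarge endpoint instances is 0, so a request for 1 instance fails.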
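The quota name in the message is the name to look for in Service Quotas. The following Python sketch shows one way to pull the quota name and the counts out of a message with the wording shown above. The function name parse_quota_error and the "required" field are illustrative, not part of any AWS SDK, and messages with different wording return None.

```python
import re

# Pattern based on the example message above. Other ResourceLimitExceeded
# messages (for example, ones that do not name a quota) will not match.
_QUOTA_MESSAGE = re.compile(
    r"service limit '(?P<quota>[^']+)' is (?P<limit>\d+) Instances, "
    r"with current utilization of (?P<used>\d+) Instances "
    r"and a request delta of (?P<delta>\d+) Instances"
)


def parse_quota_error(message):
    """Return the quota name and counts from the message, or None if no match."""
    match = _QUOTA_MESSAGE.search(message)
    if match is None:
        return None
    used = int(match["used"])
    delta = int(match["delta"])
    return {
        "quota_name": match["quota"],
        "limit": int(match["limit"]),
        "used": used,
        "delta": delta,
        # Assumption: the request succeeds once the quota covers
        # current utilization plus the requested delta.
        "required": used + delta,
    }
```

For the example message, this returns the quota name "ml.m5.xlarge for endpoint usage" with a limit of 0 and a required value of 1.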

To resolve this error, complete the following steps:

1.    Open the Service Quotas console.
Note: To use the Service Quotas console, your AWS Identity and Access Management (IAM) user or role must have the required Service Quotas permissions.

2.    From the AWS Region selector on the navigation bar, select the Region where you get the error.

3.    In the navigation pane, choose AWS services.

4.    In the Search bar, enter Amazon SageMaker.

5.    Choose Amazon SageMaker.

6.    Select the quota that you want to increase. For the preceding example error message, select ml.m5.xlarge for endpoint usage.

7.    Choose Request quota increase.

8.    For Change quota value, enter the desired value.

9.    Choose Request.

This sends your request to AWS Support. Based on your use case and current usage, your request is either approved, denied, or partially approved.
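The console steps above can also be scripted. The following is a minimal sketch, assuming the Service Quotas API as exposed by boto3 (list_service_quotas and request_service_quota_increase) and the service code "sagemaker". The helper names find_quota and request_increase are illustrative. The client is passed in as an argument, so create it for the Region where you get the error.

```python
SERVICE_CODE = "sagemaker"  # Service Quotas service code for Amazon SageMaker


def find_quota(client, quota_name):
    """Return the quota entry whose QuotaName equals quota_name, or None."""
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode=SERVICE_CODE):
        for quota in page["Quotas"]:
            if quota["QuotaName"] == quota_name:
                return quota
    return None


def request_increase(client, quota_name, desired_value):
    """Request desired_value for the named quota if it exceeds the listed value."""
    quota = find_quota(client, quota_name)
    if quota is None:
        raise LookupError(f"No SageMaker quota named {quota_name!r} in this Region")
    if desired_value <= quota["Value"]:
        # The listed quota already covers the desired value; nothing to request.
        return None
    response = client.request_service_quota_increase(
        ServiceCode=SERVICE_CODE,
        QuotaCode=quota["QuotaCode"],
        DesiredValue=float(desired_value),
    )
    # Contains the request ID and status, among other fields.
    return response["RequestedQuota"]
```

A client for this sketch would typically come from boto3.client("service-quotas", region_name=...), with credentials that allow the two Service Quotas calls. For the example error message, the call would be request_increase(client, "ml.m5.xlarge for endpoint usage", 1).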


Related information

AWS service quotas

SageMaker service quotas

AWS OFFICIAL
Updated 2 years ago
11 Comments

I followed these steps for ml.g5.4xlarge for notebook instance usage. However, in step 6 I see that the quota is already 1, which contradicts the error message I get when I try to start that notebook instance.

How can I fix this?

replied a year ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied a year ago

Hello, I have a similar error. However, I am not sure which quota I should increase because the error message doesn't specify one. Here is the error: "botocore.errorfactory.ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateTrainingJob operation: Resource limits for this account have been exceeded. Please contact Customer Support for assistance."

NelioB
replied a year ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied a year ago

How can we catch the "ResourceLimitExceeded" event and turn it into a CloudWatch alarm?

Andres
replied a year ago

I have a similar issue. My account quota is 4 for the selected instance type, but I still get the error that states it is 0.

AWS
EXPERT
replied a year ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied a year ago

I have a similar issue: ResourceLimitExceeded

The account-level service limit 'ml.p3.2xlarge for training job usage' is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Please use AWS Service Quotas to request an increase for this quota. If AWS Service Quotas is not available, contact AWS support to request an increase for this quota.

My problem is I haven't used any resources yet. Seems a bit odd to request more.

Michael
replied 10 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 10 months ago

The above solutions seem like good options, but try this first:

  1. Delete the SageMaker endpoint (from the SageMaker dashboard).
  2. Restart the kernel. It should automatically recreate an endpoint and reset the resource pool count.
replied 5 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 5 months ago