1 Answer
- Go to the service limit increase request page (Service Quotas) in the AWS console
- For Limit Type, search for and select SageMaker Endpoints
- Make sure the region matches the region displayed in the top right corner of your AWS console
- Provide a description and details about the error message
- Click Submit
You'll get a response within 24-48 hours from the time you log the support case for the limit increase.
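If you prefer to do this programmatically, here's a minimal sketch using boto3's Service Quotas API to list SageMaker quotas and request an increase. The quota code shown is a hypothetical placeholder; list the quotas first and pick the one that matches your error message:

```python
import boto3

client = boto3.client("service-quotas", region_name="us-east-1")

# List all SageMaker quotas and look for endpoint-related ones
paginator = client.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        if "endpoint" in quota["QuotaName"].lower():
            print(quota["QuotaCode"], quota["QuotaName"], quota["Value"])

# Once you've identified the quota code, request an increase, e.g.:
# client.request_service_quota_increase(
#     ServiceCode="sagemaker",
#     QuotaCode="L-XXXXXXXX",   # hypothetical placeholder code
#     DesiredValue=10.0,
# )
```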
Note: Without looking further into the details of this error message in your account, it'd be a little hard to find what exactly is blocking this; the closest service limit listed in the dropdown is SageMaker Endpoints, which is why I'm suggesting it here.
Also, if you have a support plan that allows you to create support cases, I'd suggest you log a support case under the Technical category, as support would be in a better position to tell you exactly what's going on and which limits are blocking this.
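A support case can also be opened programmatically with boto3's Support API (requires a Business or higher support plan). A minimal sketch is below; the serviceCode and categoryCode values are assumptions, so call describe_services() first to confirm the valid codes for your account:

```python
import boto3

# The Support API is only available in us-east-1
support = boto3.client("support", region_name="us-east-1")

# Inspect valid service and category codes before filing
for service in support.describe_services()["services"]:
    if "sagemaker" in service["code"]:
        print(service["code"], [c["code"] for c in service["categories"]])

case = support.create_case(
    subject="SageMaker serverless endpoint deployment blocked by limit",
    serviceCode="sagemaker",          # assumed code; verify with the listing above
    severityCode="normal",
    categoryCode="general-guidance",  # assumed code; verify with the listing above
    communicationBody=(
        "Deploying a serverless inference endpoint fails with a limit "
        "error. Please advise which quota is blocking this."
    ),
    issueType="technical",
)
print(case["caseId"])
```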
Hope you find this information helpful.
Comment here if you have additional questions; happy to help.
Abhishek
```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

transformers_version = '4.12.3'
pytorch_version = '1.9.1'
py_version = 'py38'
region = SESSION.boto_region_name  # SESSION is the sagemaker.Session created earlier

image_uri = sagemaker.image_uris.retrieve(
    framework='huggingface',
    base_framework_version=f'pytorch{pytorch_version}',
    region=region,
    version=transformers_version,
    py_version=py_version,
    instance_type='ml.m5.large',  # no GPU support on serverless inference
    image_scope='inference',
)

huggingface_model = HuggingFaceModel(
    model_data=model_s3_uri,  # path to your trained SageMaker model
    role=role,                # IAM role with permissions to create an endpoint
    py_version='py38',        # Python version of the DLC
    image_uri=image_uri,      # SageMaker container image URI
    env={
        'MMS_MAX_RESPONSE_SIZE': '20000000',
        'SAGEMAKER_CONTAINER_LOG_LEVEL': '20',
        'SAGEMAKER_PROGRAM': 'inference.py',
        'SAGEMAKER_REGION': 'us-east-1',
        'SAGEMAKER_SUBMIT_DIRECTORY': '/opt/ml/model/code',
    },
)

serverless_config = ServerlessInferenceConfig(
    # Must be 1024-6144 MB in 1024 MB increments; the original value of
    # 6019 is not valid and would be rejected at deploy time
    memory_size_in_mb=6144,
    max_concurrency=8,
)
```
Where the error occurs:

```python
predictor = huggingface_model.deploy(
    serverless_inference_config=serverless_config,
)
```
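Once the deploy call succeeds, you can sanity-check the endpoint with a quick invocation. A minimal sketch follows; the payload shape is an assumption and must match whatever your inference.py handler expects:

```python
# Quick smoke test of the deployed serverless endpoint.
# The payload format is an assumption -- it must match the input_fn
# defined in your inference.py.
result = predictor.predict({"inputs": "This movie was great!"})
print(result)

# Clean up when done to avoid lingering resources
predictor.delete_endpoint()
```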