Create a SageMaker endpoint that supports inference components


I am attempting to create a SageMaker endpoint to host my own model (custom container) through the CLI (in a bash script). I've verified that the model works as expected when setting things up through the AWS console.

The steps I'm taking to do this are the following:

  1. Create a model from a container:
aws sagemaker create-model \
--model-name "MY-MODEL" \
--primary-container "$(cat << EOF
    {
        "Image": "MY-IMG"
    }
EOF
)" \
--execution-role-arn "MY-EXECUTION-ROLE"
  2. Create an endpoint config:
aws sagemaker create-endpoint-config \
--endpoint-config-name "MY-ENDPOINT-CONFIG" \
--production-variants "$(cat << EOF
[{
    "VariantName": "MY-VARIANT",
    "ModelName": "MY-MODEL",
    "InitialInstanceCount": 1,
    "InstanceType": "MY-INSTANCE-TYPE"
}]
EOF
)"
  3. Create an endpoint:
aws sagemaker create-endpoint \
--endpoint-name "MY-ENDPOINT" \
--endpoint-config-name "MY-ENDPOINT-CONFIG"
  4. Create an inference component:
aws sagemaker create-inference-component \
--inference-component-name "MY-COMPONENT" \
--endpoint-name "MY-ENDPOINT" \
--variant-name "MY-VARIANT" \
--specification "$(cat << EOF
    {
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 1.0,
            "MinMemoryRequiredInMb": 128
        }
    }
EOF
)" \
--runtime-config "{\"CopyCount\": 1}"

Everything up to step 4 works, but when trying to create an inference component, the result is:

An error occurred (ValidationException) when calling the CreateInferenceComponent operation: Inference Components are not supported in this Endpoint. Please make sure this endpoint can deploy inference components.

After hours of Googling I'm at a loss: there is no explanation anywhere of how to create endpoints that DO support inference components. Using the describe command on existing endpoints created through the UI doesn't show anything specific to this end.

Karel H
asked 21 days ago · 309 views
1 Answer
Accepted Answer

I have checked the code in the GitHub repo [1] and compared it against the one you provided. In your code, you create a model and then deploy this model to an endpoint. Thereafter, you try to add another model to the endpoint using inference components. However, in the GitHub repo, they launch an endpoint without a model (an empty endpoint), then create the model afterwards and add it to the endpoint as an inference component.

The reason you were getting the error is that when you deployed your endpoint, you had already added a model to it, and that model occupies all the CPUs/GPUs on your instance, leaving no room for another one.

You need to deploy an empty endpoint as per the GitHub repo [1] and then add your models as inference components on your instance, specifying how much CPU/GPU each model may occupy (see the sketch below).
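
As a minimal sketch of what the "empty endpoint" setup could look like, reusing the placeholder names from your question: the production variant carries no "ModelName", and the execution role moves onto the endpoint config. The ManagedInstanceScaling and RoutingConfig values below mirror the linked workshop and are assumptions you may need to adjust for your instance type:

# Endpoint config with NO "ModelName" in the variant; this is what
# makes the endpoint able to host inference components.
aws sagemaker create-endpoint-config \
--endpoint-config-name "MY-ENDPOINT-CONFIG" \
--execution-role-arn "MY-EXECUTION-ROLE" \
--production-variants "$(cat << EOF
[{
    "VariantName": "MY-VARIANT",
    "InstanceType": "MY-INSTANCE-TYPE",
    "InitialInstanceCount": 1,
    "ManagedInstanceScaling": {
        "Status": "ENABLED",
        "MinInstanceCount": 1,
        "MaxInstanceCount": 1
    },
    "RoutingConfig": {"RoutingStrategy": "LEAST_OUTSTANDING_REQUESTS"}
}]
EOF
)"

Your create-model, create-endpoint, and create-inference-component calls from steps 1, 3, and 4 can then stay as they are.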

You can use the provided GitHub repo [1] as a guideline, and also this blog post [2].

[1] https://github.com/aws/amazon-sagemaker-examples/tree/main/inference/generativeai/llm-workshop/lab-inference-components-with-scaling

[2] https://aws.amazon.com/blogs/aws/amazon-sagemaker-adds-new-inference-capabilities-to-help-reduce-foundation-model-deployment-costs-and-latency/

AWS
answered 21 days ago
Alex_T (AWS EXPERT) reviewed 3 days ago
  • Thanks! This was indeed the problem :)

    Don't know who's in charge of documenting this, but adding this info to the CLI documentation page would be quite helpful too. It currently makes no mention of the distinction between inference-component-enabled endpoints and regular ones.
