Issues Deploying Models from Hugging Face Hub via SageMaker

I recently created an AWS account and stood up a SageMaker notebook instance. I want to deploy a foundation model from the Hugging Face Model Hub to an endpoint, and I followed the deployment code straight from the HF Hub:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'databricks/dolly-v2-7b',
	'HF_TASK':'text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.17.0',
	pytorch_version='1.10.2',
	py_version='py38',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge' # ec2 instance type
)

predictor.predict({
	'inputs': "Can you please let us know more details about your "
})

The above is for the Dolly 2 model. No matter which model I try, though, I get a similar error after deployment. The deployment itself shows no errors, and the endpoint appears in my list of active endpoints. But when I query it with:

predictor.predict({
	'inputs': "Can you please let us know more details about your "
})

I always get an error similar to this:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027gpt_neox\u0027"
}

It seems to depend on the underlying model architecture; Dolly 2 is based on GPT-NeoX, and I get the error above. If I use a fine-tuned version of a Meta Llama model, I get the same error but with the message: "\u0027llama\u0027". Help!

  • Can you check the endpoint logs to see if there is more detail? You can find the logs in the AWS console by going to SageMaker -> Endpoints (under the 'Inference' header) -> Name of your endpoint -> View logs (see the boto3 sketch after these comments for pulling the same logs programmatically).

  • I checked the logs, but they aren't very helpful: "2023-05-17T02:43:56,652 [INFO ] W-ehartford__WizardLM-7B-Un-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Prediction error"
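If console navigation is a hassle, here is a minimal sketch of pulling the same endpoint logs with boto3. It assumes the standard /aws/sagemaker/Endpoints/<endpoint-name> log group that SageMaker creates for endpoints, and reuses the predictor object from the notebook for the endpoint name:

import boto3

logs = boto3.client("logs")

# SageMaker endpoint container logs go to CloudWatch under this log group;
# use predictor.endpoint_name or paste the name shown in the console
endpoint_name = predictor.endpoint_name
log_group = f"/aws/sagemaker/Endpoints/{endpoint_name}"

# Grab the most recently active log stream and print its latest events
streams = logs.describe_log_streams(
    logGroupName=log_group,
    orderBy="LastEventTime",
    descending=True,
    limit=1,
)
for stream in streams["logStreams"]:
    events = logs.get_log_events(
        logGroupName=log_group,
        logStreamName=stream["logStreamName"],
        limit=50,
    )
    for event in events["events"]:
        print(event["message"])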

asked a year ago · 758 views
1 Answer

The reason your code is not running is probably that you are not deploying it using the right Hugging Face ECR container. Please refer to https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html for more details on the various deployment options in SageMaker.
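If you want to stay with the HuggingFaceModel route from the question, one of those options is to point the model at an explicit Hugging Face LLM container image instead of the transformers_version/pytorch_version pair. A minimal sketch follows; the container version string, GPU count, and instance type are assumptions to verify against the current Hugging Face DLC releases, not tested values:

from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
import sagemaker

role = sagemaker.get_execution_role()

# Resolve an explicit Hugging Face LLM container image.
# The version string is an assumption; check the current DLC releases.
llm_image = get_huggingface_llm_image_uri("huggingface", version="0.8.2")

huggingface_model = HuggingFaceModel(
    image_uri=llm_image,
    env={
        'HF_MODEL_ID': 'databricks/dolly-v2-7b',
        'SM_NUM_GPUS': '1',  # assumption: one GPU on the chosen instance
    },
    role=role,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g5.2xlarge',  # assumption: a 7B model needs a GPU instance, not ml.m5.xlarge
)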

For your particular problem, I would say the easiest path is to use SageMaker JumpStart, which already has pre-built code that you can reuse for your own use case. That code references the right Hugging Face ECR container. For example, here is the notebook for the Hugging Face Flan-T5 model: https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/text2text-generation-flan-t5.ipynb
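As a rough illustration of the JumpStart route in code, here is a minimal sketch using the JumpStartModel class from newer versions of the sagemaker SDK. The model ID, instance type, and payload format are assumptions taken from the Flan-T5 example and should be checked against the notebook linked above:

from sagemaker.jumpstart.model import JumpStartModel

# Model ID and instance type are assumptions; look them up in the JumpStart catalog
model = JumpStartModel(model_id="huggingface-text2text-flan-t5-xl")

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

# JumpStart text2text models typically expect a "text_inputs" field
print(predictor.predict({"text_inputs": "Can you please let us know more details about your "}))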

In the console, you can go to SageMaker Studio -> Deployments -> SageMaker JumpStart -> Models, Notebooks, Solutions -> and select the model you need. Then deploy that JumpStart model, choose the right instance, and you will have an endpoint you can use for inference.
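Once the endpoint is up, you can also invoke it outside the sagemaker SDK with plain boto3. The endpoint name and the "text_inputs" payload field below are assumptions based on how JumpStart text2text models are typically invoked, so check the model's example notebook for the exact format:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Endpoint name is hypothetical; use the name shown in the SageMaker console
response = runtime.invoke_endpoint(
    EndpointName="my-jumpstart-flan-t5-endpoint",
    ContentType="application/json",
    Body=json.dumps({"text_inputs": "Can you please let us know more details about your "}),
)
print(json.loads(response["Body"].read()))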

answered a year ago
