Issues Deploying Models from Hugging Face Hub via SageMaker

I recently created an AWS account and stood up a SageMaker notebook instance. I want to deploy a foundation model from the Hugging Face Model Hub to an endpoint, and I followed the deployment code straight from the HF Hub:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'databricks/dolly-v2-7b',
	'HF_TASK':'text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.17.0',
	pytorch_version='1.10.2',
	py_version='py38',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge' # ec2 instance type
)

predictor.predict({
	'inputs': "Can you please let us know more details about your "
})

The above is for the Dolly 2 model. No matter which model I try, though, I get a similar error after deployment. The deployment itself shows no errors, and the endpoint appears in my list of active endpoints. But when I query it with:

predictor.predict({
	'inputs': "Can you please let us know more details about your "
})

I always get an error similar to this:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027gpt_neox\u0027"
}

It seems to depend on the underlying model architecture; Dolly 2 is based on GPT-NeoX, and I get the error above. If I use a fine-tuned version of a Meta Llama model, I get the same error but with the message: "\u0027llama\u0027". Help!

  • Can you check the endpoint logs to see if there is more detail? You can find the logs in the AWS console by going to SageMaker -> Endpoints (under the 'Inference' header) -> Name of your endpoint -> View logs (see the boto3 sketch after these comments for pulling the same logs programmatically).

  • I checked the logs, but they aren't very helpful: "2023-05-17T02:43:56,652 [INFO ] W-ehartford__WizardLM-7B-Un-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Prediction error"
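If console navigation is a hassle, here is a minimal sketch of pulling the same endpoint logs with boto3. It assumes the standard /aws/sagemaker/Endpoints/<endpoint-name> log group that SageMaker creates for endpoints, and reuses the predictor object from the notebook for the endpoint name:

import boto3

logs = boto3.client("logs")

# SageMaker endpoint container logs go to CloudWatch under this log group;
# use predictor.endpoint_name or paste the name shown in the console
endpoint_name = predictor.endpoint_name
log_group = f"/aws/sagemaker/Endpoints/{endpoint_name}"

# Grab the most recently active log stream and print its latest events
streams = logs.describe_log_streams(
    logGroupName=log_group,
    orderBy="LastEventTime",
    descending=True,
    limit=1,
)
for stream in streams["logStreams"]:
    events = logs.get_log_events(
        logGroupName=log_group,
        logStreamName=stream["logStreamName"],
        limit=50,
    )
    for event in events["events"]:
        print(event["message"])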

asked a year ago · 758 views
1 Answer

The reason your code is not running is probably that you are not deploying it using the right Hugging Face ECR container. Please refer to https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html for more details on the various deployment options in SageMaker.
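If you want to stay with the HuggingFaceModel route from the question, one of those options is to point the model at an explicit Hugging Face LLM container image instead of the transformers_version/pytorch_version pair. A minimal sketch follows; the container version string, GPU count, and instance type are assumptions to verify against the current Hugging Face DLC releases, not tested values:

from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
import sagemaker

role = sagemaker.get_execution_role()

# Resolve an explicit Hugging Face LLM container image.
# The version string is an assumption; check the current DLC releases.
llm_image = get_huggingface_llm_image_uri("huggingface", version="0.8.2")

huggingface_model = HuggingFaceModel(
    image_uri=llm_image,
    env={
        'HF_MODEL_ID': 'databricks/dolly-v2-7b',
        'SM_NUM_GPUS': '1',  # assumption: one GPU on the chosen instance
    },
    role=role,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g5.2xlarge',  # assumption: a 7B model needs a GPU instance, not ml.m5.xlarge
)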

For your particular problem, I would say the easiest path is to use SageMaker JumpStart, which already has pre-built code that you can reuse for your own use case. That code references the right Hugging Face ECR container. For example, here is the notebook for the Hugging Face Flan-T5 model: https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/text2text-generation-flan-t5.ipynb
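As a rough illustration of the JumpStart route in code, here is a minimal sketch using the JumpStartModel class from newer versions of the sagemaker SDK. The model ID, instance type, and payload format are assumptions taken from the Flan-T5 example and should be checked against the notebook linked above:

from sagemaker.jumpstart.model import JumpStartModel

# Model ID and instance type are assumptions; look them up in the JumpStart catalog
model = JumpStartModel(model_id="huggingface-text2text-flan-t5-xl")

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

# JumpStart text2text models typically expect a "text_inputs" field
print(predictor.predict({"text_inputs": "Can you please let us know more details about your "}))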

In the console, you can go to SageMaker Studio -> Deployments -> SageMaker JumpStart -> Models, Notebooks, Solutions -> and select the model you need. Then deploy that JumpStart model, choose the right instance, and you will have an endpoint you can use for inference.
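Once the endpoint is up, you can also invoke it outside the sagemaker SDK with plain boto3. The endpoint name and the "text_inputs" payload field below are assumptions based on how JumpStart text2text models are typically invoked, so check the model's example notebook for the exact format:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Endpoint name is hypothetical; use the name shown in the SageMaker console
response = runtime.invoke_endpoint(
    EndpointName="my-jumpstart-flan-t5-endpoint",
    ContentType="application/json",
    Body=json.dumps({"text_inputs": "Can you please let us know more details about your "}),
)
print(json.loads(response["Body"].read()))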

answered a year ago
