Issues Deploying Models from the Hugging Face Hub via SageMaker


I recently made an AWS account and stood up a SageMaker notebook instance. I wanted to deploy a foundation model from the Hugging Face Model Hub to an endpoint, and I followed the code directly from the HF Hub:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'databricks/dolly-v2-7b',
	'HF_TASK':'text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.17.0',
	pytorch_version='1.10.2',
	py_version='py38',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge' # ec2 instance type
)

predictor.predict({
	'inputs': "Can you please let us know more details about your "
})

The above is for the Dolly v2 model. No matter what model I try, though, I get a similar error after deployment. The deployment shows no errors, and the endpoint appears in my list of active endpoints. But when I query it with:

predictor.predict({
	'inputs': "Can you please let us know more details about your "
})

I always get an error similar to this:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027gpt_neox\u0027"
}

It seems to depend on the underlying model architecture; Dolly v2 is based on GPT-NeoX, and I get the error above. If I use a fine-tuned version of a Meta Llama model, I get the same error but with the message "\u0027llama\u0027". Help!

  • Can you check the endpoint logs to see if there is more detail? You can find the logs in the AWS console by going to SageMaker -> Endpoints (under the 'Inference' header) -> the name of your endpoint -> View logs. You can also pull the same logs programmatically; see the sketch after these comments.

  • I checked the logs, but they aren't very helpful: "2023-05-17T02:43:56,652 [INFO ] W-ehartford__WizardLM-7B-Un-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Prediction error"
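If the console view is too truncated, the full CloudWatch log streams for the endpoint can be pulled from the notebook. Below is a minimal sketch, assuming the endpoint name is replaced with your real one (SageMaker writes endpoint container logs to the log group /aws/sagemaker/Endpoints/<endpoint-name>):

import boto3

# Assumption: placeholder; copy the real name from SageMaker -> Endpoints.
endpoint_name = "huggingface-pytorch-inference-2023-05-17-02-40-00-000"

logs = boto3.client("logs")
log_group = f"/aws/sagemaker/Endpoints/{endpoint_name}"

# Each model-server worker writes to its own log stream in this group.
for stream in logs.describe_log_streams(logGroupName=log_group)["logStreams"]:
	events = logs.get_log_events(
		logGroupName=log_group,
		logStreamName=stream["logStreamName"],
		limit=50,              # the last few lines usually contain the traceback
		startFromHead=False,   # read from the tail of the stream
	)
	for event in events["events"]:
		print(event["message"])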

1 Answer

The reason your code is not working is probably that you are not deploying it with a suitable Hugging Face ECR container. Please refer to https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html for more detail on the various deployment options in SageMaker.
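If you want to keep the HuggingFaceModel approach from the question, note that the three version arguments are what select the Hugging Face ECR image, and the 4.17.0 image ships a transformers release that predates the gpt_neox and llama model types. A minimal sketch follows; the version combination and instance type are assumptions, so check the currently available Hugging Face Deep Learning Containers and pick the newest combination your SDK accepts:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

hub = {
	'HF_MODEL_ID': 'databricks/dolly-v2-7b',
	'HF_TASK': 'text-generation'
}

# The transformers/pytorch/py versions determine which Hugging Face ECR
# container is pulled. Assumption: 4.26.0 / 1.13.1 / py39 is an available
# inference DLC combination; if the SDK rejects it, check the supported
# versions in the DLC list. Llama-architecture models need a newer
# transformers release still.
huggingface_model = HuggingFaceModel(
	transformers_version='4.26.0',
	pytorch_version='1.13.1',
	py_version='py39',
	env=hub,
	role=role,
)

# Assumption: a GPU instance; a 7B model is unlikely to fit in the memory
# of an ml.m5.xlarge.
predictor = huggingface_model.deploy(
	initial_instance_count=1,
	instance_type='ml.g5.2xlarge'
)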

For your particular problem, I would say the easiest route is SageMaker JumpStart, which already has pre-built code you can reuse for your own use case, and that code references the right Hugging Face ECR container. For example, here is the notebook for the Hugging Face Flan-T5 model: https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/text2text-generation-flan-t5.ipynb
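If you would rather do the same thing programmatically from your existing notebook, here is a minimal sketch using the JumpStart API in the SageMaker Python SDK; it assumes a recent sagemaker package, and the model ID shown (taken from the Flan-T5 example) is an assumption, so substitute the ID of the model you actually want:

from sagemaker.jumpstart.model import JumpStartModel

# Assumption: the model ID used in the Flan-T5 example notebook; look up
# the ID of whichever JumpStart model you pick.
model = JumpStartModel(model_id="huggingface-text2text-flan-t5-xl")

# JumpStart selects the right container, model artifacts and a default
# instance type for this model ID.
predictor = model.deploy()

# Assumption: the JumpStart text2text models expect a "text_inputs" key.
print(predictor.predict({"text_inputs": "Can you please let us know more details about your "}))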

Alternatively, in the console you can use SageMaker Studio -> Deployments -> SageMaker JumpStart -> Models, Notebooks, Solutions -> select the model you need. You can then deploy that JumpStart model, choose the right instance, and you will have an endpoint you can use for inference.
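However the endpoint is created, you can then call it from any client with the SageMaker runtime API. A minimal sketch, assuming a JSON-serving endpoint and a placeholder name (swap in your real endpoint name and whatever payload schema your model expects):

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Assumption: placeholder endpoint name; copy the real one from the console.
response = runtime.invoke_endpoint(
	EndpointName="my-jumpstart-endpoint",
	ContentType="application/json",
	Body=json.dumps({"text_inputs": "Can you please let us know more details about your "}),
)

print(json.loads(response["Body"].read()))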

answered a year ago


