SageMaker inference endpoint with HuggingFaceModel ignores custom inference.py script


Hello, I'm trying to deploy a HuggingFaceModel to a SageMaker inference endpoint. I've been following some guides, e.g. this one and this. My model of choice is Llama-2 fine-tuned on my own data. I've packed it into a model.tar.gz with the following structure:

model.tar.gz/
├── config.json
├── generation_config.json
├── tokenizer.json
├── pytorch_model-00001-of-00003.bin
├── ... (other model files)
└── code/
  ├── inference.py
  └── requirements.txt

My inference.py script defines the functions model_fn and output_fn, with custom model loading and output parsing logic.
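Roughly, the script follows the standard handler-function pattern of the SageMaker Hugging Face inference toolkit; a simplified sketch (my actual loading and parsing logic is more involved):

# inference.py - illustrative sketch only
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def model_fn(model_dir):
    # Load the fine-tuned model and tokenizer from the unpacked model.tar.gz
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    return {"model": model, "tokenizer": tokenizer}

def output_fn(prediction, accept):
    # Custom post-processing: wrap the generated text in my own response schema
    return json.dumps({"answer": prediction})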

I've uploaded this model.tar.gz to the S3 bucket in model_s3_path.
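For completeness, the packaging and upload step looked roughly like this (the local directory name and S3 prefix are placeholders):

# Build the archive and upload it to S3; "model_artifacts/" holds the files listed above
import tarfile
import sagemaker

with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("model_artifacts/", arcname=".")  # config.json, *.bin, code/, ...

sess = sagemaker.Session()
model_s3_path = sess.upload_data(
    path="model.tar.gz",
    bucket=sess.default_bucket(),
    key_prefix="llama2-finetuned",
)
print(model_s3_path)  # s3://<bucket>/llama2-finetuned/model.tar.gz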

When creating the SageMaker endpoint, I define my HuggingFaceModel as follows:

from sagemaker.huggingface import get_huggingface_llm_image_uri
from sagemaker.huggingface import HuggingFaceModel

llm_image = get_huggingface_llm_image_uri(
    "huggingface",
    version="0.9.3"
)

huggingface_model = HuggingFaceModel(
    model_data=model_s3_path,
    role=aws_role,
    image_uri=llm_image,
    env={
        'HF_MODEL_ID': 'meta-llama/Llama-2-7b-hf',
        'SM_NUM_GPUS': '1',
        'MAX_INPUT_LENGTH': '2048',
        'MAX_TOTAL_TOKENS': '4096',
        'MAX_BATCH_TOTAL_TOKENS': '8192',
        'HUGGING_FACE_HUB_TOKEN': "<my-hf-token>"
    }
)

And then I deploy the model:

huggingface_model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    endpoint_name=endpoint_name,
    container_startup_health_check_timeout=300
)

However, at inference time the endpoint doesn't seem to use any of the functionality from inference.py and falls back to the default handlers. For instance, it still returns the response as [{"generated_text": model_response}], although my post-processing function (output_fn) should have changed the response format.
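For example, invoking the endpoint like this (illustrative sketch; the prompt and parameter values are arbitrary) still yields the default payload instead of the format my output_fn produces:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps({"inputs": "My test prompt", "parameters": {"max_new_tokens": 128}}),
)
print(json.loads(response["Body"].read()))
# -> [{"generated_text": "..."}]   (default format; output_fn apparently never called)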

What I've tried so far:

  1. Setting entry_point="inference.py" and source_dir="./code" when creating the HuggingFaceModel (see the sketch after this list) - the endpoint failed to deploy at all.
  2. Setting the environment variable "SAGEMAKER_PROGRAM": "inference.py" - this did not change the model's responses; the functionality from inference.py was still ignored.
  3. Trying various image_uri values - this did not change the endpoint's behaviour.
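For reference, attempt 1 looked roughly like this (same setup as above, environment variables abbreviated):

# Sketch of attempt 1: pointing HuggingFaceModel at the script explicitly.
# With this configuration the endpoint never reached InService.
huggingface_model = HuggingFaceModel(
    model_data=model_s3_path,
    role=aws_role,
    image_uri=llm_image,
    entry_point="inference.py",
    source_dir="./code",
    env={
        'HF_MODEL_ID': 'meta-llama/Llama-2-7b-hf',
        'SM_NUM_GPUS': '1',
    },
)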
Vlad
asked 8 months ago

1 Answer

Accepted Answer

Hello Vlad,

Thank you for using AWS SageMaker.

I understand that you are trying to build a custom endpoint that serves requests using a model trained outside SageMaker. The blogs you referenced are third-party, so I won't be able to check internally whether they need any code fixes. To better investigate the issue, we would need more details about the endpoint configuration and certain backend details, along with the CloudWatch logs, which will help us understand what could be missing and how to fix it. As this medium is not secure enough to share all of those details, and without them it will be difficult to narrow down the issue, I request that you please create a case with AWS Support so that the engineers there can assist you in achieving the desired result.

To open a support case with AWS use the link: https://console.aws.amazon.com/support/home?#/case/create
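As a reference, when collecting the CloudWatch logs for the case, the endpoint's container logs can be pulled from the standard log group /aws/sagemaker/Endpoints/<endpoint-name>; a rough boto3 sketch (endpoint_name is a placeholder for your endpoint's name, and pagination is not handled):

import boto3

logs = boto3.client("logs")
log_group = f"/aws/sagemaker/Endpoints/{endpoint_name}"

# List the container log streams for this endpoint and print recent events
for stream in logs.describe_log_streams(logGroupName=log_group)["logStreams"]:
    events = logs.get_log_events(
        logGroupName=log_group,
        logStreamName=stream["logStreamName"],
        limit=50,
    )
    for event in events["events"]:
        print(event["message"])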

AWS
SUPPORT ENGINEER
answered 8 months ago
