Can someone help me load my model to create an endpoint?
I've provided an explanation of the steps I followed, the error logs, and the code I used to create everything... thank you in advance.
I'm trying very hard to run inference with the "PygmalionAI/pygmalion-6b" model from Hugging Face.
But it gives an error when trying to load the worker; it seems unable to. I'm pretty sure I selected a big enough instance, so it shouldn't be an OOM exception (I think...).
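(For reference, my back-of-the-envelope sizing: ~6B parameters × 2 bytes ≈ 12 GB of weights in fp16, or ≈ 24 GB in fp32, while ml.g4dn.2xlarge has 32 GB of RAM and one T4 GPU with 16 GB of memory. Please correct me if that reasoning is off, e.g. if the checkpoint is loaded in fp32.)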
ERROR LOGS:
2023-06-26T22:20:24,738 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.BatchAggregator - Load model failed: model, error: Worker died.
2023-06-26T22:21:16,306 [WARN ] W-9000-model-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
STEPS FOLLOWED:
I'm creating the inference endpoint through a notebook and the SageMaker Python SDK.
I took the model from the following Hugging Face URL: https://huggingface.co/PygmalionAI/pygmalion-6b, downloaded it, removed the ''runs'' folder (which, to my knowledge, contains training logs), compressed the model, uploaded it to S3, and followed guides on how to use the Python SDK. Roughly as sketched below.
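For completeness, this is roughly how I packaged the model (a sketch, not my exact script; it assumes huggingface_hub and boto3 are available, and the bucket is the one referenced in the deploy code further down):
import shutil
import tarfile
import boto3
from huggingface_hub import snapshot_download
local_dir = snapshot_download("PygmalionAI/pygmalion-6b")  # download all model files from the Hub
shutil.rmtree(f"{local_dir}/runs", ignore_errors=True)     # drop the training-log folder
# tar the files at the archive root, as SageMaker expects for model.tar.gz
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add(local_dir, arcname=".")
boto3.client("s3").upload_file("model.tar.gz", "pygmalion-6b-s3", "model.tar.gz")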
If I try to load the model using the HF model ID directly (PygmalionAI/pygmalion-6b), it doesn't work either. (I'm not pasting the full code for that attempt because of the input message length restriction, but a short sketch follows.)
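In short, that variant looked roughly like this (a sketch; it uses the standard HF_MODEL_ID / HF_TASK environment variables instead of model_data):
from sagemaker.huggingface import HuggingFaceModel
hub_model = HuggingFaceModel(
    env={
        'HF_MODEL_ID': 'PygmalionAI/pygmalion-6b',  # pull weights from the Hub at container startup
        'HF_TASK': 'text-generation',
    },
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)
hub_predictor = hub_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.2xlarge",
)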
Here's my Python code (the installs are run as notebook shell commands):
!pip install sagemaker --upgrade
!pip install transformers --upgrade
import sagemaker
import boto3

sess = sagemaker.Session()
# Bucket for session artifacts; falls back to the session's default bucket.
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    sagemaker_session_bucket = sess.default_bucket()

# Resolve the execution role: works inside SageMaker, falls back to an IAM lookup locally.
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker session region: {sess.boto_region_name}")
from sagemaker.huggingface import get_huggingface_llm_image_uri

# Retrieve the Hugging Face LLM (TGI) container image URI.
# NOTE: llm_image is never passed to HuggingFaceModel below (no image_uri=llm_image),
# so the endpoint actually runs on the standard HF inference container.
llm_image = get_huggingface_llm_image_uri(
    "huggingface",
    version="0.8.2"
)
print(f"llm image uri: {llm_image}")
import json
from sagemaker.huggingface import HuggingFaceModel

instance_type = "ml.g4dn.2xlarge"  # single T4 GPU (16 GB), 32 GB RAM
number_of_gpu = 1
health_check_timeout = 750         # seconds

# Environment variables for the container (these particular keys are TGI settings).
config = {
    'SM_NUM_GPUS': json.dumps(number_of_gpu),  # number of GPUs per replica
    'MAX_INPUT_LENGTH': json.dumps(1024),      # max prompt length in tokens
    'MAX_TOTAL_TOKENS': json.dumps(2048),      # max prompt + generated tokens
}
llm_model = HuggingFaceModel(
    model_data='s3://pygmalion-6b-s3/model.tar.gz',  # the packaged model uploaded earlier
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version='py39',
    env=config,
)

llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,  # 750 s (~12.5 minutes) to load the model
)
data = {
    "inputs": "hello, how are you?"
}
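Once the endpoint is up I'd invoke it like this, though the deploy never gets that far:
response = llm.predict(data)  # send the test payload to the endpoint
print(response)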
The Python script is based on https://huggingface.co/blog/sagemaker-huggingface-llm, but I also tried the version from https://huggingface.co/docs/sagemaker/inference#create-a-model-artifact-for-deployment, and the worker failed to load there as well. TT