Error creating a Hugging Face model with SageMaker: the SageMaker endpoint stays in the "Creating" state


[The following question has been translated] I am trying to create an endpoint using the Hugging Face functionality of the sagemaker Python package.

    from sagemaker.huggingface import HuggingFaceModel
    import sagemaker

    role = sagemaker.get_execution_role()

    # Hub Model configuration. https://huggingface.co/models
    hub = {
        'HF_MODEL_ID': 'MY_ACC/MY_MODEL_NAME',
        'HF_TASK': 'text-generation'
    }

    # create Hugging Face Model Class
    huggingface_model = HuggingFaceModel(
        transformers_version='4.6.1',
        pytorch_version='1.7.1',
        py_version='py36',
        env=hub,
        role=role,
    )

    # deploy model to SageMaker Inference
    predictor = huggingface_model.deploy(
        initial_instance_count=1,  # number of instances
        instance_type='ml.m5.xlarge'  # ec2 instance type
    )

    predictor.predict({
        'inputs': "Can you please let us know more details about your "
    })

MY_ACC/MY_MODEL_NAME is a private model, so the endpoint creation keeps printing the following error:

    This is an experimental beta features, which allows downloading model from the Hugging Face Hub on start up. It loads the model defined in the env var HF_MODEL_ID

    Traceback (most recent call last):
      File "/usr/local/bin/dockerd-entrypoint.py", line 23, in <module>
        serving.main()
      File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/serving.py", line 34, in main
        _start_mms()
      File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 49, in wrapped_f
        return Retrying(*dargs, **dkw).call(f, *args, **kw)
      File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 206, in call
        return attempt.get(self._wrap_exception)
      File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 247, in get
        six.reraise(self.value[0], self.value[1], self.value[2])
      File "/opt/conda/lib/python3.6/site-packages/six.py", line 719, in reraise
        raise value
      File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 200, in call
        attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
      File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/serving.py", line 30, in _start_mms
        mms_model_server.start_model_server(handler_service=HANDLER_SERVICE)
      File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/mms_model_server.py", line 75, in start_model_server
        use_auth_token=HF_API_TOKEN,
      File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/transformers_utils.py", line 154, in _load_model_from_hub
        model_info = _api.model_info(repo_id=model_id, revision=revision, token=use_auth_token)
      File "/opt/conda/lib/python3.6/site-packages/huggingface_hub/hf_api.py", line 155, in model_info
        r.raise_for_status()
      File "/opt/conda/lib/python3.6/site-packages/requests/models.py", line 943, in raise_for_status
        raise HTTPError(http_error_msg, response=self)
    requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/MY_ACC/MY_MODEL_NAME
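In the traceback above, the toolkit loads the model via `_load_model_from_hub` with `use_auth_token=HF_API_TOKEN`, and the Hub API answers 404 for the private repository, which suggests the request reached the Hub without usable credentials. The sketch below is a minimal illustration that assumes the inference toolkit reads the Hub token from an `HF_API_TOKEN` entry in the container environment, as the traceback suggests. `build_hub_env` is a hypothetical helper, not part of the sagemaker SDK.

```python
import os


def build_hub_env(model_id, task, token=None):
    """Return the `env` dict for HuggingFaceModel, including a Hub token.

    Assumption: the SageMaker Hugging Face inference toolkit reads the Hub
    token from an HF_API_TOKEN variable in the container environment (the
    traceback shows it passed on as use_auth_token). If no token is given,
    the local HF_API_TOKEN environment variable is used instead of a
    hard-coded secret.
    """
    token = token or os.environ.get("HF_API_TOKEN")
    if not token:
        # Fail locally instead of deploying an endpoint that can never
        # download the private model.
        raise ValueError(f"no Hugging Face token available for {model_id}")
    return {
        "HF_MODEL_ID": model_id,
        "HF_TASK": task,
        "HF_API_TOKEN": token,
    }
```

With this helper, the question's call would pass `env=build_hub_env('MY_ACC/MY_MODEL_NAME', 'text-generation')` to `HuggingFaceModel`. Container environment variables are stored in the SageMaker model definition and returned by `DescribeModel`, so anyone allowed to describe the model can see the token.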

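It looks like there is a never-ending loop that keeps checking this status. I have terminated the Python process that started the endpoint creation, but the creation continues anyway.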
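The SageMaker service creates the endpoint, not the local Python process. Terminating the process that called `deploy()` therefore stops only the local wait and does not cancel the creation. The endpoint can be watched from any session. The sketch below is a minimal polling loop that assumes a boto3 SageMaker client (`boto3.client("sagemaker")`), whose `describe_endpoint` returns an `EndpointStatus` field. `wait_until_settled` is hypothetical, and the poll interval and timeout are arbitrary choices.

```python
import time

# Statuses in which the endpoint is no longer transitioning.
SETTLED = {"InService", "Failed", "OutOfService"}


def wait_until_settled(client, endpoint_name, poll_seconds=30,
                       timeout_seconds=3600, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint reaches a settled status.

    `client` is assumed to be a boto3 SageMaker client, or any object with
    the same describe_endpoint(EndpointName=...) method. Returns the final
    status, or raises TimeoutError if the endpoint is still transitioning
    after roughly timeout_seconds.
    """
    waited = 0
    while True:
        response = client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status in SETTLED:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"{endpoint_name} still {status} after {waited}s")
        sleep(poll_seconds)  # injectable so the loop can be tested without waiting
        waited += poll_seconds
```

With a real client the call would be `wait_until_settled(boto3.client("sagemaker"), "<endpoint-name>")`. If you do not want `deploy()` itself to block, the SageMaker Python SDK's `deploy()` accepts a `wait` argument (`wait=False`) in v2 of the SDK.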

How can I fix this? I just want to delete the endpoint, but that option is greyed out because the endpoint is still in the Creating stage.

Thanks

Asked 8 months ago, 78 views

1 answer

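[The following answer has been translated] Never mind: when the creation keeps running like this, the process eventually times out on its own. Hopefully this is useful to others. You can also try deleting the endpoint with the AWS CLI.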
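The answer suggests deleting the endpoint through the AWS CLI. The same deletion can be done from Python with a boto3 SageMaker client. The client methods `describe_endpoint`, `describe_endpoint_config`, `delete_endpoint`, `delete_endpoint_config` and `delete_model` exist. The helper `delete_endpoint_stack` is hypothetical. The service may still reject deletion while the status is `Creating`. In that case, wait for the creation to time out first, as the answer describes.

```python
def delete_endpoint_stack(client, endpoint_name):
    """Delete an endpoint plus the endpoint config and models behind it.

    `client` is assumed to be boto3.client("sagemaker"). Returns the names
    of the deleted endpoint config and models so the caller can log them.
    """
    # Look up what the endpoint was built from before it disappears.
    endpoint = client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [v["ModelName"] for v in config["ProductionVariants"]]

    # Delete the endpoint first, then the resources it was created from.
    client.delete_endpoint(EndpointName=endpoint_name)
    client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        client.delete_model(ModelName=name)
    return config_name, model_names
```

The CLI equivalents of the first two deletions are `aws sagemaker delete-endpoint --endpoint-name <name>` and `aws sagemaker delete-endpoint-config --endpoint-config-name <name>`.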

Answered 8 months ago
