HuggingFaceModel with fully local mode is still trying to access AWS API
Hello,
I am trying to test a HuggingFaceModel in local mode with SageMaker. I would like to deploy the HF model for inference in my local Docker environment. I have the following code:
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data="file:///path/to/mymodel.tar.gz",  # path to your trained SageMaker model
    role='SageMakerRole',                         # IAM role with permissions to create an endpoint
    transformers_version="4.26",                  # Transformers version used
    pytorch_version="1.13",                       # PyTorch version used
    py_version='py39',
)

huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='local'
)
I have the following configuration set in ~/.sagemaker/config.yaml
local:
  local_code: true
  region_name: "us-west-2"
  container_config:
    shm_size: "128M"
It's my understanding that this should be enough to invoke local mode, but when running the code I get the following trace:
---------------------------------------------------------------------------
ClientError Traceback (most recent call last)
/Users/jacob.windle/Projects/sagemaker_local_testing/TestLocalMode.ipynb Cell 3 line 9
1 huggingface_model = HuggingFaceModel(
2 model_data="file:///path/to/my/model.tar.gz", # path to your trained SageMaker model
3 role='SageMakerRole', # IAM role with permissions to create an endpoint
(...)
6 py_version='py39',
7 )
----> 9 huggingface_model.deploy(
10 initial_instance_count=1,
11 instance_type='local'
12 )
File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/sagemaker/huggingface/model.py:315, in HuggingFaceModel.deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, tags, kms_key, wait, data_capture_config, async_inference_config, serverless_inference_config, volume_size, model_data_download_timeout, container_startup_health_check_timeout, inference_recommendation_id, explainer_config, **kwargs)
308 inference_tool = "neuron" if instance_type.startswith("ml.inf1") else "neuronx"
309 self.image_uri = self.serving_image_uri(
310 region_name=self.sagemaker_session.boto_session.region_name,
311 instance_type=instance_type,
312 inference_tool=inference_tool,
313 )
--> 315 return super(HuggingFaceModel, self).deploy(
316 initial_instance_count,
317 instance_type,
318 serializer,
319 deserializer,
320 accelerator_type,
321 endpoint_name,
322 format_tags(tags),
323 kms_key,
324 wait,
325 data_capture_config,
326 async_inference_config,
327 serverless_inference_config,
328 volume_size=volume_size,
329 model_data_download_timeout=model_data_download_timeout,
330 container_startup_health_check_timeout=container_startup_health_check_timeout,
331 inference_recommendation_id=inference_recommendation_id,
332 explainer_config=explainer_config,
333 endpoint_logging=kwargs.get("endpoint_logging", False),
334 endpoint_type=kwargs.get("endpoint_type", None),
335 resources=kwargs.get("resources", None),
336 managed_instance_scaling=kwargs.get("managed_instance_scaling", None),
337 )
File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/sagemaker/model.py:1610, in Model.deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, tags, kms_key, wait, data_capture_config, async_inference_config, serverless_inference_config, volume_size, model_data_download_timeout, container_startup_health_check_timeout, inference_recommendation_id, explainer_config, accept_eula, endpoint_logging, resources, endpoint_type, managed_instance_scaling, **kwargs)
1607 return None
1609 else: # existing single model endpoint path
-> 1610 self._create_sagemaker_model(
1611 instance_type=instance_type,
1612 accelerator_type=accelerator_type,
1613 tags=tags,
1614 serverless_inference_config=serverless_inference_config,
1615 )
1616 serverless_inference_config_dict = (
1617 serverless_inference_config._to_request_dict() if is_serverless else None
1618 )
1619 production_variant = sagemaker.production_variant(
1620 self.name,
1621 instance_type,
(...)
1627 container_startup_health_check_timeout=container_startup_health_check_timeout,
1628 )
File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/sagemaker/model.py:865, in Model._create_sagemaker_model(self, instance_type, accelerator_type, tags, serverless_inference_config, accept_eula)
863 self.name = model_package.name
864 else:
--> 865 container_def = self.prepare_container_def(
866 instance_type,
867 accelerator_type=accelerator_type,
868 serverless_inference_config=serverless_inference_config,
869 accept_eula=accept_eula,
870 )
872 if not isinstance(self.sagemaker_session, PipelineSession):
873 # _base_name, model_name are not needed under PipelineSession.
874 # the model_data may be Pipeline variable
875 # which may break the _base_name generation
876 self._ensure_base_name_if_needed(
877 image_uri=container_def["Image"],
878 script_uri=self.source_dir,
879 model_uri=self._get_model_uri(),
880 )
File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/sagemaker/huggingface/model.py:514, in HuggingFaceModel.prepare_container_def(self, instance_type, accelerator_type, serverless_inference_config, inference_tool, accept_eula)
505 deploy_image = self.serving_image_uri(
506 region_name,
507 instance_type,
(...)
510 inference_tool=inference_tool,
511 )
513 deploy_key_prefix = model_code_key_prefix(self.key_prefix, self.name, deploy_image)
--> 514 self._upload_code(deploy_key_prefix, repack=True)
515 deploy_env = dict(self.env)
516 deploy_env.update(self._script_mode_env_vars())
File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/sagemaker/model.py:694, in Model._upload_code(self, key_prefix, repack)
684 """Uploads code to S3 to be used with script mode with SageMaker inference.
685
686 Args:
(...)
690 artifact should be repackaged into a new S3 object. (default: False).
691 """
692 local_code = utils.get_config_value("local.local_code", self.sagemaker_session.config)
--> 694 bucket, key_prefix = s3.determine_bucket_and_prefix(
695 bucket=self.bucket,
696 key_prefix=key_prefix,
697 sagemaker_session=self.sagemaker_session,
698 )
700 if (self.sagemaker_session.local_mode and local_code) or self.entry_point is None:
701 self.uploaded_code = None
File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/sagemaker/s3_utils.py:147, in determine_bucket_and_prefix(bucket, key_prefix, sagemaker_session)
145 final_key_prefix = key_prefix
146 else:
--> 147 final_bucket = sagemaker_session.default_bucket()
149 # default_bucket_prefix (if it exists) should be appended if (and only if) 'bucket' does not
150 # exist and we are using the Session's default_bucket.
151 final_key_prefix = s3_path_join(sagemaker_session.default_bucket_prefix, key_prefix)
File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/sagemaker/session.py:586, in Session.default_bucket(self)
584 default_bucket = self._default_bucket_name_override
585 if not default_bucket:
--> 586 default_bucket = generate_default_sagemaker_bucket_name(self.boto_session)
587 self._default_bucket_set_by_sdk = True
589 self._create_s3_bucket_if_it_does_not_exist(
590 bucket_name=default_bucket,
591 region=region,
592 )
File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/sagemaker/session.py:7359, in generate_default_sagemaker_bucket_name(boto_session)
7351 """Generates a name for the default sagemaker S3 bucket.
7352
7353 Args:
7354 boto_session (boto3.session.Session): The underlying Boto3 session which AWS service
7355 """
7356 region = boto_session.region_name
7357 account = boto_session.client(
7358 "sts", region_name=region, endpoint_url=sts_regional_endpoint(region)
-> 7359 ).get_caller_identity()["Account"]
7360 return "sagemaker-{}-{}".format(region, account)
File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/botocore/client.py:565, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
561 raise TypeError(
562 f"{py_operation_name}() only accepts keyword arguments."
563 )
564 # The "self" in this scope is referring to the BaseClient.
--> 565 return self._make_api_call(operation_name, kwargs)
File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/botocore/client.py:1021, in BaseClient._make_api_call(self, operation_name, api_params)
1017 error_code = error_info.get("QueryErrorCode") or error_info.get(
1018 "Code"
1019 )
1020 error_class = self.exceptions.from_code(error_code)
-> 1021 raise error_class(parsed_response, operation_name)
1022 else:
1023 return parsed_response
It looks like the SageMaker SDK still tries to determine a default S3 bucket even with local_code set to true. Did I misunderstand the documentation? I want to use my tarball on disk to test my inference script.
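For context on what the trace shows: `Model._upload_code` reads the setting via `utils.get_config_value("local.local_code", ...)`, i.e. it resolves a dotted key against the parsed `~/.sagemaker/config.yaml`. A minimal sketch of such a dotted-key lookup (an illustrative helper working on a dict literal of the config above, not the SDK's actual implementation):

```python
# Parsed form of the ~/.sagemaker/config.yaml shown above
# (a dict literal so the sketch stays dependency-free).
config = {
    "local": {
        "local_code": True,
        "region_name": "us-west-2",
        "container_config": {"shm_size": "128M"},
    }
}

def get_config_value(dotted_key, config):
    """Resolve a dotted key like 'local.local_code' against nested dicts.

    Illustrative only -- this mirrors the idea, not the SDK's code.
    Returns None when any path component is missing.
    """
    current = config
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

print(get_config_value("local.local_code", config))               # True
print(get_config_value("local.container_config.shm_size", config))  # 128M
print(get_config_value("local.missing_key", config))              # None
```

Note that the lookup succeeding (local_code is True) does not by itself stop the default-bucket resolution that raises further down the stack.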
Hi,
As per this, you can either deploy a model from the hub or deploy a model from an S3 location. Also, as per this documentation, the model_data parameter of the sagemaker.huggingface.model.HuggingFaceModel class only supports S3, as stated here: "model_data (str or PipelineVariable) – The Amazon S3 location of a SageMaker model data .tar.gz file."
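Since model_data is expected to be an S3 URI, one way to catch this early is to validate the scheme before constructing the model. A small sketch of such a guard (a hypothetical helper, not part of the SDK):

```python
from urllib.parse import urlparse

def assert_s3_model_data(model_data):
    """Hypothetical guard: HuggingFaceModel expects an s3:// URI for
    model_data, so reject file:// (and other) schemes up front with a
    clearer error than the STS failure deep in the deploy call."""
    scheme = urlparse(model_data).scheme
    if scheme != "s3":
        raise ValueError(
            f"model_data must be an S3 URI (got scheme '{scheme}'); "
            "upload the tarball to S3 first."
        )
    return model_data

assert_s3_model_data("s3://my-bucket/mymodel.tar.gz")       # passes through
# assert_s3_model_data("file:///path/to/mymodel.tar.gz")    # raises ValueError
```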
Hope this clarifies.
Thanks, Rama