HuggingFaceModel with fully local mode is still trying to access AWS API


Hello,

I am trying to test a HuggingFaceModel in local mode with SageMaker. I would like to deploy the HF model for inference in my local Docker environment. I have the following code:

from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data="file:///path/to/mymodel.tar.gz",  # path to your trained SageMaker model
    role="SageMakerRole",                         # IAM role with permissions to create an endpoint
    transformers_version="4.26",                  # Transformers version used
    pytorch_version="1.13",                       # PyTorch version used
    py_version="py39",
)

huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='local'
)

I have the following configuration set in ~/.sagemaker/config.yaml:

local:
    local_code: true
    region_name: "us-west-2"
    container_config:
        shm_size: "128M"
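
As a quick sanity check, I confirmed the SDK can see this file with a snippet like the following (a rough sketch; the config attribute on LocalSession is based on my reading of the SDK source, so treat it as an assumption):

# Sanity check (sketch): confirm that LocalSession picks up
# ~/.sagemaker/config.yaml. The .config attribute comes from my reading
# of the sagemaker SDK source, so treat this as an assumption.
from sagemaker.local import LocalSession

session = LocalSession()
print(session.config)  # expect something like {'local': {'local_code': True, ...}}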

It's my understanding that this should be enough to invoke local mode, but when running the code I get the following trace:

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
/Users/jacob.windle/Projects/sagemaker_local_testing/TestLocalMode.ipynb Cell 3 line 9
      1 huggingface_model = HuggingFaceModel(
      2    model_data="file:///path/to/my/model.tar.gz",  # path to your trained SageMaker model
      3    role='SageMakerRole',                                            # IAM role with permissions to create an endpoint
   (...)
      6    py_version='py39',
      7 )
----> 9 huggingface_model.deploy(
     10     initial_instance_count=1,
     11     instance_type='local'
     12 )

File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/sagemaker/huggingface/model.py:315, in HuggingFaceModel.deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, tags, kms_key, wait, data_capture_config, async_inference_config, serverless_inference_config, volume_size, model_data_download_timeout, container_startup_health_check_timeout, inference_recommendation_id, explainer_config, **kwargs)
    308     inference_tool = "neuron" if instance_type.startswith("ml.inf1") else "neuronx"
    309     self.image_uri = self.serving_image_uri(
    310         region_name=self.sagemaker_session.boto_session.region_name,
    311         instance_type=instance_type,
    312         inference_tool=inference_tool,
    313     )
--> 315 return super(HuggingFaceModel, self).deploy(
    316     initial_instance_count,
    317     instance_type,
    318     serializer,
    319     deserializer,
    320     accelerator_type,
    321     endpoint_name,
    322     format_tags(tags),
    323     kms_key,
    324     wait,
    325     data_capture_config,
    326     async_inference_config,
    327     serverless_inference_config,
    328     volume_size=volume_size,
    329     model_data_download_timeout=model_data_download_timeout,
    330     container_startup_health_check_timeout=container_startup_health_check_timeout,
    331     inference_recommendation_id=inference_recommendation_id,
    332     explainer_config=explainer_config,
    333     endpoint_logging=kwargs.get("endpoint_logging", False),
    334     endpoint_type=kwargs.get("endpoint_type", None),
    335     resources=kwargs.get("resources", None),
    336     managed_instance_scaling=kwargs.get("managed_instance_scaling", None),
    337 )

File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/sagemaker/model.py:1610, in Model.deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, tags, kms_key, wait, data_capture_config, async_inference_config, serverless_inference_config, volume_size, model_data_download_timeout, container_startup_health_check_timeout, inference_recommendation_id, explainer_config, accept_eula, endpoint_logging, resources, endpoint_type, managed_instance_scaling, **kwargs)
   1607     return None
   1609 else:  # existing single model endpoint path
-> 1610     self._create_sagemaker_model(
   1611         instance_type=instance_type,
   1612         accelerator_type=accelerator_type,
   1613         tags=tags,
   1614         serverless_inference_config=serverless_inference_config,
   1615     )
   1616     serverless_inference_config_dict = (
   1617         serverless_inference_config._to_request_dict() if is_serverless else None
   1618     )
   1619     production_variant = sagemaker.production_variant(
   1620         self.name,
   1621         instance_type,
   (...)
   1627         container_startup_health_check_timeout=container_startup_health_check_timeout,
   1628     )

File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/sagemaker/model.py:865, in Model._create_sagemaker_model(self, instance_type, accelerator_type, tags, serverless_inference_config, accept_eula)
    863         self.name = model_package.name
    864 else:
--> 865     container_def = self.prepare_container_def(
    866         instance_type,
    867         accelerator_type=accelerator_type,
    868         serverless_inference_config=serverless_inference_config,
    869         accept_eula=accept_eula,
    870     )
    872     if not isinstance(self.sagemaker_session, PipelineSession):
    873         # _base_name, model_name are not needed under PipelineSession.
    874         # the model_data may be Pipeline variable
    875         # which may break the _base_name generation
    876         self._ensure_base_name_if_needed(
    877             image_uri=container_def["Image"],
    878             script_uri=self.source_dir,
    879             model_uri=self._get_model_uri(),
    880         )

File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/sagemaker/huggingface/model.py:514, in HuggingFaceModel.prepare_container_def(self, instance_type, accelerator_type, serverless_inference_config, inference_tool, accept_eula)
    505     deploy_image = self.serving_image_uri(
    506         region_name,
    507         instance_type,
   (...)
    510         inference_tool=inference_tool,
    511     )
    513 deploy_key_prefix = model_code_key_prefix(self.key_prefix, self.name, deploy_image)
--> 514 self._upload_code(deploy_key_prefix, repack=True)
    515 deploy_env = dict(self.env)
    516 deploy_env.update(self._script_mode_env_vars())

File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/sagemaker/model.py:694, in Model._upload_code(self, key_prefix, repack)
    684 """Uploads code to S3 to be used with script mode with SageMaker inference.
    685 
    686 Args:
   (...)
    690         artifact should be repackaged into a new S3 object. (default: False).
    691 """
    692 local_code = utils.get_config_value("local.local_code", self.sagemaker_session.config)
--> 694 bucket, key_prefix = s3.determine_bucket_and_prefix(
    695     bucket=self.bucket,
    696     key_prefix=key_prefix,
    697     sagemaker_session=self.sagemaker_session,
    698 )
    700 if (self.sagemaker_session.local_mode and local_code) or self.entry_point is None:
    701     self.uploaded_code = None

File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/sagemaker/s3_utils.py:147, in determine_bucket_and_prefix(bucket, key_prefix, sagemaker_session)
    145     final_key_prefix = key_prefix
    146 else:
--> 147     final_bucket = sagemaker_session.default_bucket()
    149     # default_bucket_prefix (if it exists) should be appended if (and only if) 'bucket' does not
    150     # exist and we are using the Session's default_bucket.
    151     final_key_prefix = s3_path_join(sagemaker_session.default_bucket_prefix, key_prefix)

File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/sagemaker/session.py:586, in Session.default_bucket(self)
    584 default_bucket = self._default_bucket_name_override
    585 if not default_bucket:
--> 586     default_bucket = generate_default_sagemaker_bucket_name(self.boto_session)
    587     self._default_bucket_set_by_sdk = True
    589 self._create_s3_bucket_if_it_does_not_exist(
    590     bucket_name=default_bucket,
    591     region=region,
    592 )

File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/sagemaker/session.py:7359, in generate_default_sagemaker_bucket_name(boto_session)
   7351 """Generates a name for the default sagemaker S3 bucket.
   7352 
   7353 Args:
   7354     boto_session (boto3.session.Session): The underlying Boto3 session which AWS service
   7355 """
   7356 region = boto_session.region_name
   7357 account = boto_session.client(
   7358     "sts", region_name=region, endpoint_url=sts_regional_endpoint(region)
-> 7359 ).get_caller_identity()["Account"]
   7360 return "sagemaker-{}-{}".format(region, account)

File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/botocore/client.py:565, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
    561     raise TypeError(
    562         f"{py_operation_name}() only accepts keyword arguments."
    563     )
    564 # The "self" in this scope is referring to the BaseClient.
--> 565 return self._make_api_call(operation_name, kwargs)

File ~/Projects/sagemaker_local_testing/.venv/lib/python3.12/site-packages/botocore/client.py:1021, in BaseClient._make_api_call(self, operation_name, api_params)
   1017     error_code = error_info.get("QueryErrorCode") or error_info.get(
   1018         "Code"
   1019     )
   1020     error_class = self.exceptions.from_code(error_code)
-> 1021     raise error_class(parsed_response, operation_name)
   1022 else:
   1023     return parsed_response

It looks like SageMaker still tries to determine an S3 bucket location even with local_code set to true. Did I misunderstand the documentation? I want to use my tarball on disk to test my inference script.
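
For completeness, here is a variant I am considering: passing an explicit LocalSession instead of relying on ~/.sagemaker/config.yaml alone. This is only a sketch based on the local-mode examples (the explicit session and its config dict are assumptions on my part), and judging by the trace, determine_bucket_and_prefix may still run before the local_code check is consulted:

# Sketch: force local mode with an explicit LocalSession rather than the
# config file alone. The session config dict mirrors the local-mode examples;
# whether this bypasses the default-bucket lookup is unverified.
from sagemaker.local import LocalSession
from sagemaker.huggingface import HuggingFaceModel

local_session = LocalSession()
local_session.config = {"local": {"local_code": True}}

huggingface_model = HuggingFaceModel(
    model_data="file:///path/to/mymodel.tar.gz",
    role="SageMakerRole",
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
    sagemaker_session=local_session,  # explicit local session
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="local",
)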

asked 16 days ago · 57 views
1 Answer

Hi,

As per this, you can either deploy a model from the Hugging Face Hub or deploy a model from an S3 location. Also, per the documentation for the sagemaker.huggingface.model.HuggingFaceModel class, model_data only supports S3: "model_data (str or PipelineVariable) – The Amazon S3 location of a SageMaker model data .tar.gz file."
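
For illustration, deploying from the hub would look roughly like this (a sketch only; the model id and task below are placeholders, not from your question, and I have not verified how this path interacts with local mode):

# Sketch: deploy directly from the Hugging Face Hub instead of pointing
# model_data at a tarball. HF_MODEL_ID and HF_TASK values are placeholders.
from sagemaker.huggingface import HuggingFaceModel

hub_env = {
    "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",  # placeholder
    "HF_TASK": "text-classification",                                  # placeholder
}

huggingface_model = HuggingFaceModel(
    env=hub_env,
    role="SageMakerRole",
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="local",
)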

Hope this clarifies.

Thanks, Rama

Rama (AWS)
answered 15 days ago
