How to save a model of LightGBM in sagemaker with a custom name and how to reload it using python


I have trained an ML model in SageMaker with the LightGBM algorithm, and I am not able to load it locally using Python. Also, when training it on SageMaker from Python (a local instance), can I decide the name of the folder that is created and that the model is saved to? Why are these things not mentioned in its documentation?

asked a year ago · 452 views
1 Answer

You can find the actual training script's train_source_uri on the LightGBM algorithm docs page, then download and unzip the bundle to your notebook to inspect what it's doing, something like this:

!mkdir -p tmp
!aws s3 cp --no-sign-request {train_source_uri} tmp/train_source.tar.gz
!cd tmp && tar -xvf train_source.tar.gz

Unfortunately, the sagemaker_jumpstart_tabular_script_utilities package referenced by this particular script (in the main transfer_learning.py file) doesn't seem to be open-source anywhere... But the bundle includes a wheel file lib/sagemaker_jumpstart_tabular_script_utilities/sagemaker_jumpstart_tabular_script_utilities-1.0.0-py2.py3-none-any.whl, which you can just !unzip from your notebook to see the implementation of save_model(). Result: it's actually saving the model file via joblib.dump().
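Since a .whl file is just a ZIP archive, you can also inspect it from Python without shelling out to unzip. A minimal sketch (demonstrated with an in-memory stand-in archive, since I can't bundle the real wheel here; for the real thing you'd pass the path to the .whl file from the extracted bundle):

```python
import io
import zipfile


def list_wheel_contents(wheel_file):
    """A .whl is just a ZIP archive, so zipfile can read it directly."""
    with zipfile.ZipFile(wheel_file) as whl:
        return whl.namelist()


# Demo with an in-memory stand-in wheel containing one module:
fake_wheel = io.BytesIO()
with zipfile.ZipFile(fake_wheel, "w") as whl:
    whl.writestr("utils/save_model.py", "def save_model(): ...")

print(list_wheel_contents(fake_wheel))  # → ['utils/save_model.py']
```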

There's a nice diagram here in the developer guide about the mapping between folders in the container(s) SageMaker spins up for your training job, and input/output locations in Amazon S3. Your final S3 paths may depend on whether you're using the sagemaker Python SDK or the low-level boto3 AWS SDK to create your training job. Either way though, you should be able to customize the output S3 location (see e.g. output_path and base_job_name on SageMaker Python SDK Estimator objects).
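With the SageMaker Python SDK, that might look something like the sketch below. This is illustrative only: the role ARN, bucket name, and instance type are placeholders, and the image/source URIs would come from the JumpStart LightGBM docs page.

```python
from sagemaker.estimator import Estimator

# Placeholder values — substitute your own role and bucket, plus the
# image/source URIs from the JumpStart LightGBM docs page:
estimator = Estimator(
    image_uri="<train_image_uri>",
    source_dir="<train_source_uri>",
    entry_point="transfer_learning.py",
    role="arn:aws:iam::111122223333:role/MySageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/lightgbm-models",  # model.tar.gz lands under here
    base_job_name="my-lightgbm",  # prefix for the job name (and its S3 folder)
)
# estimator.fit({"training": "s3://my-bucket/lightgbm-input/"})
```

SageMaker names the job base_job_name plus a timestamp, and uploads the model to output_path/&lt;job-name&gt;/output/model.tar.gz.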

As mentioned in the doc, your training container's /opt/ml/model is always compressed to a .tar.gz archive (by SageMaker) before upload to S3.

So in summary:

  • I'd suggest trying to read your model file via joblib.load (after downloading the model.tar.gz and extracting it, e.g. with tar -xvf)
  • If you're trying to customize where your models get saved in S3, check out the Estimator's output_path and base_job_name parameters if using the SageMaker Python SDK - or just the OutputDataConfig parameter if using the low-level boto3 API.
  • Customizing the format of the output object itself doesn't really make sense with pre-built algorithms, but for your own training scripts you can put pretty much whatever you like in the folder and it will be tarballed for you. You cannot change what folder within the training container is used for this output (it's always /opt/ml/model), but as a best practice most example scripts will accept this as a CLI argument and/or environment variable to facilitate local debugging.
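Putting the first bullet into code, here's a sketch of reading the model back locally. The round-trip below uses a stand-in object so it runs anywhere; for a real job you'd first download s3://&lt;output_path&gt;/&lt;job-name&gt;/output/model.tar.gz, and the filename inside the archive may differ from "model.pkl" (list the tar contents to check):

```python
import pathlib
import tarfile
import tempfile

import joblib  # save_model() used joblib.dump(), so joblib.load() reads it back


def load_model_from_tarball(tarball_path, model_filename, workdir):
    """Extract SageMaker's model.tar.gz and load the joblib-serialized model."""
    with tarfile.open(tarball_path, "r:gz") as tar:
        tar.extractall(workdir)
    return joblib.load(pathlib.Path(workdir) / model_filename)


# Round-trip demo with a stand-in object instead of a real LightGBM model:
with tempfile.TemporaryDirectory() as tmp:
    tmp = pathlib.Path(tmp)
    joblib.dump({"demo": "model"}, tmp / "model.pkl")
    with tarfile.open(tmp / "model.tar.gz", "w:gz") as tar:
        tar.add(tmp / "model.pkl", arcname="model.pkl")

    model = load_model_from_tarball(tmp / "model.tar.gz", "model.pkl", tmp / "out")
    print(model)  # → {'demo': 'model'}
```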

There are 3 components working together here: The SageMaker service itself, the container-side SageMaker Training Toolkit (which sets up CLI args and environment variables like SM_MODEL_DIR for you, installs your requirements.txt, and starts your script), and your algorithm code itself... Or in this case, the pre-built LightGBM algorithm code.
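As an illustration of that best practice, training scripts conventionally resolve the model directory like this (the argument name and layering here are the common script-mode pattern, not something specific to the LightGBM script):

```python
import argparse
import os


def parse_args(argv=None):
    """Take the model dir as a CLI arg, defaulting to the SM_MODEL_DIR env
    var the Training Toolkit sets (and to /opt/ml/model, the fixed output
    folder inside the real training container)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model-dir",
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    return parser.parse_args(argv)


# Local debugging can override the folder; on SageMaker the default applies:
print(parse_args(["--model-dir", "./local_model"]).model_dir)  # → ./local_model
```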

Hope that helps!

Alex_T (AWS EXPERT) · answered a year ago
  • answer doesn't match the question, please read the question carefully; the answer should contain a Python script of the execution, not only some theoretical jargon

