Use the latest SageMaker model in batch transform jobs


Hi all,

Hope this message finds you well.

I have a SageMaker model, built from an on-demand notebook. I run batch transform jobs from a Lambda function: it takes the inference input JSON from S3, creates a batch transform job, and finally produces the predictions.

The question: how can I make the Lambda function use the last trained model automatically, instead of hard-coding model_name = 'forecasting-deepar-2022-05-20-22-23-20-225'?

Lambda code:

    from sagemaker.transformer import Transformer

    if 'input_data_4' in file:

        def batch_transform():
            transformer = Transformer(
                model_name='forecasting-deepar-2022-05-20-22-23-20-225',
                instance_count=2,
                instance_type='ml.m5.xlarge',
                assemble_with='Line',
                output_path=output_data_path,
                base_transform_job_name='daily-output-predictions-to-s3',
                accept='application/jsonlines',
            )
            # wait=False returns as soon as the job is created instead of
            # blocking until it finishes; logs are only streamed when wait=True.
            transformer.transform(
                data=input_data_path,
                content_type='application/jsonlines',
                split_type='Line',
                wait=False,
            )
            print('Batch Transform Job created successfully!')

        batch_transform()

Thanks, Basem

  • Hi, how are you triggering that Lambda function that starts the batch transform job?

1 Answer

Accepted Answer

Hi Basem,

If I understood correctly, you'd like your Lambda function to automatically choose the latest SageMaker model when it runs, instead of hard-coding the model name.

Although you could do this simply with boto3.client("sagemaker").list_models(...) (which can sort by creation time), I would not recommend it. The reason is that in general this lists all the models present in SageMaker - which might include some for different use cases in the future, even if you only have the one DeepAR forecasting use case today. You'd have to manually filter after the API call.
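
For illustration, a minimal sketch of that quick-and-dirty approach (assuming your model names all share the 'forecasting-deepar' prefix seen above):

    import boto3

    sm = boto3.client('sagemaker')

    # List models newest-first, filtered to the DeepAR naming prefix,
    # and take the most recently created one.
    response = sm.list_models(
        NameContains='forecasting-deepar',
        SortBy='CreationTime',
        SortOrder='Descending',
        MaxResults=1,
    )
    latest_model_name = response['Models'][0]['ModelName']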

A better approach would probably be to register your forecasting models in SageMaker Model Registry - which will allow you to register different versions and track extra metadata like metrics and approval status for each version if you need to.

  • First (e.g. from your notebook), you can create a model package group to track your forecasting models.
  • Then, when you create your SageMaker Model, you can register it as a new version in the group via Model.register().
  • At the point you want to look up which model to use, you can call list_model_packages, which can filter to your specific group of models, and also by approval status if you like (see the sketch after this list).
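
As a rough sketch of the registration side - assuming model is the SageMaker Python SDK Model object for your trained DeepAR model, and using a made-up group name:

    import boto3

    sm = boto3.client('sagemaker')

    # One-time setup: a group to hold all versions of the forecasting model.
    sm.create_model_package_group(
        ModelPackageGroupName='forecasting-deepar',
        ModelPackageGroupDescription='DeepAR forecasting model versions',
    )

    # After each training run: register the new model as a version in the group.
    model.register(
        model_package_group_name='forecasting-deepar',
        content_types=['application/jsonlines'],
        response_types=['application/jsonlines'],
        inference_instances=['ml.m5.xlarge'],
        transform_instances=['ml.m5.xlarge'],
        approval_status='Approved',
    )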

So, for example, you could set the model package group name as a configuration environment variable on your Lambda function, and have the function dynamically look up the latest version to use from the group when needed.
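
Inside the Lambda handler that could look something like the following - note that MODEL_PACKAGE_GROUP and SAGEMAKER_ROLE_ARN are hypothetical environment variable names, and that a SageMaker Model still has to be created from the package before a Transformer can use it:

    import os
    import time
    import boto3

    sm = boto3.client('sagemaker')

    # Find the newest approved model package in the configured group.
    packages = sm.list_model_packages(
        ModelPackageGroupName=os.environ['MODEL_PACKAGE_GROUP'],  # hypothetical env var
        ModelApprovalStatus='Approved',
        SortBy='CreationTime',
        SortOrder='Descending',
        MaxResults=1,
    )
    package_arn = packages['ModelPackageSummaryList'][0]['ModelPackageArn']

    # Create a SageMaker Model from that package so the Transformer can reference it.
    model_name = 'forecasting-deepar-{}'.format(int(time.time()))
    sm.create_model(
        ModelName=model_name,
        ExecutionRoleArn=os.environ['SAGEMAKER_ROLE_ARN'],  # hypothetical env var
        Containers=[{'ModelPackageName': package_arn}],
    )
    # model_name can now be passed to Transformer(model_name=...) instead of a
    # hard-coded name.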

Of course there are also many more custom ways to do this, such as creating an SSM Parameter to track the name of your current accepted model (sketched below), or building your own model registry on a data store like DynamoDB... But SageMaker Model Registry seems like the most purpose-built tool for the job here to me.
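
If you went the SSM Parameter route instead, the idea would be roughly as follows (the parameter name here is made up):

    import boto3

    ssm = boto3.client('ssm')

    # After training/approval: record the name of the accepted model.
    ssm.put_parameter(
        Name='/forecasting/current-model-name',
        Value='forecasting-deepar-2022-05-20-22-23-20-225',
        Type='String',
        Overwrite=True,
    )

    # In the Lambda function: read it back.
    model_name = ssm.get_parameter(
        Name='/forecasting/current-model-name'
    )['Parameter']['Value']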

AWS
EXPERT
Alex_T
answered 2 years ago
reviewed 25 days ago
