Call the latest SageMaker model in Batch Transform jobs


Hi all,

Hope this message finds you well.

I have a SageMaker model, built from an on-demand notebook. I have been running batch transform jobs from a Lambda function: it takes the inference input JSON from S3, creates a batch transform job, and finally produces the predictions.

The question is: how can I make the Lambda function use the last trained model automatically, instead of hard-coding model_name = 'forecasting-deepar-2022-05-20-22-23-20-225'?

Lambda code:

    from sagemaker.transformer import Transformer

    if 'input_data_4' in file:

        def batch_transform():
            transformer = Transformer(
                model_name='forecasting-deepar-2022-05-20-22-23-20-225',
                instance_count=2,
                instance_type='ml.m5.xlarge',
                assemble_with='Line',
                output_path=output_data_path,
                base_transform_job_name='daily-output-predictions-to-s3',
                accept='application/jsonlines',
            )
            transformer.transform(
                data=input_data_path,
                content_type='application/jsonlines',
                split_type='Line',
                wait=False,  # don't block the Lambda waiting for the job to finish
                logs=True,
            )
            print('Batch Transform Job created successfully!')

        batch_transform()

Thanks, Basem

  • Hi, how are you triggering that Lambda function that starts the batch transform job?

1 Answer
Accepted Answer

Hi Basem,

If I understood correctly you'd like your Lambda function to automatically choose the latest SageMaker model when it runs, instead of hard-coding the model name.

Although you could do this simply with boto3.client("sagemaker").list_models(...) (which can sort by creation time), I would not recommend it. The reason is that, in general, this lists all models present in SageMaker, which might include models for different use cases in the future, even if you only have the one DeepAR forecasting use case today. You'd have to filter manually after the API call.
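For completeness, a minimal sketch of that approach (assuming all your model names share the 'forecasting-deepar' prefix):

    import boto3

    sm = boto3.client("sagemaker")

    # List models newest-first; NameContains narrows the results to the DeepAR
    # models, but any other model whose name happens to match would slip through.
    response = sm.list_models(
        SortBy="CreationTime",
        SortOrder="Descending",
        NameContains="forecasting-deepar",
        MaxResults=1,
    )
    latest_model_name = response["Models"][0]["ModelName"]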

A better approach would probably be to register your forecasting models in SageMaker Model Registry, which will let you register different versions and track extra metadata (like metrics and approval status) for each version if you need to.

  • First (e.g. from your notebook) you can create a model package group to track your forecasting models.
  • Then (when you create your SageMaker Model) you can register it as a new version in the group - via Model.register().
  • At the point you want to look up which model to use, you can call list_model_packages, which can filter to your specific group of models, and also by approval status if you like (see the sketch after this list).
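A minimal sketch of those three steps, assuming a hypothetical group name of "forecasting-deepar" and that `model` is the sagemaker.model.Model object you already build in your notebook:

    import boto3

    sm = boto3.client("sagemaker")

    # 1. One-time setup: a group to hold all versions of the forecasting model.
    sm.create_model_package_group(
        ModelPackageGroupName="forecasting-deepar",
        ModelPackageGroupDescription="DeepAR forecasting models",
    )

    # 2. After each training run: register the model as a new version.
    #    `model` is the sagemaker.model.Model you already create today.
    model.register(
        content_types=["application/jsonlines"],
        response_types=["application/jsonlines"],
        inference_instances=["ml.m5.xlarge"],
        transform_instances=["ml.m5.xlarge"],
        model_package_group_name="forecasting-deepar",
        approval_status="Approved",
    )

    # 3. At inference time: fetch the newest approved version in the group.
    packages = sm.list_model_packages(
        ModelPackageGroupName="forecasting-deepar",
        ModelApprovalStatus="Approved",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )["ModelPackageSummaryList"]

Setting approval_status="Approved" at registration time keeps the sketch simple; in practice you might register as PendingManualApproval and approve versions after reviewing their metrics.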

So, for example, you could set the model package group name as an environment variable on your Lambda function, and have the function dynamically look up the latest version to use from the group when needed.
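Inside the Lambda function, that lookup could look something like the sketch below. MODEL_PACKAGE_GROUP_NAME and SAGEMAKER_ROLE_ARN are hypothetical environment variables you'd configure on the function, and note that a batch transform job runs against a Model, not a model package, so the function creates a Model from the latest package:

    import os
    import time
    import boto3

    sm = boto3.client("sagemaker")

    def latest_model_name():
        """Resolve the newest approved version in the group to a Model name
        that the Transformer can use."""
        group = os.environ["MODEL_PACKAGE_GROUP_NAME"]
        packages = sm.list_model_packages(
            ModelPackageGroupName=group,
            ModelApprovalStatus="Approved",
            SortBy="CreationTime",
            SortOrder="Descending",
            MaxResults=1,
        )["ModelPackageSummaryList"]
        package_arn = packages[0]["ModelPackageArn"]

        # Create a Model from the package so the transform job can reference
        # it by name; the timestamp suffix keeps the name unique per run.
        model_name = f"forecasting-deepar-{int(time.time())}"
        sm.create_model(
            ModelName=model_name,
            Containers=[{"ModelPackageName": package_arn}],
            ExecutionRoleArn=os.environ["SAGEMAKER_ROLE_ARN"],
        )
        return model_name

The returned name then replaces the hard-coded model_name in your Transformer.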

Of course there are also many more custom ways to do this, such as creating an SSM Parameter to track the name of your currently accepted model, or building your own model registry on a data store like DynamoDB... But SageMaker Model Registry seems like the most purpose-built tool for the job here to me.
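(For reference, the SSM Parameter variant is only a couple of calls; the parameter name here is hypothetical and would be written by your training job after each run:)

    import boto3

    ssm = boto3.client("ssm")
    model_name = ssm.get_parameter(
        Name="/forecasting/current-model-name"
    )["Parameter"]["Value"]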

Alex_T (AWS EXPERT), answered 2 years ago

