How does SageMaker process input files for batch transform?


Based on the example notebook here for creating a batch inference job in SageMaker - https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_batch_transform/custom_tensorflow_inference_script_csv_and_tfrecord/custom_tensorflow_inference_script_csv_and_tfrecord.ipynb - I'm creating a batch transform job via the boto3 client (see below). My input is a CSV file where each row corresponds to one set of model inputs. When SageMaker creates a batch transform job and processes the input for inference, how does it handle the CSV data? Does it read the CSV file, iterate over each row, and pass each row to the model one at a time? There is another setting called BatchStrategy that we can set when creating the transform job, which can be SingleRecord or MultiRecord. If it is set to MultiRecord, does SageMaker send multiple CSV rows at once to the prediction serving container?
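Conceptually (a rough sketch of the documented behavior, not SageMaker's actual implementation), SplitType='Line' splits the input object on newlines into records, and BatchStrategy='MultiRecord' packs as many records as fit under the job's MaxPayloadInMB limit into each request to the container, while SingleRecord sends one record per request:

```python
def build_payloads(csv_text, batch_strategy="MultiRecord",
                   max_payload_bytes=6 * 1024 * 1024):
    """Illustrative sketch of how SplitType='Line' plus BatchStrategy
    could package CSV rows into request bodies (not SageMaker's real code)."""
    records = csv_text.strip().split("\n")  # SplitType='Line'
    if batch_strategy == "SingleRecord":
        return records  # one row per invocation of the container
    payloads, current, size = [], [], 0
    for rec in records:
        rec_size = len(rec.encode("utf-8")) + 1  # +1 for the joining newline
        if current and size + rec_size > max_payload_bytes:
            payloads.append("\n".join(current))  # flush a full batch
            current, size = [], 0
        current.append(rec)
        size += rec_size
    if current:
        payloads.append("\n".join(current))
    return payloads
```

Each element of the returned list corresponds to one HTTP request body the serving container would receive; with AssembleWith='Line', the per-request responses are then joined back together with newlines in the output file.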

Also, is it possible to do some pre-processing on each CSV row (or on the entire dataset) before it is passed to the inference serving API/container, by writing my own script or inference file (sample below)? If yes, how can I access each row of CSV data in my script/inference.py file?

response = client.create_transform_job(
    TransformJobName='transform-job',
    ModelName='mymodel',
    ModelClientConfig={
        'InvocationsTimeoutInSeconds': 3600,
        'InvocationsMaxRetries': 1
    },
    BatchStrategy='MultiRecord',
    TransformInput={
        'DataSource': {
            'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': 's3://inputpath'
            }
        },
        'ContentType': 'text/csv',
        'SplitType': 'Line'
    },
    TransformOutput={
        'S3OutputPath': 's3://outputpath',
        'Accept': 'text/csv',
        'AssembleWith': 'Line',
    },
   ...
)

file: inference.py

import json
import os

def input_handler(data, context):
    """Pre-process request input 
    Args:
        data (obj): the request data stream
        context (Context): an object containing request and configuration details
    Returns:
        (dict): a JSON-serializable dict that contains request body and headers
    """

    if context.request_content_type == "text/csv":
        # Each invocation receives one payload: a single CSV row with
        # BatchStrategy='SingleRecord', or several newline-joined rows
        # with 'MultiRecord'.
        payload = data.read().decode("utf-8")
        rows = [line.split(",") for line in payload.strip().split("\n")]
        # Illustrative pre-processing: convert every field to a float.
        instances = [[float(v) for v in row] for row in rows]
        return json.dumps({"instances": instances})
    raise ValueError("Unsupported content type: {}".format(
        context.request_content_type))


def output_handler(data, context):
    """Post-process the TensorFlow Serving response before it is returned.
    Args:
        data (obj): the TensorFlow Serving response
        context (Context): an object containing request and configuration details
    Returns:
        (bytes, string): data to return to the client, response content type
    """
    if data.status_code != 200:
        raise ValueError(data.content.decode("utf-8"))
    return data.content, context.accept_header
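To sanity-check a handler like input_handler locally before deploying, you can call it with a stub context that mimics what the serving container passes in. The StubContext class and the simplified csv_input_handler below are purely illustrative, self-contained stand-ins, not part of the SageMaker API:

```python
import io
import json


class StubContext:
    """Minimal stand-in for the container's Context object (illustrative)."""
    request_content_type = "text/csv"
    accept_header = "text/csv"


def csv_input_handler(data, context):
    """Simplified CSV handler: parse newline-separated rows from the
    request stream into a TensorFlow Serving-style JSON request body."""
    if context.request_content_type == "text/csv":
        rows = data.read().decode("utf-8").strip().split("\n")
        instances = [[float(v) for v in row.split(",")] for row in rows]
        return json.dumps({"instances": instances})
    raise ValueError("Unsupported content type")


# Simulate a MultiRecord payload: two CSV rows in one request body.
body = csv_input_handler(io.BytesIO(b"1.0,2.0\n3.0,4.0"), StubContext())
```

With this kind of harness you can verify that both a single-row payload and a multi-row payload produce the JSON shape your model's serving signature expects.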