How does SageMaker process the input file for a batch transform?


Based on the example notebook at https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_batch_transform/custom_tensorflow_inference_script_csv_and_tfrecord/custom_tensorflow_inference_script_csv_and_tfrecord.ipynb, I'm creating a batch transform job via the boto3 client (see below). My input is a CSV file in which each row corresponds to one set of inputs. When SageMaker runs the batch transform job and processes the input for inference, how does it handle the CSV data? Does it read the file, iterate over the rows, and pass them to the model one at a time? There is also a BatchStrategy setting on the transform job, which can be set to SingleRecord or MultiRecord. If it is set to MultiRecord, does SageMaker send multiple CSV rows at once to the prediction serving container?
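My rough mental model of the batching, which I'd like confirmed, is something like the sketch below: with SplitType='Line' the input object is split on newlines, and with BatchStrategy='MultiRecord' as many records as fit under the payload limit are joined back together and sent as one request body. (This is my assumption, not SageMaker internals; the function name and the greedy packing are illustrative.)

```python
def make_mini_batches(csv_bytes: bytes, max_payload_bytes: int):
    """Split the input on newlines, then greedily pack rows into payloads.

    Each returned string would correspond to the body of one POST to the
    serving container's /invocations endpoint (my assumption).
    """
    lines = csv_bytes.decode("utf-8").splitlines()
    batches, batch, size = [], [], 0
    for line in lines:
        line_size = len(line.encode("utf-8")) + 1  # +1 for the joining newline
        if batch and size + line_size > max_payload_bytes:
            batches.append("\n".join(batch))
            batch, size = [], 0
        batch.append(line)
        size += line_size
    if batch:
        batches.append("\n".join(batch))
    return batches


data = b"1,2,3\n4,5,6\n7,8,9\n"
print(make_mini_batches(data, max_payload_bytes=12))
# With a 12-byte limit, the first two rows fit in one payload,
# the third row goes into a second payload.
```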

Also, is it possible to do some pre-processing on each CSV row (or on the entire dataset) before it is passed to the inference serving API/container, by writing my own script or inference file (sample below)? If yes, how can I access each row of the CSV data in my script/inference.py file?

response = client.create_transform_job(
    TransformJobName='transform-job',
    ModelName='mymodel',
    ModelClientConfig={
        'InvocationsTimeoutInSeconds': 3600,
        'InvocationsMaxRetries': 1
    },
    BatchStrategy='MultiRecord',
    TransformInput={
        'DataSource': {
            'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': 's3://inputpath'
            }
        },
        'ContentType': 'text/csv',
        'SplitType': 'Line'
    },
    TransformOutput={
        'S3OutputPath': 's3://outputpath',
        'Accept': 'text/csv',
        'AssembleWith': 'Line',
    },
   ...
)

File: inference.py

import json
import os

def input_handler(data, context):
    """Pre-process request input 
    Args:
        data (obj): the request data stream
        context (Context): an object containing request and configuration details
    Returns:
        (dict): a JSON-serializable dict that contains request body and headers
    """

    if context.request_content_type == "text/csv":
       ....


def output_handler(data, context):
    """Post-process ...
    ....
Asked 4 months ago · 92 views
No answers
