An error occurred (ValidationException) when calling the CreateTransformJob operation: Requested instance type cannot work with all containers of the model


Hello, I'd appreciate your help with this error: botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the CreateTransformJob operation: Requested instance type cannot work with all containers of the model: ml.m5.4xlarge

from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.s3 import S3Uploader

# Define the JumpStart model with the specified region
pretrained_model = JumpStartModel(
    model_id=jumpstart_model_name,
    role=my_role,
    region=region_name,
    sagemaker_session=Sagemaker_Session,
)


# We will use a default S3 bucket for the batch transform input and output paths
# s3_bucket_name = Sagemaker_Session.default_bucket()

# Create the full S3 paths
s3_input_data_path = f"s3://{s3_bucket_name}/{s3_bucket_folder}/batch_input"
s3_output_data_path = f"s3://{s3_bucket_name}/{s3_bucket_folder}/batch_output"

# Upload the data
# s3 = boto3.client("s3")
# s3.upload_file(batchtransform_data_file_path, s3_bucket_name, f"{s3_bucket_folder}/batch_input/{batchtransform_data_file_name}")
S3Uploader.upload(batchtransform_data_file_path, s3_input_data_path)
print(f"input data S3 location: {s3_input_data_path}")
# ==============================================================================

# Deploy the model to an endpoint

#pretrained_model_predictor = pretrained_model.deploy(






# Create the batch transformer object. If you have a large dataset, you can
# divide it into smaller chunks and use more instances for faster inference.
batch_transformer = pretrained_model.transformer(
    instance_count=Instance_Count,
    instance_type=InstanceType,
    output_path=s3_output_data_path,
    assemble_with="Line",
    accept="text/csv",  # "application/jsonlines"
    max_payload=Max_Payload,
    env=hyper_params_dict,
)



batch_transformer.env = hyper_params_dict

# Make predictions on the input data
batch_transformer.transform(
    s3_input_data_path,
    content_type="application/jsonlines",
    split_type="Line",
)
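The comment in the code above mentions dividing a large dataset into smaller chunks so that more instances can share the work. As far as I know, batch transform hands out whole S3 objects to instances, so a single input file would keep only one instance busy. Here is a minimal sketch of such a split for a line-delimited input file. The helper name `split_jsonl`, the chunk size, and the file naming scheme are illustrative assumptions and not part of the original post.

```python
from pathlib import Path


def split_jsonl(src_path, out_dir, lines_per_chunk):
    """Split a line-delimited file into numbered chunk files.

    Hypothetical helper: returns the list of chunk paths in order.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(src_path).stem

    chunk_paths = []
    buffer = []

    def flush():
        # Write the buffered lines to the next numbered chunk file.
        chunk_path = out_dir / f"{stem}_part{len(chunk_paths):04d}.jsonl"
        chunk_path.write_text("".join(buffer), encoding="utf-8")
        chunk_paths.append(chunk_path)
        buffer.clear()

    with open(src_path, encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue  # skip blank lines so no empty records are sent
            # Make sure every record ends with exactly one newline.
            buffer.append(line if line.endswith("\n") else line + "\n")
            if len(buffer) == lines_per_chunk:
                flush()
    if buffer:
        flush()  # write the final, possibly shorter, chunk
    return chunk_paths
```

Each chunk file could then be uploaded under the same S3 input prefix (for example with `S3Uploader.upload`, as in the post) before calling `transform` on that prefix.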


asked 7 months ago, 241 views
1 Answer

I found that this issue can be resolved by switching to the Hugging Face batch transform in SageMaker, using the following code:

import json
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Retrieve the LLM image URI
llm_image = get_huggingface_llm_image_uri(
    "huggingface",  # huggingface or lmi
    version=llm_image_uri_ver,
    session=Sagemaker_Session,
    region=region_name,
)

# Print the ECR image URI
print(f"llm image uri: {llm_image}")

# Define the model and endpoint configuration parameters
config = {
    'HF_MODEL_ID': HF_model_name,  # model_id from
    'SM_NUM_GPUS': json.dumps(number_of_gpu),  # Number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(MAX_INPUT_LENGTH),  # Max length of input text
    'MAX_TOTAL_TOKENS': json.dumps(MAX_TOTAL_TOKENS),  # Max length of the generation (including input text)
    'MAX_BATCH_TOTAL_TOKENS': json.dumps(MAX_BATCH_TOTAL_TOKENS),  # Limits the number of tokens that can be processed in parallel during generation
    'HUGGING_FACE_HUB_TOKEN': HUGGING_FACE_HUB_TOKEN,
    # 'HF_MODEL_QUANTIZE': "bitsandbytes",  # uncomment to quantize
}

# HF_MODEL_QUANTIZE (optional): enables model quantization to reduce model size and
# potentially improve performance, especially inference speed.
# However, lower weight precision could affect the quality of the output for some models.
# Typical choices: "bitsandbytes" or similar, depending on the quantization method supported.


# Check if the token is set
# assert config['HUGGING_FACE_HUB_TOKEN'] != HUGGING_FACE_HUB_TOKEN, "Please set your Hugging Face Hub token"

# Create the HuggingFaceModel with the image URI
llm_model = HuggingFaceModel(
    role=my_role,
    image_uri=llm_image,
    env=config,
)

# Specify the batch job hyperparameters here. If you want to treat each example's
# hyperparameters differently, pass hyper_params_dict as None.
hyper_params = {
    "max_new_tokens": str(Max_New_Tokens),
    "truncate": str(Input_Truncation),
    "return_full_text": str(False),
}
# hyper_params = {"batch_size": str(Batch_Size), "max_new_tokens": str(Max_New_Tokens), "truncate": str(Input_Truncation), "return_full_text": str(False)}

# hyper_params_dict = {"HYPER_PARAMS": str(hyper_params)}
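The `str(...)` calls above are there because the values end up in the container environment, which is a map from string to string. A small sketch of doing that conversion in one place is below. `to_env` is a hypothetical helper name, and it simply applies `str()` as the post does; whether the container expects `"False"` or `"false"` for booleans is something to check against your container.

```python
def to_env(params):
    """Convert a dict of settings into a string-to-string environment map.

    Hypothetical helper: mirrors the str(...) conversion used in the post.
    """
    env = {}
    for key, value in params.items():
        if not isinstance(key, str) or not key:
            raise TypeError(f"environment variable names must be non-empty strings: {key!r}")
        if value is None:
            raise ValueError(f"no value given for {key!r}")
        env[key] = str(value)
    return env
```

With this, `hyper_params = to_env({"max_new_tokens": Max_New_Tokens, "truncate": Input_Truncation, "return_full_text": False})` would produce the same dictionary as the explicit version above.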

# Create a transformer to run a batch job
batch_job = llm_model.transformer(
    instance_count=Instance_Count,
    instance_type=InstanceType,
    # strategy="MultiRecord",
    assemble_with="Line",
    # strategy determines how records are batched into each prediction request within
    # the batch transform job. 'SingleRecord' sends each record individually, while
    # 'MultiRecord' sends multiple records in a single batch. 'MultiRecord' may be
    # faster, but some use cases may require 'SingleRecord'.
    strategy='SingleRecord',
    output_path=s3_output_data_path,  # we are using the S3 path to save the output with the input
    env=hyper_params,
    accept='application/json',
    # max_concurrent_transforms=MaxConcurrentTransforms,  # (int) The maximum number of HTTP requests to be made to each individual transform container at one time.
    max_payload=MaxPayloadInMB,  # (int) Maximum size of the payload in a single HTTP request to the container, in MB.
)
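Since `strategy='SingleRecord'` sends each record in its own request and `max_payload` caps the size of a single request, it can help to check the input file for oversized records before starting the job. A rough sketch follows; the function name `oversized_records` is hypothetical, and counting 1 MB as 1024 * 1024 bytes is an assumption about how the limit is measured.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: 1 MB counted as 1024 * 1024 bytes


def oversized_records(lines, max_payload_mb):
    """Return (line_number, size_in_bytes) for every record above the limit.

    `lines` is any iterable of text lines, for example an open file.
    """
    limit = max_payload_mb * BYTES_PER_MB
    too_big = []
    for number, line in enumerate(lines, start=1):
        # Measure the UTF-8 encoded record without its trailing newline.
        size = len(line.rstrip("\n").encode("utf-8"))
        if size > limit:
            too_big.append((number, size))
    return too_big
```

For example, `oversized_records(open(batchtransform_data_file_path, encoding="utf-8"), MaxPayloadInMB)` would list the records that need trimming or a larger `max_payload`.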

# Start the batch transform job, using the S3 data as input
batch_job.transform(
    data=f"{s3_input_data_path}/{batchtransform_data_file_name}",
    content_type='application/json',
    split_type='Line',
    input_filter="$",
    # output_filter="$['id','SageMakerOutput']",  # Select "id" from the input and the model output
    join_source='Input',  # None to exclude the input data from the output, 'Input' to include it
    wait=True,
)
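With `split_type='Line'` each line of the input file is one record, and the commented-out `output_filter` above suggests that, with `join_source='Input'`, each output record carries the input fields plus a `SageMakerOutput` key. A minimal sketch of writing such an input file and reading the joined output back is below. The helper names are hypothetical, the `inputs` field name is an assumption about what the Hugging Face container expects, and the exact shape of `SageMakerOutput` depends on the container.

```python
import json


def build_jsonl(prompts):
    """Serialize prompts as one JSON object per line, each with a stable "id"."""
    lines = []
    for i, prompt in enumerate(prompts):
        # "inputs" is an assumed field name; "id" lets outputs be matched to inputs.
        lines.append(json.dumps({"id": i, "inputs": prompt}))
    return "\n".join(lines) + "\n"


def parse_joined_output(text):
    """Map each input "id" to the value stored under "SageMakerOutput"."""
    results = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines between records
        record = json.loads(line)
        results[record["id"]] = record.get("SageMakerOutput")
    return results
```

The string returned by `build_jsonl` would be written to the file that is uploaded to `s3_input_data_path`, and `parse_joined_output` would be applied to the contents of the output file downloaded from `s3_output_data_path`.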

answered 7 months ago
