SageMaker Model Monitor Missing Columns Constraint Violation

Hi, I have run into an issue similar to the one in the question posted here, but the solution mentioned there does not work for my use case.

I have a SageMaker endpoint set up that successfully completes inference for a given CSV file sent to the endpoint.

When it comes to data monitoring, I am able to successfully capture inference requests in the jsonl files from data capture. However, when I run a data monitoring job, I get the error "no valid schema found".

In order to work around this, I have created a preprocess_handler with the idea of ensuring the proper schema is followed. Below is the preprocess_handler I use.

import json
from io import StringIO

import pandas as pd


def preprocess_handler(inference_record):
    input_data = inference_record.endpoint_input.data

    # Convert the input data to a pandas DataFrame, which is able to correctly infer the schema
    df = pd.read_csv(StringIO(input_data))

    # Convert to JSON, stripping whitespace (including newlines) from column names and string values
    dict_data = {
        col_name.strip(): [value.strip() if isinstance(value, str) else value for value in df[col_name].tolist()]
        for col_name in df.columns
    }
    res = json.dumps(dict_data)

    return res

However, with this preprocess_handler, I get a missing columns violation with the message "Number of columns in current dataset: 1". I tried stripping the newlines as recommended in the previous post here, but I am still having issues.

Nick
asked 8 months ago, 289 views
1 Answer

Hi,

Thank you for using Amazon SageMaker.

I understand that you have a data monitoring job configured to monitor data captured by an endpoint. However, the monitoring job throws the following error:

"There are missing columns in current dataset. Number of columns in current dataset: 1, Number of columns in baseline constraints: x"

This can happen if each output recorded in your 'data-capture.jsonl' file contains a '\n' character.

To fix this, you need to remove all '\n' characters from the outputs in the 'data-capture.jsonl' file so that Model Monitor can work.
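Before changing anything, it may help to confirm that the captured records really contain '\n'. The following is a minimal sketch, not an official tool: the record in `sample_line` is hypothetical, the field names (`captureData`, `endpointInput`, `endpointOutput`, `data`) reflect the data capture format as I understand it, and `find_newlines` is a helper name made up for this example. Please check the field names against a line from your own capture file.

```python
import json

# Hypothetical record, shaped like one line of a data capture .jsonl file.
# The field names are an assumption; compare them with your own captured data.
sample_line = json.dumps({
    "captureData": {
        "endpointInput": {"observedContentType": "text/csv", "mode": "INPUT",
                          "data": "1.0,2.0,3.0\n", "encoding": "CSV"},
        "endpointOutput": {"observedContentType": "text/csv", "mode": "OUTPUT",
                           "data": "0.87\n", "encoding": "CSV"},
    },
    "eventMetadata": {"eventId": "example-id", "inferenceTime": "2023-01-01T00:00:00Z"},
    "eventVersion": "0",
})


def find_newlines(jsonl_line):
    """Return the capture sections whose data contains a newline character."""
    record = json.loads(jsonl_line)
    capture = record.get("captureData", {})
    flagged = []
    for section in ("endpointInput", "endpointOutput"):
        data = capture.get(section, {}).get("data", "")
        if "\n" in data:
            flagged.append(section)
    return flagged


print(find_newlines(sample_line))
```

To check a real file, you could call `find_newlines` on each line read from a downloaded copy of the capture file.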

There are 2 ways to fix this:

  1. The '\n' character is most likely added during inference, when the results are generated, so you can modify your inference script so that its outputs do not end with '\n'.

  2. Add a preprocessing script[1] for your Model Monitor job that removes all '\n' characters from the captured data after inference.

    [1] https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-pre-and-post-processing.html#model-monitor-pre-processing-script
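As a rough illustration of option 2, here is a minimal sketch of a preprocessing script, modeled on the pattern shown in the documentation linked at [1]. It assumes each captured request is a single CSV row without a header and that the baseline lists the output column first, so adjust the order and parsing to match your own baseline. The `SimpleNamespace` record is only a hypothetical stand-in for local testing and is not part of the script you would upload.

```python
from types import SimpleNamespace


def preprocess_handler(inference_record):
    # Strip trailing newline characters from the captured input and output.
    input_data = inference_record.endpoint_input.data.rstrip("\r\n")
    output_data = inference_record.endpoint_output.data.rstrip("\r\n")
    # Assumption: one CSV row per request, output column first, then the features.
    values = (output_data + "," + input_data).split(",")
    # Return a flat dict; zero-padded keys keep the columns in positional order.
    return {str(i).zfill(20): value.strip() for i, value in enumerate(values)}


# Hypothetical stand-in for the record object passed to the handler (local testing only).
record = SimpleNamespace(
    endpoint_input=SimpleNamespace(data="1.0,2.0,3.0\n"),
    endpoint_output=SimpleNamespace(data="0.87\n"),
)
print(preprocess_handler(record))
```

Note that this sketch returns a flat dict with one entry per column rather than a JSON string, which is the shape used in the documentation's examples.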

You can also refer to the sample Model Monitor notebook[2], modify it for your use case, and test it on your end.

[2] https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_model_monitor/introduction/SageMaker-ModelMonitoring.ipynb

You can also access this sample notebook from a notebook instance under SageMaker Examples -> Sagemaker Model Monitor -> 'SageMaker-ModelMonitoring.ipynb'.

As I have limited visibility into your setup, I'd recommend reaching out to AWS Support by creating a support case[+] so that an engineer can investigate the issue in more depth and help you resolve it. You can also share the notebook file, scripts, dataset, and anything else you used, so that the engineer can replicate the issue on their end and provide better assistance for your use case.

References:

[+] Open a support case with AWS using the link: https://console.aws.amazon.com/support/home?#/case/create
[+] https://aws.amazon.com/premiumsupport/faqs/

AWS
Radhika
answered 8 months ago
