SageMaker Model Monitor Missing Columns Constraint Violation

Question

Hi, I have run into a similar issue as the question posted [here](https://repost.aws/questions/QU8Xkelo1ARA2zcn4rHuk09w/sagemaker-model-monitor-missing-columns-constraint-violation), although the solution mentioned does not work for my use case.

I have a SageMaker endpoint set up that can successfully complete inference for a given CSV file that is sent to the endpoint.

When it comes to data monitoring, I am able to successfully capture inference requests in the jsonl files from the data capture. However, when I run a data monitoring job, I get the error "no valid schema found".

In order to work around this, I have created a preprocess_handler with the idea of ensuring the proper schema is followed. Below is the preprocess_handler I use.

```
import json
from io import StringIO

import pandas as pd


def preprocess_handler(inference_record):
    input_data = inference_record.endpoint_input.data

    # Convert the input data to a pandas DataFrame, which is able to correctly infer the schema
    df = pd.read_csv(StringIO(input_data))

    # Convert to JSON, and make sure to strip out new lines
    dict_data = {
        col_name.strip(): [value.strip() if isinstance(value, str) else value for value in df[col_name].tolist()]
        for col_name in df.columns
    }
    res = json.dumps(dict_data)

    return res
```

However, with this preprocess_handler, I get the missing columns violation, with the message: "Number of columns in current dataset: 1". I tried to strip the new lines as recommended in the previous post [here](https://repost.aws/questions/QU8Xkelo1ARA2zcn4rHuk09w/sagemaker-model-monitor-missing-columns-constraint-violation), but I am still having issues.

Answer

Hi,

Thank you for using Amazon SageMaker.

I understand that you have a data monitoring job configured to monitor data captured by an endpoint. However, the monitoring job throws the following error:

"There are missing columns in current dataset. Number of columns in current dataset: 1, Number of columns in baseline constraints: x"

This can happen if, in your 'data-capture.jsonl' file, there is a '\n' character at the end of every output.

Basically, what you need to do is remove all the '\n' characters from the outputs in the 'data-capture.jsonl' file for Model Monitor to work.
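To check whether this applies to your captured data, you can scan the capture file for payloads that end with a newline. The following is only a minimal sketch: it assumes the usual data capture layout (`captureData.endpointInput.data` and `captureData.endpointOutput.data`) with CSV-encoded payloads, and the function name `find_trailing_newlines` and the sample record are made up for illustration.

```python
import json


def find_trailing_newlines(jsonl_text):
    """Return (line_number, section) pairs whose captured payload ends with a newline."""
    hits = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines between records
        capture = json.loads(line).get("captureData", {})
        for section in ("endpointInput", "endpointOutput"):
            data = capture.get(section, {}).get("data", "")
            if isinstance(data, str) and data.endswith("\n"):
                hits.append((line_no, section))
    return hits


# Hypothetical capture record, for illustration only.
sample = json.dumps({
    "captureData": {
        "endpointInput": {"data": "1.0,2.0,3.0", "encoding": "CSV"},
        "endpointOutput": {"data": "0.87\n", "encoding": "CSV"},
    }
})
print(find_trailing_newlines(sample))
```

If this reports `endpointOutput` for your records, the trailing newline is coming from the model's response.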

There are 2 ways to fix this:
-----------------------------
1. As the '\n' character is most likely added during inference, when the inference results are generated, you can modify your inference script so that the outputs do not end with a '\n'.

2. Add a preprocessing[1] script for your Model Monitor. This script removes all the '\n' characters from your captured data after inference.

[1] https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-pre-and-post-processing.html#model-monitor-pre-processing-script
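As a rough illustration of option 2, here is a sketch of a preprocessing script along the lines of the example in the documentation linked above. It assumes each captured record holds a single CSV row without a header, and that the baseline lists the model output first, followed by the input features; adjust the order to match your own baseline. The `SimpleNamespace` record at the bottom is only a stand-in for local testing and is not part of the script you would upload.

```python
from types import SimpleNamespace


def preprocess_handler(inference_record):
    # Strip trailing newlines from both captured payloads.
    input_data = inference_record.endpoint_input.data.rstrip("\n")
    output_data = inference_record.endpoint_output.data.rstrip("\n")

    # Assumption: output column first, then the input features.
    values = (output_data + "," + input_data).split(",")

    # Zero-padded keys keep the columns in a stable order.
    return {str(i).zfill(20): value.strip() for i, value in enumerate(values)}


# Stand-in record for local testing (hypothetical values).
record = SimpleNamespace(
    endpoint_input=SimpleNamespace(data="1.0,2.0,3.0\n"),
    endpoint_output=SimpleNamespace(data="0.87\n"),
)
print(preprocess_handler(record))
```

Note that this sketch returns a flat dict with one entry per column, rather than a single JSON string.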

You can also refer to the sample Model Monitor notebook[2], modify it for your use case, and test it on your end.

[2] https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_model_monitor/introduction/SageMaker-ModelMonitoring.ipynb

You can also access this notebook sample from a notebook instance under the SageMaker Examples -> Sagemaker Model Monitor -> 'SageMaker-ModelMonitoring.ipynb'.

As I have limited visibility into your setup, I'd recommend reaching out to AWS Support by creating a support case[+] so that an engineer can investigate the issue in more depth and help you resolve it. You can also share the notebook file, scripts, dataset, and anything else you used, so that the engineer can replicate the issue on their end and provide better assistance for your use case.

Reference:
----------

[+] Open a support case with AWS using the link: https://console.aws.amazon.com/support/home?#/case/create
[+] https://aws.amazon.com/premiumsupport/faqs/
