SageMaker Model Monitor Missing Columns Constraint Violation


Hi, I have run into an issue similar to the one in the question posted here, but the solution mentioned there does not work for my use case.

I have a SageMaker endpoint set up that successfully completes inference for a CSV file sent to the endpoint.

When it comes to data monitoring, I am able to successfully capture inference requests in the jsonl files from the data capture. However, when I run a data monitoring job, I get the error "no valid schema found".

In order to work around this, I have created a preprocess_handler with the idea of ensuring the proper schema is followed. Below is the preprocess_handler I use.

import json
from io import StringIO
import pandas as pd

def preprocess_handler(inference_record):
    input_data = inference_record.endpoint_input.data

    # Convert the input data to a pandas DataFrame, which is able to correctly infer the schema.
    df = pd.read_csv(StringIO(input_data))

    # Convert to JSON, stripping surrounding whitespace (including newlines) from column names and string values.
    dict_data = {
        col_name.strip(): [
            value.strip() if isinstance(value, str) else value
            for value in df[col_name].tolist()
        ]
        for col_name in df.columns
    }
    res = json.dumps(dict_data)

    return res

However, with this preprocess_handler, I get a missing-columns violation with the message "Number of columns in current dataset: 1". I tried stripping the newlines as recommended in the previous post, but I am still having issues.

Nick
Asked 8 months ago, viewed 314 times
1 Answer

Hi,

Thank you for using Amazon SageMaker.

I understand that you have a data monitoring job configured to monitor data captured by an endpoint. However, the monitoring job throws the following error:

"There are missing columns in current dataset. Number of columns in current dataset: 1, Number of columns in baseline constraints: x"

This can happen if every output recorded in your 'data-capture.jsonl' file ends with a '\n' character.

To get Model Monitor working, you need to remove the '\n' characters from the outputs in the 'data-capture.jsonl' file.
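Before changing anything, it may help to confirm which captured records actually contain a '\n'. The following is a minimal sketch, assuming the capture lines follow the usual layout with the payload under `captureData.endpointInput.data` and `captureData.endpointOutput.data`, and that the capture encoding is CSV rather than Base64. The function name `find_newline_records` and the sample records are hypothetical.

```python
import json


def find_newline_records(lines):
    """Return (line_number, field) pairs for records whose captured data contains a newline."""
    hits = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines in the capture file
        capture = json.loads(line).get("captureData", {})
        for field in ("endpointInput", "endpointOutput"):
            data = capture.get(field, {}).get("data", "")
            if isinstance(data, str) and "\n" in data:
                hits.append((line_number, field))
    return hits


# Hypothetical sample lines standing in for the contents of a data-capture .jsonl file.
sample = [
    json.dumps({"captureData": {
        "endpointInput": {"data": "1,2,3", "encoding": "CSV"},
        "endpointOutput": {"data": "0.7\n", "encoding": "CSV"}}}),
    json.dumps({"captureData": {
        "endpointInput": {"data": "4,5,6", "encoding": "CSV"},
        "endpointOutput": {"data": "0.2", "encoding": "CSV"}}}),
]
print(find_newline_records(sample))
```

For a real capture file, the same function can be called on an open file handle, since iterating over the handle yields one line at a time.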

There are 2 ways to fix this:

  1. The '\n' character is most likely added during inference, when the results are generated. You can modify your inference script so that the outputs do not end with a '\n'.

  2. Add a preprocessing script[1] to your Model Monitor job that removes the '\n' characters from the captured data after inference.

    [1] https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-pre-and-post-processing.html#model-monitor-pre-processing-script
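The second option could look roughly like the sketch below. It is modeled on the pattern shown in the documentation linked as [1], and it assumes that each captured record holds a single CSV row without a header and that the monitor expects one flat dictionary per record, with one key per column. The zero-padded keys are only there to keep the columns in order; whether they line up with your baseline constraints depends on how the baseline was generated. `fake_record` is a hypothetical stand-in for the record object the monitoring container passes in.

```python
from types import SimpleNamespace


def preprocess_handler(inference_record):
    # Strip surrounding whitespace, including any trailing '\n', from both payloads.
    input_data = inference_record.endpoint_input.data.strip()
    output_data = inference_record.endpoint_output.data.strip()

    # Put the output first, then the input features, and split into individual values.
    values = (output_data + "," + input_data).split(",")

    # Return a flat dict with one entry per column (not a JSON string).
    return {str(i).zfill(20): value.strip() for i, value in enumerate(values)}


# Hypothetical record used only to exercise the handler locally.
fake_record = SimpleNamespace(
    endpoint_input=SimpleNamespace(data="5.1,3.5,1.4\n"),
    endpoint_output=SimpleNamespace(data="0.92\n"),
)
row = preprocess_handler(fake_record)
print(len(row), list(row.values()))
```

Running the handler locally against a few captured payloads like this is a quick way to check that the number of keys matches the number of columns in the baseline before scheduling a monitoring job.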

You can also refer to the sample Model Monitor notebook[2], modify it for your use case, and test it on your end.

[2] https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_model_monitor/introduction/SageMaker-ModelMonitoring.ipynb

You can also open this sample from a notebook instance under SageMaker Examples -> SageMaker Model Monitor -> 'SageMaker-ModelMonitoring.ipynb'.

Because I have limited visibility into your setup, I recommend reaching out to AWS Support by creating a support case[+] so that an engineer can investigate the issue in more depth. You can also share the notebook file, scripts, dataset, and anything else you used, so that the engineer can reproduce the problem on their end and give you assistance specific to your use case.

References:

[+] Open a support case with AWS using the link: https://console.aws.amazon.com/support/home?#/case/create
[+] https://aws.amazon.com/premiumsupport/faqs/

AWS
Radhika
Answered 8 months ago
