SageMaker Model Monitor Missing Columns Constraint Violation


Hi, I have run into a similar issue as the question posted here, although the solution mentioned does not work for my use case.

I have a SageMaker endpoint set up that can successfully complete inference for a given CSV file sent to the endpoint.

When it comes to data monitoring, I am able to successfully capture inference requests in the JSONL files from the data capture. However, when I run a data monitoring job, I get the error "no valid schema found".

In order to work around this, I have created a preprocess_handler with the idea of ensuring the proper schema is followed. Below is the preprocess_handler I use.

import json
from io import StringIO

import pandas as pd


def preprocess_handler(inference_record):
    input_data = inference_record.endpoint_input.data

    # Convert the input data to a Pandas DataFrame, which is able to correctly infer the schema
    df = pd.read_csv(StringIO(input_data))

    # Convert to JSON, stripping whitespace (including newlines) from column names and string values
    dict_data = {
        col_name.strip(): [value.strip() if isinstance(value, str) else value for value in df[col_name].tolist()]
        for col_name in df.columns
    }
    res = json.dumps(dict_data)

    return res

However, with this preprocess_handler, I get a missing-columns violation with the message "Number of columns in current dataset: 1". I tried stripping the newlines as recommended in the previous post here, but I am still having issues.

Nick
asked 8 months ago, 329 views
1 Answer

Hi,

Thank you for using Amazon SageMaker.

I understand that you have a data monitoring job configured to monitor data captured by an endpoint. However, the monitoring job throws the following error:

"There are missing columns in current dataset. Number of columns in current dataset: 1, Number of columns in baseline constraints: x"

This can happen if, in your 'data-capture.jsonl' file, every captured output ends with a '\n' character.

To fix this, you need to remove all the '\n' characters from the outputs in the 'data-capture.jsonl' file so that Model Monitor can parse them.
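To confirm that this is what is happening, you could scan a downloaded capture file for outputs that end in a newline. The following is only a minimal sketch: it assumes each line of the file is a JSON record with the output stored under captureData -> endpointOutput -> data, and the function name find_newline_outputs is made up for this example.

```python
import json


def find_newline_outputs(jsonl_text):
    """Return the line indices of capture records whose output data ends with a newline."""
    flagged = []
    for index, line in enumerate(jsonl_text.splitlines()):
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        # Assumed layout: captureData -> endpointOutput -> data
        output = record.get("captureData", {}).get("endpointOutput", {}).get("data", "")
        if output.endswith("\n"):
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    # Two hand-written records standing in for a real capture file
    sample = "\n".join([
        json.dumps({"captureData": {"endpointOutput": {"data": "0.87\n"}}}),
        json.dumps({"captureData": {"endpointOutput": {"data": "0.12"}}}),
    ])
    print(find_newline_outputs(sample))
```

If the returned list is not empty, the trailing newline is present in those records and one of the fixes below should apply.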

There are 2 ways to fix this:

  1. The '\n' character is most likely added during inference, when the results are generated, so you can modify your inference script so that the outputs do not end with '\n'.

  2. Add a preprocessing[1] script to your Model Monitor schedule. This script removes all the '\n' characters from your captured data after inference.

    [1] https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-pre-and-post-processing.html#model-monitor-pre-processing-script
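For option 2, a preprocessing script could look roughly like the sketch below. It is not a definitive implementation and rests on several assumptions: each captured record holds a single CSV row, the baseline was built with the model output as the first column followed by the input features, and the handler returns a flat dict of column values rather than a JSON string. The zero-padded keys are only there to keep the columns in order.

```python
from types import SimpleNamespace


def preprocess_handler(inference_record):
    # Strip trailing newlines so they do not end up inside a column value
    input_data = inference_record.endpoint_input.data.strip()
    output_data = inference_record.endpoint_output.data.strip()

    # Assumption: the baseline has the output first, then the input features
    row = output_data + "," + input_data

    # Return one flat dict entry per column instead of a single JSON string
    return {str(i).zfill(20): value.strip() for i, value in enumerate(row.split(","))}


if __name__ == "__main__":
    # Stand-in for the record object that Model Monitor passes to the handler
    record = SimpleNamespace(
        endpoint_input=SimpleNamespace(data="5.1,3.5,1.4\n"),
        endpoint_output=SimpleNamespace(data="0.87\n"),
    )
    print(preprocess_handler(record))
```

Returning one entry per column matters here: a handler that returns a single serialized string may be counted as one column, which would match the "Number of columns in current dataset: 1" message.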

You can also refer to the sample Model Monitor notebook[2], modify it for your use case, and test it on your end.

[2] https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_model_monitor/introduction/SageMaker-ModelMonitoring.ipynb

You can also open this sample notebook from a notebook instance under SageMaker Examples -> Sagemaker Model Monitor -> 'SageMaker-ModelMonitoring.ipynb'.

As I have limited visibility into your setup, I'd recommend reaching out to AWS Support by creating a support case[+] so that an engineer can investigate the issue in more depth and help you resolve it. You can also share the notebook file, scripts, dataset, and anything else you used, so that the engineer can reproduce the issue on their end and provide better assistance for your use case.

References:

[+] Open a support case with AWS using the link: https://console.aws.amazon.com/support/home?#/case/create
[+] https://aws.amazon.com/premiumsupport/faqs/

AWS
Radhika
answered 8 months ago
