I have an Endpoint inference pipeline model deployed from an AutoPilot training job. Now that this is successful, I want to add model monitor. I have a script for online validation of the endpoint, and the F1 score is ~99%. This indicates that the endpoint interprets the call correctly.
Model Monitor is recognizing the data in my jsonl files as the data not being CSV formatted. When my Model Monitor processing job runs, I receive the following constraint violation: "There are missing columns in current dataset. Number of columns in current dataset: 1, Number of columns in baseline constraints: 225".
Given the results from the Endpoint and this Model Monitor constraint violation, I perceive there is a conflict between how the Endpoint is storing the data and how the Model Monitor Processing Job wants to consume the data.
Here is one sample prediction from the jsonl file. The data value is comma separated.
{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"JHB,44443000.0,-0.0334,,44264000.0,,,,-2014000.0,,-2014000.0,,,,,,,-0.04,-0.04,55872000.0,,,0.996,,,,,,,,-0.0453,,2845000.0,,2845000.0,11636000.0,,,,,,,,,,,,190000000.0,,,,,,,,-18718000.0,,,,,,,,29000000.0,,,,,,,,-33000000.0,,-4000000.0,,,,,,,,,,,,,,,0.0,,,0.995972369102,1.0,-0.045316472785366,0.0,,,,,,,0.0,,,,,,,,,95.5638,,,,,,1.0,1.0,,0.15263157894737,,,,,,0.65252120693923,0.0,0.15263157894737,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,18606500.0,,,95.5638,,,2.3886,,,,,-0.0326,,-1.0449,,-1.05,-1.05,,0.0,,-0.1471,,,,,,,,,,,,,,,,,-0.5451,,,,,,,Financial Services,16.67890010036862","encoding":"CSV"},"endpointOutput":{"observedContentType":"text/csv; charset=utf-8","mode":"OUTPUT","data":"1\n","encoding":"CSV"}},"eventMetadata":{"eventId":"c97df615-0a2e-414d-9be3-bf3a14eb6363","inferenceTime":"2020-04-15T16:26:46Z"},"eventVersion":"0"}
Here is the point within the log that the processing job recognizes a column mismatch. I see that it pulls down the data to store locally, pulls down the statistics and constraints files, errors with this constraint, and then gracefully ends the Processing Job. If more logs are needed to analyze, I have the Processing Job logs in CloudWatch Logs.
2020-04-15 17:11:49 INFO FileUtil:66 - Read file from path /opt/ml/processing/baseline/constraints/constraints.json.
2020-04-15 17:11:50 INFO FileUtil:66 - Read file from path /opt/ml/processing/baseline/stats/statistics.json.
2020-04-15 17:11:50 ERROR DataAnalyzer:65 - There are missing columns in current dataset. Number of columns in current dataset: 1, Number of columns in baseline constraints: 225
Skipping further processing because of column count mismatch.
I could not find Model Monitor documentation on how to deal with column mismatch constraint violations.