Hi Michael,
If I understand your question correctly, you are trying to build a BYOC Model Monitor container in order to bring in custom metrics to monitor your model.
First, when you take that route, the metrics that are monitored by default (for Model Quality, metrics such as mae and r2, for instance) do not come out of the box; you have to incorporate the logic to calculate the respective metrics listed here. In short, you cannot "add on" a new custom metric to the existing out-of-the-box (OOTB) metrics until the SageMaker team publicly exposes the container that implements this logic.
Second, there is no "strict" format (JSON files) for writing your custom metrics per se; it is at the customer's discretion to implement the JSON file as required, as long as they can read it in the BYOC container to calculate violations and so on. However, we encourage customers to follow the KLL style as shown in the example here.
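As a minimal sketch of what "incorporating the logic yourself" could look like, the snippet below computes mae from labels and predictions and compares it with a threshold read from a constraints dictionary. The constraints layout, the function names, and the fields of the violation entry are all hypothetical and chosen only for illustration; they are not a documented SageMaker schema.

```python
from typing import Dict, List, Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Plain-Python MAE so the sketch has no third-party dependencies."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def check_metric(name: str, value: float,
                 constraints: Dict[str, Dict[str, float]]) -> List[dict]:
    """Return one violation entry if `value` exceeds its threshold, else [].

    `constraints` uses a hypothetical layout: {"mae": {"threshold": 0.25}}.
    """
    threshold = constraints[name]["threshold"]
    if value <= threshold:
        return []
    return [{
        "metric_name": name,                              # hypothetical field
        "constraint_check_type": "custom_threshold_check",  # hypothetical label
        "description": f"{name}={value:.4f} exceeds threshold {threshold:.4f}",
    }]


mae = mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 4.0])
violations = check_metric("mae", mae, {"mae": {"threshold": 0.25}})
```

Because the format is at your discretion, the only real requirement in this sketch is that the code reading the constraints and the code writing them agree on the layout.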
Linking a few samples of BYOC implementations of Model Monitor for your reference below:
- NLP data drift BYOC model monitor - https://github.com/aws-samples/detecting-data-drift-in-nlp-using-amazon-sagemaker-custom-model-monitor
- CV BYOC model monitor - https://aws.amazon.com/blogs/machine-learning/detecting-and-analyzing-incorrect-model-predictions-with-amazon-sagemaker-model-monitor-and-debugger/
Apologies for the delayed response, Michael. To answer your questions:
1/ On where and under what name the metrics need to be stored: the constraints and violations files need to be written to /opt/ml/processing (see https://github.com/aws-samples/detecting-data-drift-in-nlp-using-amazon-sagemaker-custom-model-monitor/blob/main/docker/evaluation.py#L156), and the filename is constraint_violations.json.
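A rough sketch of writing that file from inside a custom container might look like the following. The function name, the top-level "violations" wrapper, and the example entry are assumptions for illustration. The output directory is left as a parameter because the exact directory under /opt/ml/processing depends on how your job's outputs are configured.

```python
import json
import tempfile
from pathlib import Path


def write_violations(output_dir, violations):
    """Write constraint_violations.json into output_dir and return its path."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "constraint_violations.json"
    # Assumption: violations are wrapped in a top-level "violations" list.
    path.write_text(json.dumps({"violations": violations}, indent=2))
    return path


# Local dry run in a temporary directory; inside the container you would pass
# the directory under /opt/ml/processing that your job is configured to upload.
with tempfile.TemporaryDirectory() as tmp:
    path = write_violations(tmp, [{"description": "example violation"}])
    round_trip = json.loads(path.read_text())
```

Running it against a temporary directory first is a cheap way to check the JSON you emit before building the image.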
2/ "I have run a default ModelQualityMonitor job. I see that this job also produces the set of three files (constraints.json, constraint_violations.json, statistics.json). How are these differentiated between monitoring job types?"
Answer: Even though the set of files generated by each job is the same no matter which type of model monitoring you choose (Data/Quality), if you look closely, the statistics file for data monitoring will have -
"mean" : 0.13082980736646624, "sum" : 54.94851909391582, "std_dev" : 0.2511377559440087, "min" : 0.006144209299236536, "max" : 0.989563524723053, "distribution" : {
whereas the statistics file for Model Quality will have (for a binary classification problem) -
"binary_classification_metrics" : { "confusion_matrix" : { "0" : { "0" : 173, "1" : 0 }, "1" : { "0" : 12, "1" : 16 }
To sum it up: yes, if you provide these three files, i.e. statistics.json, constraints.json and constraint_violations.json, you should be all set.
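To make the difference between the two statistics files concrete, here is a small sketch that guesses which monitor type produced a parsed statistics.json. The "binary_classification_metrics" key comes from the Model Quality excerpt above and is assumed to sit at the top level. The "features" key, the feature name "f0", and the nesting of the data-quality sample are assumptions with placeholder values.

```python
import json


def statistics_kind(statistics: dict) -> str:
    """Guess which monitor type produced a parsed statistics.json document."""
    # Key seen in the Model Quality excerpt; assumed to be at the top level.
    if "binary_classification_metrics" in statistics:
        return "model_quality"
    # Assumption: data quality statistics keep per-feature entries under "features".
    if "features" in statistics:
        return "data_quality"
    return "unknown"


model_quality_doc = json.loads(
    '{"binary_classification_metrics": {"confusion_matrix": '
    '{"0": {"0": 173, "1": 0}, "1": {"0": 12, "1": 16}}}}'
)
data_quality_doc = json.loads(
    '{"features": [{"name": "f0", '
    '"numerical_statistics": {"mean": 0.13, "std_dev": 0.25}}]}'
)

print(statistics_kind(model_quality_doc))  # model_quality
print(statistics_kind(data_quality_doc))   # data_quality
```

A check like this only inspects the content; the file names themselves stay the same across job types.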
Thank you for your answer and examples provided, it is appreciated.
You are right on the context of the question. We do know that we have to implement the logic ourselves, and that the preferred sketch to be used is compact KLL quantiles - this is totally fine.
My question is more asking where and under what name the metrics listed here: https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-metrics.html are stored as output in the monitoring container, so that they will be uploaded and visualized within Studio. This is so that we can mimic the behavior with our own metrics, and our own model quality logic, but still have the results visualized within Studio.
Is there a certain file to output (such as statistics.json in other cases) to the outputs/results directory of the container that would be recognized and visualized, much like it is with the default implementation?
I have run a default ModelQualityMonitor job. I see that this job also produces the set of three files (constraints.json, constraint_violations.json, statistics.json). How are these differentiated between monitoring job types?
My original question had assumed that these would be different, with Studio looking for separate files for each monitoring job type (model quality vs data quality, etc.) to visualize.
But if I now understand your response and the docs correctly: if I start an appropriate custom monitoring job for each type of monitoring (data quality and model quality) and provide those three files, conforming to the container contract schemas and the KLL sketch, then all should work out?