SageMaker - All metrics in statistics.json from Model Quality Monitor are "0.0 +/- 0.0", but the confusion matrix is built correctly for multi-class classification

I have scheduled an hourly model-quality-monitoring job in AWS SageMaker. Both jobs, ground-truth-merge and model-quality-monitoring, complete successfully without any errors. However, all the metrics calculated by the job are "0.0 +/- 0.0", while the confusion matrix is calculated as expected.

I followed this notebook for model-quality-monitoring from sagemaker-examples, with only a few changes:

  1. I replaced the XGBoost churn model with a model trained on my own data.
  2. My input to the endpoint is CSV, as in the example notebook, but the output is JSON.
  3. I changed the problem type from BinaryClassification to MulticlassClassification wherever necessary.

The confusion matrix was built successfully, but all metrics are 0 for some reason. I would like the monitoring job to calculate the multi-class classification metrics on my data properly.

All Logs

Here's the statistics.json file that the model quality monitor saved to S3, with the confusion matrix built but with 0s in all the metrics:

{
  "version" : 0.0,
  "dataset" : {
    "item_count" : 4432,
    "start_time" : "2022-02-23T03:00:00Z",
    "end_time" : "2022-02-23T04:00:00Z",
    "evaluation_time" : "2022-02-23T04:13:20.193Z"
  },
  "multiclass_classification_metrics" : {
    "confusion_matrix" : {
      "0" : {
        "0" : 709,
        "2" : 530,
        "1" : 247
      },
      "2" : {
        "0" : 718,
        "2" : 497,
        "1" : 265
      },
      "1" : {
        "0" : 700,
        "2" : 509,
        "1" : 257
      }
    },
    "accuracy" : {
      "value" : 0.0,
      "standard_deviation" : 0.0
    },
    "weighted_recall" : {
      "value" : 0.0,
      "standard_deviation" : 0.0
    },
    "weighted_precision" : {
      "value" : 0.0,
      "standard_deviation" : 0.0
    },
    "weighted_f0_5" : {
      "value" : 0.0,
      "standard_deviation" : 0.0
    },
    "weighted_f1" : {
      "value" : 0.0,
      "standard_deviation" : 0.0
    },
    "weighted_f2" : {
      "value" : 0.0,
      "standard_deviation" : 0.0
    },
    "accuracy_best_constant_classifier" : {
      "value" : 0.3352888086642599,
      "standard_deviation" : 0.003252410977346705
    },
    "weighted_recall_best_constant_classifier" : {
      "value" : 0.3352888086642599,
      "standard_deviation" : 0.003252410977346705
    },
    "weighted_precision_best_constant_classifier" : {
      "value" : 0.1124185852154987,
      "standard_deviation" : 0.0021869336610830254
    },
    "weighted_f0_5_best_constant_classifier" : {
      "value" : 0.12965524348784485,
      "standard_deviation" : 0.0024239410000317335
    },
    "weighted_f1_best_constant_classifier" : {
      "value" : 0.16838092925822584,
      "standard_deviation" : 0.0028615098045768348
    },
    "weighted_f2_best_constant_classifier" : {
      "value" : 0.24009212108475822,
      "standard_deviation" : 0.003326031863819311
    }
  }
}
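As a cross-check, the metrics can be recomputed by hand from the confusion matrix in this file. The sketch below is not the monitor's own code; it assumes the outer key is the ground-truth label and the inner key is the predicted label. That assumption is consistent with the reported accuracy_best_constant_classifier, which equals the largest row sum divided by item_count (1486 / 4432). The function name is hypothetical.

```python
# Confusion matrix copied from the statistics.json above.
# Assumption: outer key = ground-truth label, inner key = predicted label.
confusion_matrix = {
    "0": {"0": 709, "2": 530, "1": 247},
    "2": {"0": 718, "2": 497, "1": 265},
    "1": {"0": 700, "2": 509, "1": 257},
}


def metrics_from_confusion_matrix(cm):
    """Recompute accuracy and support-weighted precision, recall and F1."""
    labels = sorted(cm)
    total = sum(sum(row.values()) for row in cm.values())
    correct = sum(cm[label].get(label, 0) for label in labels)
    result = {
        "accuracy": correct / total,
        "weighted_precision": 0.0,
        "weighted_recall": 0.0,
        "weighted_f1": 0.0,
    }
    for label in labels:
        support = sum(cm[label].values())  # row sum: actual count of this class
        predicted = sum(cm[actual].get(label, 0) for actual in labels)  # column sum
        true_positive = cm[label].get(label, 0)
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = support / total  # weight each class by its share of the data
        result["weighted_precision"] += weight * precision
        result["weighted_recall"] += weight * recall
        result["weighted_f1"] += weight * f1
    return result


metrics = metrics_from_confusion_matrix(confusion_matrix)
print(metrics)
```

Under that assumption the diagonal sums to 1463 out of 4432 records, so the accuracy implied by the confusion matrix is roughly 0.33 rather than the reported 0.0.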

Here's what a couple of lines of captured data look like (prettified for readability; in the actual file each record is on a single line with no indentation):

{
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "0,1,628,210,30",
            "encoding": "CSV"
        },
        "endpointOutput": {
            "observedContentType": "application/json",
            "mode": "OUTPUT",
            "data": "{\"label\":\"Transfer\",\"prediction\":2,\"probabilities\":[0.228256680901919,0.0,0.7717433190980809]}\n",
            "encoding": "JSON"
        }
    },
    "eventMetadata": {
        "eventId": "a7cfba60-39ee-4796-bd85-343dcadef024",
        "inferenceId": "5875",
        "inferenceTime": "2022-02-23T04:12:51Z"
    },
    "eventVersion": "0"
}
{
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "0,3,628,286,240",
            "encoding": "CSV"
        },
        "endpointOutput": {
            "observedContentType": "application/json",
            "mode": "OUTPUT",
            "data": "{\"label\":\"Adoption\",\"prediction\":0,\"probabilities\":[0.99,0.005,0.005]}\n",
            "encoding": "JSON"
        }
    },
    "eventMetadata": {
        "eventId": "7391ac1e-6d27-4f84-a9ad-9fbd6130498a",
        "inferenceId": "5876",
        "inferenceTime": "2022-02-23T04:12:51Z"
    },
    "eventVersion": "0"
}
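To make the structure of these records explicit: the endpoint output is itself a JSON string nested inside the captured record, so it has to be decoded twice. The sketch below uses only the standard library and the first record shown above; parse_capture is a hypothetical helper, not part of any SageMaker API.

```python
import json

# First captured record from above, rebuilt as a single JSON Lines string.
capture_line = json.dumps({
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "0,1,628,210,30",
            "encoding": "CSV",
        },
        "endpointOutput": {
            "observedContentType": "application/json",
            "mode": "OUTPUT",
            "data": '{"label":"Transfer","prediction":2,'
                    '"probabilities":[0.228256680901919,0.0,0.7717433190980809]}\n',
            "encoding": "JSON",
        },
    },
    "eventMetadata": {
        "eventId": "a7cfba60-39ee-4796-bd85-343dcadef024",
        "inferenceId": "5875",
        "inferenceTime": "2022-02-23T04:12:51Z",
    },
    "eventVersion": "0",
})


def parse_capture(line):
    """Decode the outer record, then the JSON string held in endpointOutput.data."""
    record = json.loads(line)
    output = json.loads(record["captureData"]["endpointOutput"]["data"])
    return {
        "inference_id": record["eventMetadata"]["inferenceId"],
        "label": output["label"],
        "prediction": output["prediction"],
        "probabilities": output["probabilities"],
    }


parsed = parse_capture(capture_line)
print(parsed)
```

Note that the output carries three candidate attributes (label, prediction, probabilities), and that prediction decodes to an integer rather than a string.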

Here's what a couple of lines from the ground truths that I uploaded to S3 look like (prettified for readability; in the actual file each record is on a single line with no indentation):

{
  "groundTruthData": {
    "data": "0",
    "encoding": "CSV"
  },
  "eventMetadata": {
    "eventId": "1"
  },
  "eventVersion": "0"
}
{
  "groundTruthData": {
    "data": "1",
    "encoding": "CSV"
  },
  "eventMetadata": {
    "eventId": "2"
  },
  "eventVersion": "0"
}
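For reference, records in this shape can be produced as JSON Lines with a small helper. This is a minimal sketch: make_ground_truth_record and labels_by_inference_id are hypothetical names, and it assumes the eventId of each ground-truth record is meant to correspond to the inferenceId of the captured record it labels.

```python
import json


def make_ground_truth_record(inference_id, label):
    """Build one ground-truth record in the shape shown above."""
    return {
        "groundTruthData": {
            "data": str(label),  # the label is stored as a CSV string
            "encoding": "CSV",
        },
        "eventMetadata": {
            # Assumption: this should correspond to the captured inferenceId.
            "eventId": str(inference_id),
        },
        "eventVersion": "0",
    }


# Hypothetical mapping from inference ID to true class, matching the two records above.
labels_by_inference_id = {"1": 0, "2": 1}

# JSON Lines: one object per line, with no separators between records.
jsonl = "\n".join(
    json.dumps(make_ground_truth_record(inference_id, label))
    for inference_id, label in labels_by_inference_id.items()
)
print(jsonl)
```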

Here's what a couple of lines from the ground-truth-merged file look like (prettified for readability; in the actual file each record is on a single line with no indentation). This file is created by the ground-truth-merge job, which is one of the two jobs that the model-quality-monitoring schedule runs:

{
  "eventVersion": "0",
  "groundTruthData": {
    "data": "2",
    "encoding": "CSV"
  },
  "captureData": {
    "endpointInput": {
      "data": "1,2,1050,37,1095",
      "encoding": "CSV",
      "mode": "INPUT",
      "observedContentType": "text/csv"
    },
    "endpointOutput": {
      "data": "{\"label\":\"Return_to_owner\",\"prediction\":1,\"probabilities\":[0.14512373737373732,0.6597074314574313,0.1951688311688311]}\n",
      "encoding": "JSON",
      "mode": "OUTPUT",
      "observedContentType": "application/json"
    }
  },
  "eventMetadata": {
    "eventId": "c9e21f63-05f0-4dec-8f95-b8a1fa3483c1",
    "inferenceId": "4432",
    "inferenceTime": "2022-02-23T04:00:00Z"
  }
}
{
    "eventVersion": "0",
    "groundTruthData": {
        "data": "1",
        "encoding": "CSV"
    },
    "captureData": {
        "endpointInput": {
            "data": "0,2,628,5,90",
            "encoding": "CSV",
            "mode": "INPUT",
            "observedContentType": "text/csv"
        },
        "endpointOutput": {
            "data": "{\"label\":\"Adoption\",\"prediction\":0,\"probabilities\":[0.7029623691085284,0.0,0.29703763089147156]}\n",
            "encoding": "JSON",
            "mode": "OUTPUT",
            "observedContentType": "application/json"
        }
    },
    "eventMetadata": {
        "eventId": "5f1afc30-2ffd-42cf-8f4b-df97f1c86cb1",
        "inferenceId": "4433",
        "inferenceTime": "2022-02-23T04:00:01Z"
    }
}
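One way to sanity-check the merged file outside the monitor is to pull the ground-truth label and the prediction out of each record and compare them. The sketch below uses abbreviated copies of the two merged records above (only the fields it reads) and hypothetical helper names. It also makes one detail visible: the ground truth is a string ("2") while the prediction is a JSON integer (1). Whether the monitor's metric computation is sensitive to that difference is not something this post establishes.

```python
import json

# Abbreviated copies of the two merged records above (only the fields used here).
merged_lines = [
    json.dumps({
        "groundTruthData": {"data": "2", "encoding": "CSV"},
        "captureData": {"endpointOutput": {
            "data": '{"label":"Return_to_owner","prediction":1}\n',
            "encoding": "JSON",
        }},
        "eventMetadata": {"inferenceId": "4432"},
    }),
    json.dumps({
        "groundTruthData": {"data": "1", "encoding": "CSV"},
        "captureData": {"endpointOutput": {
            "data": '{"label":"Adoption","prediction":0}\n',
            "encoding": "JSON",
        }},
        "eventMetadata": {"inferenceId": "4433"},
    }),
]


def label_pair(line):
    """Return (ground_truth, prediction) exactly as stored, without type conversion."""
    record = json.loads(line)
    truth = record["groundTruthData"]["data"]
    output = json.loads(record["captureData"]["endpointOutput"]["data"])
    return truth, output["prediction"]


def same_label(truth, prediction):
    """Compare labels after normalizing both sides to stripped strings."""
    return str(truth).strip() == str(prediction).strip()


pairs = [label_pair(line) for line in merged_lines]
for truth, prediction in pairs:
    print(type(truth).__name__, repr(truth), type(prediction).__name__, repr(prediction))

# Fraction of records where the normalized labels agree.
accuracy = sum(same_label(t, p) for t, p in pairs) / len(pairs)
print(accuracy)
```

Running the same comparison over the full merged file would give an independent accuracy figure to set against the value in statistics.json.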

Since the confusion matrix was constructed properly, I presume that I fed the data to SageMaker Model Monitor the right way. But why are all the metrics 0.0 while the confusion matrix looks as expected?

EDIT 1:
Logs for the job are available here.

asked 2 years ago, 125 views
No Answers
