SageMaker Model Quality Check Step for Binary Classification is failing with error message: More than two classes are not supported in binary classification

The transformer step completes successfully and its output is baseline.csv.out.

Code for the transformer step and the model_quality_check_step:

from sagemaker.transformer import Transformer
from sagemaker.workflow.steps import TransformStep

# Batch transform over the "baseline" output of the processing step.
transformer = Transformer(
    model_name=create_model_step.properties.ModelName,
    instance_count=processing_instance_count,
    instance_type="ml.m5.xlarge",
    accept="text/csv",
    assemble_with="Line",
    output_path=transformer_output_s3,
    sagemaker_session=pipeline_session,
)
transform_arg = transformer.transform(
    data=processing_step.properties.ProcessingOutputConfig.Outputs["baseline"].S3Output.S3Uri,
    content_type="text/csv",
    split_type="Line",
    join_source="Input",       # join each prediction to its input row
    input_filter="$[1:]",      # drop column 0 (the label) before inference
    output_filter="$[0,-1]",   # keep the label and the prediction
)
transform_step = TransformStep(
    name="TransformDataStep",
    step_args=transform_arg,
    cache_config=cache_config,
)



from sagemaker.model_monitor import DatasetFormat
from sagemaker.workflow.quality_check_step import ModelQualityCheckConfig, QualityCheckStep

# _c0 is the ground-truth label and _c1 the prediction kept by output_filter above.
model_quality_check_config = ModelQualityCheckConfig(
    baseline_dataset=transform_step.properties.TransformOutput.S3OutputPath,
    dataset_format=DatasetFormat.csv(header=False),
    output_s3_uri=model_quality_check_step_s3,
    problem_type="BinaryClassification",
    probability_attribute="_c1",
    probability_threshold_attribute="0.5",
    ground_truth_attribute="_c0",
)

model_quality_check_step = QualityCheckStep(
    name="ModelQualityCheckStep",
    skip_check=skip_check_model_quality,
    register_new_baseline=register_new_baseline_model_quality,
    quality_check_config=model_quality_check_config,
    check_job_config=check_job_config,
    model_package_group_name=model_package_group_name,
    supplied_baseline_statistics=supplied_baseline_statistics_model_quality,
    supplied_baseline_constraints=supplied_baseline_constraints_model_quality,
)

Error message in CloudWatch:

2023-12-05 13:50:42,325 ERROR Main: Error: More than two classes are not supported in binary classification
2023-12-05 13:50:42,244 ERROR modelquality.BinaryClassificationAnalyzerImpl$: Binary classification dataset has classes (1.0,0,0.0,1), only up to two classes are supported
2023-12-05 13:50:42,397 - main - ERROR - Exception performing analysis: Command 'bin/spark-submit --master yarn --deploy-mode client --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider --conf spark.serializer=org.apache.spark.serializer.KryoSerializer /opt/amazon/sagemaker-data-analyzer-1.0-jar-with-dependencies.jar --analytics_input /tmp/spark_job_config.json' returned non-zero exit status 1.

Dinesh
asked 5 months ago, 183 views
1 Answer
Accepted Answer

The XGBoost model accepted labels formatted as 1.0 and 0.0, but the ModelQualityCheckStep did not. The CloudWatch log lists the classes as (1.0,0,0.0,1), which suggests that the check treats the float-formatted values 1.0/0.0 and the integer-formatted values 1/0 as four distinct classes rather than two.

To fix this, I changed the pre-processing step to cast the labels explicitly to integers, so that it writes 1 and 0 instead of 1.0 and 0.0. After that change, the ModelQualityCheckStep accepted the labels and the QualityCheckStep ran as expected.

In short: write binary labels as integers (1/0), not floats (1.0/0.0), so that they stay compatible across the pipeline steps.
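
As an illustration of the integer cast described in this answer, here is a minimal sketch using only the Python standard library. It assumes the label sits in the first column of a header-less CSV with no missing values; the function name cast_labels_to_int and the sample rows are hypothetical and not taken from the actual pre-processing script.

```python
import csv
import io


def cast_labels_to_int(csv_text: str, label_col: int = 0) -> str:
    """Rewrite one column of header-less CSV text so that 1.0/0.0 become 1/0.

    Assumption: every value in label_col parses as a number (no blanks or NaN).
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        # "1.0" -> 1.0 -> 1 -> "1"; the other columns are left untouched
        row[label_col] = str(int(float(row[label_col])))
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample: label in column 0, features in the remaining columns.
sample = "1.0,0.3,5.1\n0.0,0.7,2.4\n1.0,0.1,9.9\n"
fixed = cast_labels_to_int(sample)
print(fixed)
```

In a pandas-based pre-processing script, the equivalent would typically be an astype(int) call on the label column before the file is written.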

Dinesh
answered 5 months ago
