Can I get class probabilities from an XGBoost model, using batch transform?


AutoML gave me an XGBoost model, which I would like to now use to run inference on a large CSV file.

After "deploying" the model, I figured out how to do this using a "batch transform" job. But the output appears to only be 0.0 or 1.0, class labels.

Is there a way to get class probabilities out from the model?

Asked 1 year ago · 647 views
1 Answer

Hi Apullin,

May I know how you are currently making the predictions? Here is how I managed to get the probability of a class instead of only 0 or 1.

xgb_predictor = xgb.deploy(initial_instance_count=1,

xgb_predictor.serializer = sagemaker.serializers.CSVSerializer()

#Making the prediction

You may try integrating this method into how you are performing 'batch transform' currently and see if it works.
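To tell whether a response actually contains probabilities, it can help to parse it and inspect the values. The following is a minimal sketch, assuming the endpoint returns its scores as UTF-8 text with one value per line (or comma-separated). The helper names `parse_scores`, `looks_like_labels` and `to_labels` are made up for this illustration and are not part of the SageMaker SDK.

```python
import re


def parse_scores(payload):
    """Parse a text or bytes payload of numbers separated by commas or newlines."""
    if isinstance(payload, bytes):
        payload = payload.decode("utf-8")
    tokens = [t for t in re.split(r"[,\s]+", payload.strip()) if t]
    return [float(t) for t in tokens]


def looks_like_labels(scores):
    """True if every score is exactly 0.0 or 1.0 (likely labels, not probabilities)."""
    return len(scores) > 0 and all(s in (0.0, 1.0) for s in scores)


def to_labels(scores, threshold=0.5):
    """Derive class labels from probabilities, leaving the probabilities untouched."""
    return [1 if s >= threshold else 0 for s in scores]


if __name__ == "__main__":
    # Hypothetical payloads, standing in for what predict() might return.
    probs = parse_scores(b"0.12\n0.87\n0.50\n")
    print(looks_like_labels(probs), to_labels(probs))
    print(looks_like_labels(parse_scores("0.0\n0.0\n1.0\n0.0\n")))
```

Keeping the raw scores and deriving labels in a separate step avoids accidentally discarding the probabilities by rounding.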

AWS
Answered 1 year ago
  • Also, check that you did not apply np.round() or anything similar to the response, which would convert the results to whole numbers.

  • Ah, I am using "batch prediction" through the web dashboard itself, via a CSV uploaded to S3. I am not at all familiar with boto3 or the SageMaker Python APIs ... but maybe I'll have to try them out.

    Is there a way to do it from just the web interface?

    For getting it to work in Python, I am up against a slightly different SageMaker issue: the model (and many others) appears to exist in the output from Autopilot in Studio, but I don't know how to reference it outside of that context. I can "deploy" it to an endpoint, which is why I did the web batch transform.

  • Hi Apullin,

    Noted that you are using 'Create batch transform job' from the SageMaker Console. I have tested this scenario with an XGBoost model for binary classification and I am getting probabilities. May I know which configuration options you set when creating the job?

  • OK, after some tinkering, I was able to run this in a SageMaker Studio notebook:

    import sagemaker as sm

    # df is the DataFrame holding the rows to score
    automl_predictor1 = sm.predictor.Predictor("TestEndpoint1")
    automl_predictor1.serializer = sm.serializers.CSVSerializer()
    resp = automl_predictor1.predict(df.values[:100000]).decode("utf-8")

    But that still just gives back 0.0\n0.0\n1.0\n0.0\n....

    So, no idea. I suppose I'll go back to trying to do CV parameter search on XGBoost locally ... so much for AutoML.

  • One more note: while the "Model Details" view does give me a name for the model, that model does not seem to be accessible anywhere. That is, back in the web dashboard, under "Inference >> Models", the long list of models tried by AutoML does not appear. It DOES give the path to an S3 bucket with a model.tar.gz in it, though.

    Similarly, I am not able to reference that model via the SageMaker Python SDK, and ListModels only shows me the same entries as the web dashboard.
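One possible explanation for the 0.0/1.0 output discussed in this thread is that the inference containers generated by Autopilot return only the predicted label by default, and that environment variables such as `SAGEMAKER_INFERENCE_OUTPUT` on those containers select what else is returned (for example `probability`). The sketch below works under that assumption. The variable names, the accepted values and the three-container layout are all assumptions to verify against the current Autopilot documentation. The helper `with_probability_output` is hypothetical, and the input is assumed to have the shape of `BestCandidate.InferenceContainers` as returned by boto3's `describe_auto_ml_job`.

```python
import copy

# Assumption: the response keys the Autopilot containers should return.
RESPONSE_KEYS = "predicted_label,probability"


def with_probability_output(containers, keys=RESPONSE_KEYS):
    """Return a copy of the container definitions that also requests probabilities.

    Assumes a three-container Autopilot pipeline:
    [0] feature transform, [1] algorithm (XGBoost), [2] inverse label transform.
    """
    if len(containers) != 3:
        raise ValueError("expected three inference containers")
    updated = copy.deepcopy(containers)  # leave the caller's list untouched
    for container in updated:
        container.setdefault("Environment", {})
    updated[1]["Environment"]["SAGEMAKER_INFERENCE_OUTPUT"] = keys
    updated[2]["Environment"]["SAGEMAKER_INFERENCE_INPUT"] = keys
    updated[2]["Environment"]["SAGEMAKER_INFERENCE_OUTPUT"] = keys
    return updated


if __name__ == "__main__":
    # Placeholder definitions; real ones would come from describe_auto_ml_job.
    example = [
        {"Image": "<image-0>", "ModelDataUrl": "<s3-uri-0>", "Environment": {}},
        {"Image": "<image-1>", "ModelDataUrl": "<s3-uri-1>", "Environment": {}},
        {"Image": "<image-2>", "ModelDataUrl": "<s3-uri-2>", "Environment": {}},
    ]
    for container in with_probability_output(example):
        print(container["Environment"])
```

If the assumption holds, the updated list could be passed as `Containers` to boto3's `create_model`, and a batch transform job could then be created from that new model, which should also make it show up in the console's model list.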


