Can I get class probabilities from an XGBoost model, using batch transform?


AutoML gave me an XGBoost model, which I would like to now use to run inference on a large CSV file.

After "deploying" the model, I figured out how to do this using a "batch transform" job. But the output appears to only be 0.0 or 1.0, class labels.

Is there a way to get class probabilities out from the model?

1 Answer

Hi Apullin,

May I know how you are currently making the predictions? Here is how I get the probability of a class instead of only 0 or 1.

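import sagemaker

# Deploy the trained estimator `xgb` to a real-time endpoint
xgb_predictor = xgb.deploy(initial_instance_count=1,
                           instance_type='any')  # placeholder: substitute a real instance type

# Send the input to the endpoint as CSV
xgb_predictor.serializer = sagemaker.serializers.CSVSerializer()

# Make the prediction; the response is bytes, so decode it to text
xgb_predictor.predict(array).decode('utf-8')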

You may try integrating this method into how you are performing 'batch transform' currently and see if it works.
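To check whether a decoded response actually contains probabilities or only hard labels, something like the following minimal sketch may help. The helper names (`parse_predictions`, `looks_like_labels`, `to_labels`) are hypothetical, and the sketch assumes the response is plain text with one number per line or comma-separated, as in the decoded output above.

```python
def parse_predictions(response_text):
    """Split a newline- or comma-separated response string into floats."""
    tokens = response_text.replace(",", "\n").split()
    return [float(token) for token in tokens]


def looks_like_labels(values):
    """Return True when every value is exactly 0.0 or 1.0 (hard labels, not scores)."""
    return bool(values) and all(value in (0.0, 1.0) for value in values)


def to_labels(probabilities, threshold=0.5):
    """Turn probabilities into 0/1 labels; the 0.5 cut-off is an assumption."""
    return [1 if p >= threshold else 0 for p in probabilities]
```

If `looks_like_labels(parse_predictions(text))` is true for a large batch, the model is most likely returning labels rather than scores, and the thresholding has to be undone on the model side, not in the client.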

ljunkai (AWS, EXPERT)
answered a year ago
  • Also, do check that you did not apply np.round() or anything similar to the response, which would convert the results to whole numbers.

  • Ah, I am using "batch prediction" through the web dashboard itself, via a CSV uploaded to S3. I am not at all familiar with the boto3 or the sagemaker python APIs ... but maybe I'll have to try it out.

    Is there a way to do it from just the web interface?

    For getting it to work in Python, I am up against a slightly different Sagemaker issue: The model (and many others) appear to exist in the output from Autopilot in Studio, but I don't know how to reference it outside of that context. I can "deploy" it to an endpoint, which is why I did the web batch transform.

  • Hi Apullin,

    Noted that you are using 'Create batch transform job' from the SageMaker console. I have tested this scenario with an XGBoost model for binary classification and I am getting probabilities. May I know what configuration you set when creating the job?

  • OK, after some tinkering, I was able to run this in a Sagemaker Studio notebook:

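    import sagemaker as sm

    # Attach to the already-deployed endpoint and send the input as CSV
    automl_predictor1 = sm.predictor.Predictor( "TestEndpoint1" )
    automl_predictor1.serializer = sm.serializers.CSVSerializer()
    resp = automl_predictor1.predict( df.values[:100000] ).decode('utf-8')
    resp[:300]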
    

    But that still just gives back 0.0\n0.0\n1.0\n0.0\n....

    So, no idea. I suppose I'll go back to trying to do CV parameter search on XGBoost locally ... so much for AutoML.

  • One more note: While the "Model Details" view does give me a name for the model, that model does not seem to be actually accessible anywhere. That is, back in the web dashboard view, under "Inference >> Models", the long list of models tried by AutoML does not appear. It DOES give the path for an S3 bucket with a model.tar.gz in it, though.

    Similarly, I am not able to reference that model via the SageMaker Python SDK, and ListModels only shows me the same entries as on the web dashboard.
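One possible route, sketched below under stated assumptions, is to register the Autopilot candidate as a regular SageMaker model yourself and ask its containers to emit probabilities. The boto3 calls `describe_auto_ml_job` and `create_model` exist. The environment variable names `SAGEMAKER_INFERENCE_OUTPUT` and `SAGEMAKER_INFERENCE_INPUT`, the `predicted_label,probability` value, and the container ordering are assumptions based on the Autopilot documentation and should be verified against your own job. The function names and arguments are hypothetical.

```python
import copy


def with_probability_output(containers, keys="predicted_label,probability"):
    """Return a copy of Autopilot container definitions that requests probabilities.

    Assumed ordering: [feature transform, algorithm, optional label inverse transform].
    """
    if len(containers) < 2:
        raise ValueError("expected at least a transform and an algorithm container")
    updated = copy.deepcopy(containers)
    # Assumption: the algorithm container honors SAGEMAKER_INFERENCE_OUTPUT.
    updated[1].setdefault("Environment", {})["SAGEMAKER_INFERENCE_OUTPUT"] = keys
    # Assumption: any later container must accept and pass through the same keys.
    for container in updated[2:]:
        env = container.setdefault("Environment", {})
        env["SAGEMAKER_INFERENCE_INPUT"] = keys
        env["SAGEMAKER_INFERENCE_OUTPUT"] = keys
    return updated


def create_model_from_autopilot_job(job_name, model_name, role_arn):
    """Register the best candidate of an Autopilot job as a SageMaker model."""
    import boto3  # imported here so the helper above works without AWS access

    sm_client = boto3.client("sagemaker")
    job = sm_client.describe_auto_ml_job(AutoMLJobName=job_name)
    containers = job["BestCandidate"]["InferenceContainers"]
    return sm_client.create_model(
        ModelName=model_name,
        ExecutionRoleArn=role_arn,
        Containers=with_probability_output(containers),
    )
```

If this works as assumed, the new model should show up under "Inference >> Models" and in ListModels, and a batch transform job created from it should return the probability next to the predicted label.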
