How to specify the interaction_constraints parameter in SageMaker XGBoost?

Using the SageMaker Python SDK, I would like to specify the interaction_constraints hyperparameter on an XGBoost Estimator.

The documentation specifies these should be given as nested lists of integers (xgboost itself allows both feature names and feature indices according to their documentation).

However, I can't get past the validation logic in sagemaker-xgboost-container. Given the following minimal example, executed in SageMaker Studio:

import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

role = get_execution_role()

image = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region="eu-west-1",
    version="1.7-1",
)

xgb_estimator = Estimator(
    image_uri=image,
    instance_type="ml.c5.2xlarge",
    instance_count=1,
    output_path="s3://my-bucket-name/",
    role=role,
    hyperparameters={
        "num_round": "233",
        "interaction_constraints": "[[2, 3]]",
        "tree_method": "hist",  # required for interaction_constraints, although the default of "auto" should be equivalent to "hist"
    },
)

xgb_estimator.fit({"train": TrainingInput(s3_data="s3://my-bucket-name/train.csv", content_type="text/csv")})

and a csv like the following:

point_estimate,year,month,day,minute_of_day,weekday
9.0,2016,12,30,480,4
10.0,2016,12,30,495,4
18.0,2016,12,30,510,4
17.0,2016,12,30,525,4
18.0,2016,12,30,540,4
26.0,2016,12,30,555,4
14.0,2016,12,30,570,4
13.0,2016,12,30,585,4
17.0,2016,12,30,600,4

I get the following Traceback:

Traceback (most recent call last):
  File "/miniconda3/lib/python3.8/site-packages/xgboost/core.py", line 1617, in _transform_interaction_constraints
    [feature_idx_mapping[feature_name] for feature_name in constraint]
  File "/miniconda3/lib/python3.8/site-packages/xgboost/core.py", line 1617, in <listcomp>
    [feature_idx_mapping[feature_name] for feature_name in constraint]
KeyError: 2
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/algorithm_mode/train.py", line 319, in train_job
    bst = xgb.train(
  File "/miniconda3/lib/python3.8/site-packages/xgboost/core.py", line 620, in inner_f
    return func(**kwargs)
  File "/miniconda3/lib/python3.8/site-packages/xgboost/training.py", line 160, in train
    bst = Booster(params, [dtrain] + [d[0] for d in evals], model_file=xgb_model)
  File "/miniconda3/lib/python3.8/site-packages/xgboost/core.py", line 1579, in __init__
    params_processed = self._configure_constraints(params_processed)
  File "/miniconda3/lib/python3.8/site-packages/xgboost/core.py", line 1637, in _configure_constraints
    ] = self._transform_interaction_constraints(value)
  File "/miniconda3/lib/python3.8/site-packages/xgboost/core.py", line 1621, in _transform_interaction_constraints
    raise ValueError(
ValueError: Constrained features are not a subset of training data feature names
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_containers/_trainer.py", line 84, in train
    entrypoint()
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/training.py", line 102, in main
    train(framework.training_env())
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/training.py", line 98, in train
    run_algorithm_mode()
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/training.py", line 64, in run_algorithm_mode
    sagemaker_train(
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/algorithm_mode/train.py", line 250, in sagemaker_train
    train_job(**train_args)
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/algorithm_mode/train.py", line 415, in train_job
    raise exc.AlgorithmError(f"{exception_prefix}:\n {str(e)}")
sagemaker_algorithm_toolkit.exceptions.AlgorithmError: XGB train call failed with exception:
 Constrained features are not a subset of training data feature names
XGB train call failed with exception:
 Constrained features are not a subset of training data feature names

Trying with feature names instead ("interaction_constraints": "[['month', 'day']]"), I get the following:

Traceback (most recent call last):
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_algorithm_toolkit/hyperparameter_validation.py", line 287, in validate
    self.hyperparameters[hp].validate_range(value)
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_algorithm_toolkit/hyperparameter_validation.py", line 198, in validate_range
    if any([element not in self.range for outer in value for element in outer]):
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_algorithm_toolkit/hyperparameter_validation.py", line 198, in <listcomp>
    if any([element not in self.range for outer in value for element in outer]):
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_algorithm_toolkit/hyperparameter_validation.py", line 368, in __contains__
    or (self.min_closed is not None and value < self.min_closed)
TypeError: '<' not supported between instances of 'str' and 'int'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_containers/_trainer.py", line 84, in train
    entrypoint()
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/training.py", line 102, in main
    train(framework.training_env())
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/training.py", line 98, in train
    run_algorithm_mode()
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/training.py", line 64, in run_algorithm_mode
    sagemaker_train(
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/algorithm_mode/train.py", line 124, in sagemaker_train
    validated_train_config = hyperparameters.validate(train_config)
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_algorithm_toolkit/hyperparameter_validation.py", line 291, in validate
    raise exc.AlgorithmError(
sagemaker_algorithm_toolkit.exceptions.AlgorithmError: Hyperparameter interaction_constraints: unexpected failure when validating [['month', 'day']] (caused by TypeError)
Hyperparameter interaction_constraints: unexpected failure when validating [['month', 'day']] (caused by TypeError)
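
The comparison in the last frame of that traceback (value < self.min_closed) can be reproduced in isolation. The following is a minimal sketch, not the container's actual code: the class name ClosedIntervalSketch and the lower bound of 0 are assumptions made for illustration. It shows why a nested list of strings raises a TypeError during range validation while a nested list of integers passes.

```python
class ClosedIntervalSketch:
    """Hypothetical stand-in for the container's range object; not its real class."""

    def __init__(self, min_closed=None):
        self.min_closed = min_closed

    def __contains__(self, value):
        # Same kind of comparison as in the traceback: value < self.min_closed
        return not (self.min_closed is not None and value < self.min_closed)


def validate_range(value, allowed):
    """Check every element of a nested list against the allowed range."""
    if any(element not in allowed for outer in value for element in outer):
        raise ValueError("value out of range")


allowed = ClosedIntervalSketch(min_closed=0)  # assumed lower bound
validate_range([[2, 3]], allowed)  # integer indices pass the check

try:
    validate_range([["month", "day"]], allowed)
    error_message = None
except TypeError as err:
    # Comparing a str with an int fails before any range logic applies
    error_message = str(err)
```

If the real validator works along these lines, feature names can never pass it, because the check assumes numeric elements.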

So it seems sagemaker_xgboost_container does not accept feature names in the first place. Integer indices do pass the validation, but they are then handed to xgboost in a form that it interprets as feature names. I also tried variations like

  • "interaction_constraints": "[['2', '3']]"
  • "interaction_constraints": [['2', '3']]
  • "interaction_constraints": [[2, 3]]

with the same results as above. I also tried setting the columns in the csv to numeric indices, to no avail.
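
The behaviour seen with integer indices can be sketched from the first traceback alone. This is a simplified re-creation of the lookup visible in the frames of _transform_interaction_constraints, not xgboost's actual implementation; the function signature and the list of feature names (the CSV header minus the target column) are assumptions for illustration.

```python
def transform_interaction_constraints(value, feature_names):
    """Simplified re-creation of the lookup shown in the traceback."""
    feature_idx_mapping = {name: idx for idx, name in enumerate(feature_names)}
    try:
        return [
            [feature_idx_mapping[feature_name] for feature_name in constraint]
            for constraint in value
        ]
    except KeyError as e:
        raise ValueError(
            "Constrained features are not a subset of training data feature names"
        ) from e


# Assumed feature names: the CSV header without the target column
feature_names = ["year", "month", "day", "minute_of_day", "weekday"]

# Names resolve to their positions
by_name = transform_interaction_constraints([["month", "day"]], feature_names)

# Integers are looked up as if they were names, so the lookup fails
try:
    transform_interaction_constraints([[2, 3]], feature_names)
    by_index_error = None
except ValueError as err:
    by_index_error = str(err)
```

Under these assumptions, a parsed list such as [[2, 3]] fails with the same message as in the traceback, because 2 is not a key in a mapping built from names. With an empty list of feature names, every lookup would fail in the same way.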

In what format are interaction constraints expected to be specified? Any insights are welcome.

Thanks in advance!

plore
asked 22 days ago, 58 views
1 Answer

Hello,

Try the CSV again without headers, and make sure the target field is the first column.

https://repost.aws/questions/QUoW_FqSbIQKW0MqNJLlA2AA/how-to-specify-target-feature-in-sagemaker-xgboost
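
As a minimal sketch of that suggestion, the header row can be dropped with the standard csv module before uploading the file. The helper name strip_header is hypothetical, and the sample rows are taken from the question; there the target column (point_estimate) is already first, so the column order is left unchanged.

```python
import csv
import io


def strip_header(csv_text):
    """Return the CSV text without its first (header) row; column order is unchanged."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows[1:])
    return out.getvalue()


sample = (
    "point_estimate,year,month,day,minute_of_day,weekday\n"
    "9.0,2016,12,30,480,4\n"
    "10.0,2016,12,30,495,4\n"
)
headerless = strip_header(sample)
```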

EXPERT
answered 22 days ago
  • Thanks for the fast response, but unfortunately without headers I get the same traceback, ending in "Constrained features are not a subset of training data feature names".
