I am trying to deploy my RandomForest classifier on Amazon SageMaker, but I get an UnexpectedStatusException even though the script works fine when I run it locally.
Run locally, the script prints the confusion matrix and accuracy as expected:
! python script.py --n-estimators 100 \
                   --max_depth 2 \
                   --model-dir ./ \
                   --train ./ \
                   --test ./
Confusion Matrix:
[[13 8]
[ 1 17]]
Accuracy: 0.7692307692307693
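script.py itself is not shown here. As a point of reference only: in script mode, SageMaker runs the entry point inside the training container and exposes the model directory and the input channels through environment variables such as SM_MODEL_DIR, SM_CHANNEL_TRAIN and SM_CHANNEL_TEST, so entry points commonly use them as argument defaults. A minimal sketch, assuming the flag names from the command above; the function name parse_args and the default values are hypothetical and may not match the real script:

```python
import argparse
import os


def parse_args(argv=None):
    """Parse CLI arguments, falling back to SageMaker's environment variables."""
    parser = argparse.ArgumentParser()

    # Hyperparameters: flag spellings copied from the local command;
    # the defaults (100 and 2) are assumptions for illustration.
    parser.add_argument("--n-estimators", type=int, default=100)
    parser.add_argument("--max_depth", type=int, default=2)

    # Paths: inside the training container SageMaker sets SM_MODEL_DIR and
    # SM_CHANNEL_<NAME>; "./" is a hypothetical fallback for local runs.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "./"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "./"))
    parser.add_argument("--test", default=os.environ.get("SM_CHANNEL_TEST", "./"))

    return parser.parse_args(argv)
```

With defaults like these, the same script can be started with explicit paths on a laptop and with no path arguments inside the container, which matches the `python script.py` command that the training job reports.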
I used the SKLearn estimator from the SageMaker Python SDK:
from sagemaker import get_execution_role
from sagemaker.sklearn.estimator import SKLearn

sklearn_estimator = SKLearn(
    entry_point='script.py',
    role=get_execution_role(),
    instance_count=1,
    instance_type='ml.m4.xlarge',
    framework_version='0.20.0',
    base_job_name='rf-scikit')
I launched the training job as follows:
sklearn_estimator.fit({'train': trainpath, 'test': testpath}, wait=False)
Here I wait for the training job to finish and look up the model artifact so that I can deploy it. This step raises the UnexpectedStatusException that I cannot seem to fix:
sklearn_estimator.latest_training_job.wait(logs='None')
artifact = m_boto3.describe_training_job(
    TrainingJobName=sklearn_estimator.latest_training_job.name)['ModelArtifacts']['S3ModelArtifacts']
print('Model artifact persisted at ' + artifact)
2022-08-25 12:03:27 Starting - Starting the training job....
2022-08-25 12:03:52 Starting - Preparing the instances for training............
2022-08-25 12:04:55 Downloading - Downloading input data......
2022-08-25 12:05:31 Training - Downloading the training image.........
2022-08-25 12:06:22 Training - Training image download completed. Training in progress..
2022-08-25 12:06:32 Uploading - Uploading generated training model.
2022-08-25 12:06:43 Failed - Training job failed
UnexpectedStatusException Traceback (most recent call last)
<ipython-input-37-628f942a78d3> in <module>
----> 1 sklearn_estimator.latest_training_job.wait(logs='None')
2 artifact = m_boto3.describe_training_job(
3 TrainingJobName=sklearn_estimator.latest_training_job.name)['ModelArtifacts']['S3ModelArtifacts']
4
5 print('Model artifact persisted at ' + artifact)
~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in wait(self, logs)
2109 self.sagemaker_session.logs_for_job(self.job_name, wait=True, log_type=logs)
2110 else:
-> 2111 self.sagemaker_session.wait_for_job(self.job_name)
2112
2113 def describe(self):
~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in wait_for_job(self, job, poll)
3226 lambda last_desc: _train_done(self.sagemaker_client, job, last_desc), None, poll
3227 )
-> 3228 self._check_job_status(job, desc, "TrainingJobStatus")
3229 return desc
3230
~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in _check_job_status(self, job, desc, status_key_name)
3390 message=message,
3391 allowed_statuses=["Completed", "Stopped"],
-> 3392 actual_status=status,
3393 )
3394
UnexpectedStatusException: Error for Training job rf-scikit-2022-08-25-12-03-25-931: Failed. Reason: AlgorithmError: framework error:
Traceback (most recent call last):
File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_trainer.py", line 84, in train
entrypoint()
File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/training.py", line 39, in main
train(environment.Environment())
File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/training.py", line 35, in train
runner_type=runner.ProcessRunnerType)
File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/entry_point.py", line 100, in run
wait, capture_error
File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/process.py", line 291, in run
cwd=environment.code_dir,
File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/process.py", line 208, in check_error
info=extra_info,
sagemaker_training.errors.ExecuteUserScriptError: ExecuteUserScriptError:
ExitCode 1
ErrorMessage ""
Command "/miniconda3/bin/python script.py"
ExecuteUserScriptErr
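The ErrorMessage in the traceback above is empty, so the actual Python error raised by script.py has to be read from somewhere else. A minimal sketch of one way to look for it, assuming the job's output went to the CloudWatch log group /aws/sagemaker/TrainingJobs (the usual location for training jobs) and that the two clients are created with boto3.client('sagemaker') and boto3.client('logs'); the helper names failure_reason and training_log_lines are hypothetical:

```python
def failure_reason(sm_client, job_name):
    """Return the FailureReason reported for a training job, or None if absent."""
    desc = sm_client.describe_training_job(TrainingJobName=job_name)
    return desc.get("FailureReason")


def training_log_lines(logs_client, job_name, limit=50):
    """Return up to `limit` of the latest log messages from each stream of a job."""
    group = "/aws/sagemaker/TrainingJobs"  # assumed default log group
    streams = logs_client.describe_log_streams(
        logGroupName=group,
        logStreamNamePrefix=job_name,  # stream names start with the job name
    )["logStreams"]

    lines = []
    for stream in streams:
        events = logs_client.get_log_events(
            logGroupName=group,
            logStreamName=stream["logStreamName"],
            limit=limit,
            startFromHead=False,  # read from the end, where the error usually is
        )["events"]
        lines.extend(event["message"] for event in events)
    return lines


# Example use in a notebook (requires AWS credentials, so left commented out):
# import boto3
# job = sklearn_estimator.latest_training_job.name
# print(failure_reason(boto3.client("sagemaker"), job))
# print("\n".join(training_log_lines(boto3.client("logs"), job)))
```

The clients are passed in as parameters so the helpers can be tried with stand-in objects before pointing them at a real account.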
Any help would be appreciated.