SageMaker TensorFlow Object Detection: Null annotations raise an exception
Hi there, nice to meet you all,
I've been trying to train an object detection model (using the built-in TensorFlow algorithms) with the JumpStart examples as a template. As soon as I include an image with a null annotation, SageMaker fails to train when I call fit() and throws the following error (the entire stack trace is at the end of the post):
ValueError: Invalid dimensions for box data: (0,)
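To illustrate the shape reported in the error, here is a minimal sketch. It assumes the failing check requires a 2-D array with 4 columns, which is inferred from the error message and the traceback rather than copied from the training container. `check_box_shape` is a hypothetical stand-in written with NumPy, not the real `BoxList` code.

```python
import numpy as np


def check_box_shape(boxes):
    """Hypothetical stand-in for the check that fails in the traceback.

    Assumption: box data must have rank 2 and exactly 4 columns.
    """
    if boxes.ndim != 2 or boxes.shape[-1] != 4:
        raise ValueError("Invalid dimensions for box data: {}".format(boxes.shape))
    return boxes.shape


# An image without annotations can end up as a flat, empty array ...
no_boxes_flat = np.array([], dtype=np.float32)    # shape (0,)
# ... whereas "zero boxes with four coordinates each" would be a 2-D array.
no_boxes_2d = np.zeros((0, 4), dtype=np.float32)  # shape (0, 4)

for name, boxes in [("flat", no_boxes_flat), ("2-D", no_boxes_2d)]:
    try:
        print(name, "accepted with shape", check_box_shape(boxes))
    except ValueError as err:
        print(name, "rejected:", err)
```

If this assumption holds, the error suggests that the empty box list for the unannotated image reaches the loss computation as a 1-D array instead of a (0, 4) array.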
As I understand it, according to the example mentioned above, annotations.json should have a COCO-like structure for SageMaker to consider it valid. Quoting the same tutorial:
The annotations.json file should have information for bounding_boxes and their class labels. It should have a dictionary with keys "images" and "annotations". Value for the "images" key should be a list of entries, one for each image of the form {"file_name": image_name, "height": height, "width": width, "id": image_id}. Value of the 'annotations' key should be a list of entries, one for each bounding box of the form {"image_id": image_id, "bbox": [xmin, ymin, xmax, ymax], "category_id": bbox_label}.
That is, as far as I know, the COCO format (I think this should be mentioned explicitly!).
Great! So what's the issue here? If I provide a dataset with an equal number of images and annotations, training succeeds. But if I have an image with no object in it, I will have more images than annotations, for instance:
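The layout quoted from the tutorial can be checked before starting a training job. The following is a minimal sketch based only on the keys named in that quote; `validate_annotations` and the two key sets are hypothetical helpers, not part of the SageMaker SDK.

```python
# Keys taken from the tutorial text quoted above.
REQUIRED_IMAGE_KEYS = {"file_name", "height", "width", "id"}
REQUIRED_ANNOTATION_KEYS = {"image_id", "bbox", "category_id"}


def validate_annotations(data):
    """Return a list of problems; an empty list means the layout matches."""
    problems = []
    for key in ("images", "annotations"):
        if not isinstance(data.get(key), list):
            problems.append('missing or non-list key "{}"'.format(key))
    if problems:
        return problems

    image_ids = set()
    for i, image in enumerate(data["images"]):
        missing = REQUIRED_IMAGE_KEYS - set(image)
        if missing:
            problems.append("images[{}] is missing {}".format(i, sorted(missing)))
        else:
            image_ids.add(image["id"])

    for i, ann in enumerate(data["annotations"]):
        missing = REQUIRED_ANNOTATION_KEYS - set(ann)
        if missing:
            problems.append("annotations[{}] is missing {}".format(i, sorted(missing)))
            continue
        if len(ann["bbox"]) != 4:
            problems.append("annotations[{}] bbox does not have 4 values".format(i))
        if ann["image_id"] not in image_ids:
            problems.append("annotations[{}] refers to an unknown image_id".format(i))
    return problems
```

Note that this only checks the structure. An image that has no entry under "annotations" passes, because the quoted text does not say whether such images are allowed.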
0001.jpeg has 1 object and the corresponding annotation
0002.jpeg has no object, so it doesn't have an annotation
So the annotations.json file looks like:
{
  "images": [
    { "file_name": "0001.jpeg", "height": 1944, "width": 2592, "id": "0001" },
    { "file_name": "0002.jpeg", "height": 1944, "width": 2592, "id": "0002" }
  ],
  "annotations": [
    { "image_id": "0001", "bbox": [ 688, 371, 1859, 1581 ], "category_id": 0 }
  ]
}
As far as I could find, this is the standard procedure when an image contains no object. I also downloaded the full COCO 2017 annotations to make sure of this: you can download "2017 Train/Val annotations [241MB]", open instances_val2017.json, and look for the image IDs 25593, 41488, 42888 ... You'll see that these IDs appear under images but not under annotations.
So I would like to kindly ask for your help so that I can train my model properly!
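The check described above can be sketched in a few lines. `images_without_annotations` is a hypothetical helper name; it relies only on the "images" and "annotations" keys shown in the example file.

```python
def images_without_annotations(data):
    """Return the file names of images that no annotation entry refers to."""
    annotated_ids = {ann["image_id"] for ann in data["annotations"]}
    return [img["file_name"] for img in data["images"] if img["id"] not in annotated_ids]


# The two-image example from this post.
example = {
    "images": [
        {"file_name": "0001.jpeg", "height": 1944, "width": 2592, "id": "0001"},
        {"file_name": "0002.jpeg", "height": 1944, "width": 2592, "id": "0002"},
    ],
    "annotations": [
        {"image_id": "0001", "bbox": [688, 371, 1859, 1581], "category_id": 0},
    ],
}

print(images_without_annotations(example))  # prints ['0002.jpeg']
```

The same function can be applied to instances_val2017.json after loading it with `json.load`, since that file uses the same "images", "annotations", "id", and "image_id" keys.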
Thanks in advance!
P.S.:
Traceback
[Epoch 0], Speed: 0.058 samples/sec, loss=431737.90625.
Traceback (most recent call last):
File "/opt/ml/code/transfer_learning.py", line 246, in <module>
run_with_args(args)
File "/opt/ml/code/transfer_learning.py", line 201, in run_with_args
train_and_save_model(
File "/opt/ml/code/train.py", line 130, in train_and_save_model
validation_losses = run_validation(detection_model, validation_data, batch_size, image_size, epoch)
File "/opt/ml/code/validation.py", line 25, in run_validation
losses_dict = model.loss(prediction_dict, shapes)
File "/opt/ml/code/object_detection/meta_architectures/ssd_meta_arch.py", line 824, in loss
) = self._assign_targets(
File "/opt/ml/code/object_detection/meta_architectures/ssd_meta_arch.py", line 1013, in _assign_targets
groundtruth_boxlists = [box_list.BoxList(boxes) for boxes in groundtruth_boxes_list]
File "/opt/ml/code/object_detection/meta_architectures/ssd_meta_arch.py", line 1013, in <listcomp>
groundtruth_boxlists = [box_list.BoxList(boxes) for boxes in groundtruth_boxes_list]
File "/opt/ml/code/object_detection/core/box_list.py", line 55, in __init__
raise ValueError("Invalid dimensions for box data: {}".format(boxes.shape))
ValueError: Invalid dimensions for box data: (0,)
2023-03-09 21:59:57,046 sagemaker-training-toolkit INFO Waiting for the process to finish and give a return code.
2023-03-09 21:59:57,047 sagemaker-training-toolkit INFO Done waiting for a return code. Received 1 from exiting process.
2023-03-09 21:59:57,048 sagemaker-training-toolkit ERROR Reporting training FAILURE
2023-03-09 21:59:57,048 sagemaker-training-toolkit ERROR ExecuteUserScriptError:
ExitCode 1
ErrorMessage "raise ValueError("Invalid dimensions for box data: {}".format(boxes.shape))
ValueError: Invalid dimensions for box data: (0,)"
Command "/usr/local/bin/python3.9 transfer_learning.py --batch_size 5 --beta_1 0.9 --beta_2 0.999 --early_stopping False --early_stopping_min_delta 0.0 --early_stopping_patience 5 --epochs 10 --epsilon 1e-07 --initial_accumulator_value 0.1 --learning_rate 0.001 --momentum 0.9 --optimizer adam --reinitialize_top_layer Auto --rho 0.95 --train_only_top_layer False"
2023-03-09 21:59:57,048 sagemaker-training-toolkit ERROR Encountered exit_code 1
2023-03-09 22:00:14 Uploading - Uploading generated training model
2023-03-09 22:00:20 Failed - Training job failed
-----------------------------------------------------------------------------------------------------------------------------------------------------------
UnexpectedStatusException Traceback (most recent call last)
Cell In[19], line 3
1 #with load_run(experiment_name=demo_experiment.experiment_name, run_name=demo_trial.trial_name) as run:
2 #run.log_parameter("param1", "value1")
----> 3 od_estimator.fit(
4 {"training": train_path},
5 # {"training": train_path, "validation": validation_path}},
6 logs=True,
7 job_name=training_job_name,
8 experiment_config = {
9 # "ExperimentName"
10 "TrialName" : demo_trial.trial_name,
11 "TrialComponentDisplayName" : "TrainingJob",
12 })
File /opt/conda/lib/python3.8/site-packages/sagemaker/workflow/pipeline_context.py:272, in runnable_by_pipeline.<locals>.wrapper(*args, **kwargs)
268 return context
270 return _StepArguments(retrieve_caller_name(self_instance), run_func, *args, **kwargs)
--> 272 return run_func(*args, **kwargs)
File /opt/conda/lib/python3.8/site-packages/sagemaker/estimator.py:1163, in EstimatorBase.fit(self, inputs, wait, logs, job_name, experiment_config)
1161 self.jobs.append(self.latest_training_job)
1162 if wait:
-> 1163 self.latest_training_job.wait(logs=logs)
File /opt/conda/lib/python3.8/site-packages/sagemaker/estimator.py:2311, in _TrainingJob.wait(self, logs)
2309 # If logs are requested, call logs_for_jobs.
2310 if logs != "None":
-> 2311 self.sagemaker_session.logs_for_job(self.job_name, wait=True, log_type=logs)
2312 else:
2313 self.sagemaker_session.wait_for_job(self.job_name)
File /opt/conda/lib/python3.8/site-packages/sagemaker/session.py:4176, in Session.logs_for_job(self, job_name, wait, poll, log_type)
4173 last_profiler_rule_statuses = profiler_rule_statuses
4175 if wait:
-> 4176 self._check_job_status(job_name, description, "TrainingJobStatus")
4177 if dot:
4178 print()
File /opt/conda/lib/python3.8/site-packages/sagemaker/session.py:3707, in Session._check_job_status(self, job, desc, status_key_name)
3701 if "CapacityError" in str(reason):
3702 raise exceptions.CapacityError(
3703 message=message,
3704 allowed_statuses=["Completed", "Stopped"],
3705 actual_status=status,
3706 )
-> 3707 raise exceptions.UnexpectedStatusException(
3708 message=message,
3709 allowed_statuses=["Completed", "Stopped"],
3710 actual_status=status,
3711 )
UnexpectedStatusException: Error for Training job TrainNullAnnotations-tensorflow-od1-ssd-2023-03-09-21-48-59-470: Failed. Reason: AlgorithmError: ExecuteUserScriptError:
ExitCode 1
ErrorMessage "raise ValueError("Invalid dimensions for box data: {}".format(boxes.shape))
ValueError: Invalid dimensions for box data: (0,)"
Command "/usr/local/bin/python3.9 transfer_learning.py --batch_size 5 --beta_1 0.9 --beta_2 0.999 --early_stopping False --early_stopping_min_delta 0.0 --early_stopping_patience 5 --epochs 10 --epsilon 1e-07 --initial_accumulator_value 0.1 --learning_rate 0.001 --momentum 0.9 --optimizer adam --reinitialize_top_layer Auto --rho 0.95 --train_only_top_layer False", exit code: 1