2 Answers
We hit the same problem. The failure reported at the model upload stage turned out to be a red herring: a subprocess was being killed during the training stage because it ran out of memory, not during the upload. Note that our main training process still exited cleanly, which masked the real cause.
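One way to surface this kind of silent failure is to check the subprocess's return code: on Linux, a process killed by the kernel OOM killer receives SIGKILL, which `subprocess` reports as a negative return code (-9). This is a minimal sketch, not SageMaker-specific code; `run_step` and the self-killing demo command are hypothetical.

```python
import signal
import subprocess
import sys

def run_step(cmd):
    """Run a training subprocess and flag a likely OOM kill.

    On Linux, a SIGKILLed child (e.g. killed by the OOM killer)
    shows up as returncode == -9 in subprocess.
    """
    result = subprocess.run(cmd)
    if result.returncode == -signal.SIGKILL:
        print("subprocess was SIGKILLed - possible OOM kill; "
              "check instance memory and CloudWatch metrics",
              file=sys.stderr)
    return result.returncode

# Hypothetical demo: a child that SIGKILLs itself to simulate an OOM kill.
rc = run_step([sys.executable, "-c",
               "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"])
```

If the main script swallows the child's exit status, the training job can appear to finish cleanly and only fail later when SageMaker finds no (or a truncated) model to upload.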
answered 4 months ago
Per the documentation (https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-output.html), the algorithm must write the final model to /opt/ml/model; SageMaker then uploads the contents of that directory to S3 as a single object in compressed tar format.
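A minimal sketch of what a training script should do at the end of training, assuming the standard SageMaker convention that the target directory is exposed via the SM_MODEL_DIR environment variable (falling back to /opt/ml/model). The artifact name and contents here are placeholders; for a local demo the sketch writes into a temporary directory instead:

```python
import json
import os
import tempfile

def save_artifacts(model_dir):
    """Write final model artifacts into model_dir.

    In a SageMaker training container this should be /opt/ml/model;
    everything placed there is tarred and uploaded to S3 when the
    training job ends.
    """
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, "model.json")
    with open(path, "w") as f:
        json.dump({"weights": [0.1, 0.2]}, f)  # placeholder artifact
    return path

# Inside SageMaker you would resolve the real location:
model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
# For a runnable local demo, write to a temporary directory instead.
saved = save_artifacts(tempfile.mkdtemp())
print(saved)
```

If nothing is written to that directory, the upload step has nothing to package, which can look like an upload-stage failure.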
answered a year ago
This turned out to be the actual issue for me as well. Increasing the instance memory and trying different SageMaker instance types solved the problem for me. Thanks for commenting!