Unexpected error creating notebook job in SageMaker

I have a SageMaker notebook that I want to run as a notebook job (it takes a long time). Every time I try to Create Notebook Job...Create, I get "Unexpected error occurred during creation of job." Things I've tried:

  1. Making the IAM revisions suggested at https://docs.aws.amazon.com/sagemaker/latest/dg/scheduled-notebook-policies.html, including the "Add additional IAM permissions" inline policy, even though I'm using the Studio execution role.
  2. Removing all special characters from the job name.
  3. Trying different compute types, including one I know works for interactively running my notebook.
  4. Moving the notebook to my root user directory.
  5. Removing all special characters from the notebook filename.
  6. Adding a parameter, even though I don't need any parameters.
  7. Running on a schedule vs. running now.
  8. Confirming that the referenced S3 buckets in the advanced settings are there (they were auto-created).
  9. Trying a different notebook.
  10. Refreshing, closing and re-opening.
  11. Looking in the CloudWatch logs. I found "[E 2022-12-16 23:30:13.770 SchedulerApp] [Errno 2] No such file or directory: '/home/sagemaker-user/abc/abc/abc/abc/src/notebook.ipynb'" where the actual path includes only two "abc"s, not four. If I move the notebook file to the root directory, then the error has two "abc"s, still, and fails again because it's missing the "src".
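For anyone comparing their own log line against the real location, here is a minimal Python sketch of the check I did by eye in item 11. It only compares two path strings and does not call any SageMaker API. The function name `extra_prefix` is made up for illustration, and the "actual" path is the two-"abc" location described above.

```python
from pathlib import PurePosixPath


def extra_prefix(logged_path, actual_path, home):
    """Return the directory segment that the logged path inserts between
    `home` and the real relative path, "" if nothing is inserted, or None
    if the logged path does not end with the real relative path at all."""
    logged_parts = PurePosixPath(logged_path).relative_to(home).parts
    actual_parts = PurePosixPath(actual_path).relative_to(home).parts
    n = len(actual_parts)
    if n == 0 or len(logged_parts) < n or logged_parts[-n:] != actual_parts:
        return None
    if len(logged_parts) == n:
        return ""
    # Everything before the matching tail is the unexpected prefix.
    return str(PurePosixPath(*logged_parts[:-n]))


home = "/home/sagemaker-user"
logged = "/home/sagemaker-user/abc/abc/abc/abc/src/notebook.ipynb"  # from the log
actual = "/home/sagemaker-user/abc/abc/src/notebook.ipynb"          # where the file is
print(extra_prefix(logged, actual, home))  # abc/abc
```

If the result is a non-empty string, that segment is being added on top of the path shown in the UI.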

In the end, I couldn't get it to stop prepending "abc/abc/" to the path, so I had to:

  1. Create a "/src/notebook.ipynb" file off the root.
  2. Select to create a job to run that.
  3. Observe that it actually runs /abc/abc/src/notebook.ipynb, even though the UI claims it will be running /src/notebook.ipynb.
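The workaround above amounts to simple path arithmetic, sketched below in Python. This assumes the prefix really is always inserted right after the home directory, which is only what I observed, not documented behavior. The function name `placeholder_for` is hypothetical.

```python
from pathlib import PurePosixPath


def placeholder_for(target, home, sticky_prefix):
    """Return the path to create and select in the UI so that
    home/sticky_prefix/<selected path> equals `target`.
    Raises ValueError if `target` is not under home/sticky_prefix."""
    prefixed_root = PurePosixPath(home) / sticky_prefix
    relative = PurePosixPath(target).relative_to(prefixed_root)
    return PurePosixPath(home) / relative


# The notebook I actually want to run, and the prefix seen in the logs.
target = "/home/sagemaker-user/abc/abc/src/notebook.ipynb"
print(placeholder_for(target, "/home/sagemaker-user", "abc/abc"))
# /home/sagemaker-user/src/notebook.ipynb
```

In my case this gives the "/src/notebook.ipynb" file off the root from step 1.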

This all seems completely insane, and I'm hoping there's some way to resolve this?

Thanks,

Chris

  • I've tried to imagine how the abc/abc prefix could have come to be. My current directory, when I first tried to launch a job, was abc/abc/src, but the src part isn't included in the sticky prefix. abc/abc is the Git root, however: when I initially cloned from GitHub, I manually created an abc directory and then cloned the abc repo into it, which created another abc directory. So perhaps it's for some reason using the Git root as the user root, and then adding the path from there?

  • I received a similar error; however, in my case the CloudWatch logs show: "AttributeError: 'Scheduler' object has no attribute 'create_job'". Was a solution found for this error?

asked a year ago, 121 views
No Answers
