SageMaker Studio JupyterServer App does not load

After months of seamless work in SageMaker Studio, the JupyterServer App has failed to load for the last 4 days. The Control Panel shows the JupyterServer in "Pending" or "Failed" state after I try to launch the app. When I click "Launch app", the screen shows the following:

  • "The JupyterServer app default encountered a problem and was stopped."
  • The "Restart Now" button is visible, but pressing it results in the same behaviour.

I created a new JupyterServer App, and it has the same problem under that account. I use a different account for another project, and the JupyterServer under that account works perfectly. I even mounted the EFS associated with the App on an EC2 instance and deleted some files to reduce the EFS volume usage, but it did not help (it was 995 MB and, as far as I know, 5 GB is the default limit).

I found a post from 2 years ago describing the same problem, but could not follow its advice to delete the app and create a new one, since the Delete option is not available in the Action dropdown (https://repost.aws/questions/QUxoSA7eTzQbK-T4OWjJvSmQ/sage-maker-studio-will-not-load). All apps that I create are named "default".

Please help: how can I overcome this and access JupyterLab again? Thank you.

asked 2 years ago, 765 views
2 Answers

Even if you're not seeing the "Delete app" option in the console (because the app Failed), the good news is that the "Restart" button should be doing the same thing for you. So how can you find out more about what's breaking here and hopefully fix it?

To see logs, you can open the /aws/sagemaker/studio log group in Amazon CloudWatch. There you should find log streams like {DomainID}/{UserProfileName}/JupyterServer/default and, if you have a lifecycle configuration script set up, {DomainID}/{UserProfileName}/JupyterServer/default/LifecycleConfigOnStart.
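If you'd rather pull those logs from a script than click through the console, here is a minimal Python sketch. It assumes you pass in a boto3 CloudWatch Logs client (`boto3.client("logs")`) with credentials and region already configured; the function names `studio_log_streams` and `fetch_recent_events` are my own illustration, not part of any AWS SDK.

```python
from typing import Any, Dict, List

LOG_GROUP = "/aws/sagemaker/studio"  # log group named in the answer above


def studio_log_streams(domain_id: str, user_profile: str) -> List[str]:
    """Build the two JupyterServer log stream names for one user profile."""
    base = f"{domain_id}/{user_profile}/JupyterServer/default"
    # The second stream only exists if a lifecycle configuration script ran.
    return [base, f"{base}/LifecycleConfigOnStart"]


def fetch_recent_events(logs_client: Any, stream_name: str, limit: int = 50) -> List[str]:
    """Return the newest messages from one log stream.

    logs_client is assumed to be boto3.client("logs").
    """
    response: Dict[str, Any] = logs_client.get_log_events(
        logGroupName=LOG_GROUP,
        logStreamName=stream_name,
        limit=limit,
        startFromHead=False,  # read from the end of the stream
    )
    return [event["message"] for event in response["events"]]


# Example usage (hypothetical domain ID and profile name):
#   import boto3
#   logs = boto3.client("logs")
#   for stream in studio_log_streams("d-xxxxxxxxxxxx", "my-user"):
#       print(stream, fetch_recent_events(logs, stream))
```

Expect an error from `get_log_events` for a stream that does not exist, for example the LifecycleConfigOnStart stream when no script is attached.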

If you do have a custom lifecycle configuration script set up, this can often be a point of failure: I'd suggest trying to detach it and/or adding some extra flags like set -ux to help debug what might be going wrong in it. Also, SageMaker Studio recently launched JupyterLab v3 support in parallel to JLv1. If you've been experimenting with both versions, it's worth checking which version your user is currently configured to use, and remembering that some setup scripts that work on one version could break on the other.
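One way to check what a user is currently configured with is to look at the response of the SageMaker `describe_user_profile` API call. The sketch below only reads fields out of that response dict; the helper name `summarize_jupyter_server_settings` is hypothetical. As far as I know, the image name at the end of the JupyterServer image ARN (for example one ending in `jupyter-server-3`) is what selects JupyterLab 3, but treat that as an assumption and compare against the current documentation.

```python
from typing import Any, Dict


def summarize_jupyter_server_settings(profile: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the JupyterServer image and lifecycle configs from a
    describe_user_profile response (a plain dict)."""
    settings = profile.get("UserSettings", {}).get("JupyterServerAppSettings", {})
    spec = settings.get("DefaultResourceSpec", {})
    return {
        # Image the JupyterServer app starts from (None if not set on the profile)
        "image_arn": spec.get("SageMakerImageArn"),
        # Lifecycle configuration that runs by default at start-up
        "default_lifecycle_config": spec.get("LifecycleConfigArn"),
        # All lifecycle configurations attached to the profile
        "attached_lifecycle_configs": settings.get("LifecycleConfigArns", []),
    }


# Example usage (hypothetical domain ID and profile name):
#   import boto3
#   sm = boto3.client("sagemaker")
#   profile = sm.describe_user_profile(DomainId="d-xxxxxxxxxxxx", UserProfileName="my-user")
#   print(summarize_jupyter_server_settings(profile))
```

If every value comes back empty, the settings may be inherited from the domain's defaults instead, so it is worth checking the domain-level settings too.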

Your user's EFS home folder (and any additional setup the lifecycle configuration (LCC) script does) will be the only data persisted between JupyterServer launches, so its content could be another point of failure. It sounds like you started to explore this already.

  • You could try to delete (or otherwise edit) the user's ~/.jupyter folder from EFS to clear any customized Jupyter configuration settings that might be causing problems during start-up. Again, this may be useful if you're using any features or extensions for which the Jupyter configuration API changed between JLv1 and v3.
  • I haven't found overall data volume to cause this kind of start-up problem myself so far. I have seen some cases where having a git repository with many active changes (e.g. thousands) causes a UI slowdown when working in the repository's folder, but I don't think I've seen it prevent start-up itself.
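For the ~/.jupyter suggestion above, renaming the folder is a safer first step than deleting it, because you can put it back if it turns out not to be the cause. A minimal sketch, assuming the EFS volume is already mounted (for example on an EC2 instance) and you know which directory on it is the user's home folder; the function name and the backup naming scheme are my own choices.

```python
import time
from pathlib import Path
from typing import Optional


def set_aside_jupyter_config(home_dir: Path) -> Optional[Path]:
    """Rename <home_dir>/.jupyter to a timestamped backup.

    Returns the backup path, or None if there was no .jupyter folder.
    """
    config_dir = home_dir / ".jupyter"
    if not config_dir.is_dir():
        return None  # nothing to move
    backup = home_dir / f".jupyter.bak-{int(time.time())}"
    config_dir.rename(backup)  # stays on the same filesystem, so this is a cheap rename
    return backup


# Example usage (hypothetical mount point and home directory):
#   set_aside_jupyter_config(Path("/mnt/efs/200005"))
```

After the rename, the next JupyterServer start should fall back to default Jupyter settings; rename the backup to .jupyter again to restore the old configuration.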
AWS
EXPERT
Alex_T
answered 2 years ago
  • Hi Alex, I am facing a similar issue: the CloudWatch logs show an error from the lifecycle config script, and the default JupyterServer failed. However, after disabling the lifecycle config, I still cannot launch Studio and bring up a new JupyterServer.

@Alex_T thank you for the reply. The CloudWatch logs don't show any sign of failing notebooks, and I did not have any LCC either. The final solution was the following:

  1. I mounted the EFS associated with the SageMaker domain on an EC2 instance and made a backup of all notebooks and other files (saved them on my local computer too).
  2. Then, I deleted the SageMaker domain by following these steps: https://docs.aws.amazon.com/sagemaker/latest/dg/gs-studio-delete-domain.html (via CLI)
  3. Created a new SageMaker domain. The JupyterServer App starts now.
  4. Finally, I mounted the new EFS to the EC2 instance and uploaded the notebooks and files. They are all visible and working in Jupyter.
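For anyone who wants to script step 2 rather than type the CLI commands by hand, here is a rough Python sketch of the order of operations as I understand it: apps first, then user profiles, then the domain. It assumes a boto3 SageMaker client (`boto3.client("sagemaker")`); the function `plan_domain_teardown` is my own helper and deliberately deletes nothing. It only lists what exists and returns the calls you would make, so you can review them first. Pagination is ignored for brevity, and apps already in "Deleted" status are skipped.

```python
from typing import Any, Dict, List, Tuple

Step = Tuple[str, Dict[str, Any]]  # (client method name, keyword arguments)


def plan_domain_teardown(sm_client: Any, domain_id: str, keep_efs: bool = True) -> List[Step]:
    """Return the SageMaker API calls needed to remove a domain, in order.

    sm_client is assumed to be boto3.client("sagemaker"). This function only
    reads from the API; it does not delete anything itself.
    """
    steps: List[Step] = []

    # 1. Apps must go first (this also covers apps with no console Delete option).
    for app in sm_client.list_apps(DomainIdEquals=domain_id)["Apps"]:
        if app["Status"] == "Deleted":
            continue
        # An app belongs to a user profile or to a space; pass on whichever is set.
        owner = {k: app[k] for k in ("UserProfileName", "SpaceName") if k in app}
        steps.append(("delete_app", {
            "DomainId": domain_id,
            "AppType": app["AppType"],
            "AppName": app["AppName"],
            **owner,
        }))

    # 2. Then every user profile in the domain.
    for profile in sm_client.list_user_profiles(DomainIdEquals=domain_id)["UserProfiles"]:
        steps.append(("delete_user_profile", {
            "DomainId": domain_id,
            "UserProfileName": profile["UserProfileName"],
        }))

    # 3. Finally the domain itself; "Retain" keeps the EFS volume around.
    retention = "Retain" if keep_efs else "Delete"
    steps.append(("delete_domain", {
        "DomainId": domain_id,
        "RetentionPolicy": {"HomeEfsFileSystem": retention},
    }))
    return steps
```

Each step can then be run with `getattr(sm_client, name)(**kwargs)`, but the deletions are asynchronous, so wait for each stage to finish before starting the next. Take the EFS backup from step 1 before running anything, whichever retention policy you pick.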
answered 2 years ago
