Even if you're not seeing the "Delete app" option in the console (because the app has Failed), the good news is that the "Restart" button should do the same thing for you. So how can you find out more about what's breaking here and hopefully fix it?
To see logs, you can open the /aws/sagemaker/studio log group in Amazon CloudWatch. There you should find log streams like {DomainID}/{UserProfileName}/JupyterServer/default and (if you have a lifecycle configuration script set up) {DomainID}/{UserProfileName}/JupyterServer/default/LifecycleConfigOnStart.
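If you'd rather pull those streams from a script than click through the console, something like the following might work. It's only a sketch: the function names, the `since_minutes` window, and the `d-xxxxxxxxxxxx` / `my-user` placeholders are mine, and it assumes you pass in a boto3 CloudWatch Logs client with permission to read the log group.

```python
import time
from typing import Any, List

LOG_GROUP = "/aws/sagemaker/studio"


def jupyter_server_stream_prefix(domain_id: str, user_profile_name: str) -> str:
    """Prefix shared by the JupyterServer stream and its LifecycleConfigOnStart stream."""
    return f"{domain_id}/{user_profile_name}/JupyterServer/default"


def fetch_messages(logs_client: Any, domain_id: str, user_profile_name: str,
                   since_minutes: int = 60, limit: int = 100) -> List[str]:
    """Return up to `limit` log lines from the last `since_minutes` minutes.

    `logs_client` is assumed to be boto3.client("logs"); only the first page
    of results is read, which is usually enough for a failed start-up.
    """
    start_ms = int((time.time() - since_minutes * 60) * 1000)
    response = logs_client.filter_log_events(
        logGroupName=LOG_GROUP,
        logStreamNamePrefix=jupyter_server_stream_prefix(domain_id, user_profile_name),
        startTime=start_ms,
        limit=limit,
    )
    return [
        f"{event['logStreamName']}: {event['message'].rstrip()}"
        for event in response.get("events", [])
    ]


# Example usage (placeholders, replace with your own values):
#   import boto3
#   for line in fetch_messages(boto3.client("logs"), "d-xxxxxxxxxxxx", "my-user"):
#       print(line)
```

Because the prefix covers both streams, any lifecycle configuration output shows up alongside the JupyterServer messages.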
If you do have a custom lifecycle configuration script set up, this can often be a point of failure: I'd suggest trying to detach it and/or adding some extra flags like set -ux (-u treats unset variables as errors, -x prints each command as it runs) to help debug what might be going wrong in it. Also, SageMaker Studio recently launched JupyterLab v3 support in parallel to JLv1. If you've been experimenting with both versions, it's worth checking which version your user is currently configured to use, and remembering that some setup scripts which work on one version could break on the other.
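One way to check what the user is configured with is to look at the output of the SageMaker `DescribeUserProfile` API. The helper below is a rough sketch (the function name and returned keys are my own) that pulls the JupyterServer image ARN and any attached lifecycle configuration ARN out of that response. As far as I know the image ARN's name indicates the JupyterLab version, but treat that as an assumption and check the docs; also note that if nothing is set at the user level, the domain's default user settings may apply instead.

```python
from typing import Any, Dict, Optional


def jupyter_server_settings(profile: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Extract JupyterServer defaults from a DescribeUserProfile response dict.

    Missing keys are returned as None rather than raising, because a profile
    with no user-level overrides simply omits these fields.
    """
    spec = (
        profile.get("UserSettings", {})
        .get("JupyterServerAppSettings", {})
        .get("DefaultResourceSpec", {})
    )
    return {
        "image_arn": spec.get("SageMakerImageArn"),
        "lifecycle_config_arn": spec.get("LifecycleConfigArn"),
    }


# Example usage (placeholders, replace with your own values):
#   import boto3
#   sm = boto3.client("sagemaker")
#   profile = sm.describe_user_profile(DomainId="d-xxxxxxxxxxxx", UserProfileName="my-user")
#   print(jupyter_server_settings(profile))
```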
Your user's EFS home folder (and any additional setup the lifecycle configuration (LCC) script does) will be the only data persisted between JupyterServer launches, so its content could be another point of failure. It sounds like you started to explore this already.
- You could try to delete (or otherwise edit) the user's ~/.jupyter folder from EFS to clear any customized Jupyter configuration settings that might be causing problems during start-up. Again, this may be useful if you're using any features or extensions for which the Jupyter configuration API changed between JLv1 and v3.
- I haven't found overall data volume to cause this kind of start-up problem myself so far. I have seen some cases where a git repository with many active changes (e.g. thousands) causes a UI slowdown when working in the repository's folder, but as far as I recall it didn't prevent the actual start-up.
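For the ~/.jupyter suggestion above, renaming the folder is a bit safer than deleting it, since you can restore it if it turns out not to be the culprit. Here's a minimal sketch; the function name and backup naming scheme are just illustrative, and it assumes you can reach the user's home folder somehow (for example with the EFS mounted on another machine, passing that user's folder as `home`).

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def set_aside_jupyter_config(home: Path) -> Optional[Path]:
    """Rename <home>/.jupyter to a timestamped backup folder.

    Returns the backup path, or None if there was no .jupyter folder to move.
    """
    config_dir = home / ".jupyter"
    if not config_dir.is_dir():
        return None
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = home / f".jupyter.bak-{stamp}"
    config_dir.rename(backup)  # same filesystem, so this is a cheap rename
    return backup


# Example usage from inside the user's own environment:
#   set_aside_jupyter_config(Path.home())
```

With the folder out of the way, Jupyter should fall back to default configuration on the next start.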
@Alex_T thank you for the reply. The CloudWatch logs don't show any sign of failing notebooks, and I did not have any LCC either. The final solution was the following:
- I mounted the EFS associated with the SageMaker domain on an EC2 instance and made a backup of all notebooks and other files (saved them on my local computer too).
- Then, I deleted the SageMaker domain by following these steps: https://docs.aws.amazon.com/sagemaker/latest/dg/gs-studio-delete-domain.html (via CLI)
- Created a new SageMaker domain. The JupyterServer App starts now.
- Finally, I mounted the new EFS to the EC2 instance and uploaded the notebooks and files. They are all visible and working in Jupyter.
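For anyone repeating the backup step, it helps to know which file system to mount and which folder belongs to which user. The sketch below is an illustration rather than the exact procedure used above: it reads the domain's EFS ID and each user profile's EFS UID through the SageMaker API. The function name and returned keys are made up, and it assumes (as far as I know, but please verify on your own mount) that each user's home folder on the EFS is a top-level directory named after that UID.

```python
from typing import Any, Dict


def efs_backup_locations(sm_client: Any, domain_id: str) -> Dict[str, Any]:
    """Look up the domain's home EFS ID and each user profile's EFS UID.

    `sm_client` is assumed to be boto3.client("sagemaker"). Only the first
    page of user profiles is read; paginate if the domain has many users.
    """
    domain = sm_client.describe_domain(DomainId=domain_id)
    profiles = sm_client.list_user_profiles(DomainIdEquals=domain_id)

    user_uids = {}
    for item in profiles.get("UserProfiles", []):
        name = item["UserProfileName"]
        detail = sm_client.describe_user_profile(
            DomainId=domain_id, UserProfileName=name
        )
        user_uids[name] = detail.get("HomeEfsFileSystemUid")

    return {
        "file_system_id": domain.get("HomeEfsFileSystemId"),
        "user_uids": user_uids,
    }


# Example usage (placeholder domain ID, replace with your own):
#   import boto3
#   print(efs_backup_locations(boto3.client("sagemaker"), "d-xxxxxxxxxxxx"))
```

Running this before deleting the old domain, and again on the new one, gives you the source and destination folders for copying files across.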
Hi Alex, I am facing a similar issue: the CloudWatch logs show an error from the lifecycle config script, and the default JupyterServer failed. However, even after disabling the lifecycle config, I still cannot launch Studio and bring up a new JupyterServer.