I want to troubleshoot Amazon SageMaker AI notebook instance connectivity issues.
Short description
A SageMaker AI Jupyter notebook might be unresponsive or show errors for the following reasons:
- SageMaker AI can't establish a connection between Jupyter and the browser.
- The notebook session token reached its maximum validity period.
- Resource utilization (CPU, memory, or disk) on the notebook instance is high.
Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
Troubleshoot connection issues between your Jupyter notebook and the browser
When you open a Jupyter notebook, you might receive the following error:
"A connection to the notebook server could not be established. The notebook will continue trying to reconnect. Check your network connection or notebook server configuration."
To troubleshoot this error, take the following actions:
- Restart your notebook instance. When you restart, the notebook instance moves to a new host. A restart can also resolve HTTP 503 and 504 errors in the browser.
Note: It's a best practice to regularly restart notebook instances to keep notebook instance software updated.
- Restart your browser, clear your browser cache, or try a different browser.
- Use a different network connection.
- Check whether a firewall, proxy, or antivirus software is blocking the connection.
- Check the WebSocket connections in your browser's developer tools for errors or closed connections. In most browsers, WebSocket traffic appears on the Network tab.
- Temporarily turn off all browser extensions, and then try again.
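You can also restart the notebook instance (the first action in the list) from the AWS CLI, which helps when the Jupyter UI doesn't load at all. The following is a minimal bash sketch, assuming AWS CLI v2 with credentials that allow the SageMaker describe, stop, and start actions, and an instance that is currently InService. The function names notebook_status and restart_notebook and the instance name my-notebook are hypothetical placeholders. The aws sagemaker subcommands and waiters are standard AWS CLI commands.

```shell
# Minimal sketch (bash). Assumes AWS CLI v2 is configured and that the
# notebook instance is currently InService. Function names are hypothetical.

# Print the instance status, for example InService, Stopping, or Stopped.
notebook_status() {
  aws sagemaker describe-notebook-instance \
    --notebook-instance-name "$1" \
    --query 'NotebookInstanceStatus' --output text
}

# Stop the instance, wait for Stopped, start it again, and wait for InService.
# Each step returns early with the AWS CLI exit code if it fails.
restart_notebook() {
  local name="$1"
  aws sagemaker stop-notebook-instance --notebook-instance-name "$name" || return
  aws sagemaker wait notebook-instance-stopped --notebook-instance-name "$name" || return
  aws sagemaker start-notebook-instance --notebook-instance-name "$name" || return
  aws sagemaker wait notebook-instance-in-service --notebook-instance-name "$name"
}

# Usage (my-notebook is a placeholder name):
#   notebook_status my-notebook
#   restart_notebook my-notebook
```

The waiters poll the instance status until it reaches the target state, so the function returns only after the instance is back in service or a step fails.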
Generate a new notebook session token
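The Jupyter notebook session token is valid for a maximum of 12 hours. After the token expires, the session times out and you must open the notebook again to get a new session token. However, the Jupyter kernel continues to run even if the browser disconnects, so code that's already running isn't stopped. Output that the code sends to the notebook while no browser is connected might be lost.
To reduce the impact of the 12-hour token limit, take one or more of the following actions: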
- Write the results of your program to a file instead of to stdout.
- Convert your program to a Python script, and then run it.
- Call the CreatePresignedNotebookInstanceUrl API to generate a new URL that contains an AuthToken. Then, open the AuthorizedUrl value that the API returns directly in your browser before the current session expires. This creates a new 12-hour session token.
- (Optional) To open JupyterLab instead of the classic Jupyter interface, add "view=Lab&" at the start of the URL's query string. The URL then takes the form https://<name>.notebook.sagemaker.aws?view=Lab&AuthToken=
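The actions above can be sketched as a few bash functions. This is a minimal sketch, not an official tool, and it assumes that AWS CLI v2 is configured, that jupyter nbconvert is available in the notebook terminal, and that the notebook uses a Python kernel. The function names and the my-notebook and my-analysis.ipynb names are hypothetical. Run run_notebook_as_script in the conda environment that your kernel uses. If the notebook contains IPython magics (lines that start with % or !), then nbconvert turns them into get_ipython() calls that fail under plain python. In that case, remove those lines or run the script with ipython instead. According to the API reference, a presigned URL is valid for only 5 minutes, so open it promptly.

```shell
# Minimal sketch (bash). Assumes AWS CLI v2 is configured, Jupyter's nbconvert
# is available in the notebook terminal, and the notebook uses a Python kernel.
# Function names are hypothetical.

# Convert a notebook to a .py script and run it in the background. stdout and
# stderr go to a .log file next to the notebook, so the output survives a
# browser disconnect or an expired session token.
run_notebook_as_script() {
  local nb="$1"
  local script="${nb%.ipynb}.py" log="${nb%.ipynb}.log"
  jupyter nbconvert --to script "$nb" || return
  # -u turns off Python's output buffering so the log updates as the script runs.
  nohup python -u "$script" > "$log" 2>&1 &
  echo "Started PID $!, writing output to $log"
}

# Request a new presigned URL. The session duration can be 1800 to 43200
# seconds; 43200 seconds is the 12-hour maximum.
new_notebook_url() {
  aws sagemaker create-presigned-notebook-instance-url \
    --notebook-instance-name "$1" \
    --session-expiration-duration-in-seconds 43200 \
    --query 'AuthorizedUrl' --output text
}

# Insert "view=Lab&" right after the "?" that starts the query string so that
# the URL opens JupyterLab.
to_lab_url() {
  local url="$1" base query
  case "$url" in
    *\?*) ;;
    *) echo "URL has no query string: $url" >&2; return 1 ;;
  esac
  base=${url%%\?*}   # everything before the first "?"
  query=${url#*\?}   # everything after the first "?"
  printf '%s?view=Lab&%s\n' "$base" "$query"
}

# Usage (placeholder names):
#   run_notebook_as_script my-analysis.ipynb
#   to_lab_url "$(new_notebook_url my-notebook)"
```

String manipulation keeps to_lab_url independent of the exact token parameter name. The function changes only the position immediately after the first "?".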
Check your resource utilization
To check the resource utilization on your SageMaker AI notebook instance, run the following commands in the notebook terminal.
To check memory utilization:
free -h
To check CPU utilization (press q to exit top):
top
To check disk utilization:
df -h
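The top command above is interactive. To capture all three checks at once, for example to compare before and after a restart, you can take a non-interactive snapshot. The following is a minimal bash sketch, assuming a Linux notebook instance that has the procps tools (free, top) and the default notebook storage volume mounted at /home/ec2-user/SageMaker. The function names and the 90% disk warning threshold are illustrative choices, not AWS-defined values.

```shell
# Minimal sketch (bash). Assumes a Linux notebook instance with procps (free,
# top) and that the notebook storage volume is mounted at
# /home/ec2-user/SageMaker. Function names and the 90% threshold are
# illustrative choices, not AWS-defined values.

# Print the used-space percentage (digits only) of the file system holding $1.
disk_use_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Non-interactive snapshot of memory, CPU, and disk usage.
resource_snapshot() {
  local dir="${1:-/home/ec2-user/SageMaker}" pct
  echo "== Memory =="
  free -h
  echo "== CPU (one batch-mode iteration of top) =="
  top -b -n 1 | head -n 15
  echo "== Disk: $dir =="
  df -h "$dir"
  pct=$(disk_use_pct "$dir")
  if [ -n "$pct" ] && [ "$pct" -ge 90 ]; then
    echo "WARNING: the file system that holds $dir is ${pct}% full"
  fi
}

# Usage:
#   resource_snapshot        # SageMaker storage volume
#   resource_snapshot /      # root volume
```

The script uses df -P because it keeps each file system on a single line. This lets awk read the capacity column reliably.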
It's a best practice to use a lifecycle configuration script to publish the instance metrics to Amazon CloudWatch for visibility. For more information, see publish-instance-metrics on the GitHub website.
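The following minimal bash sketch registers an on-start script as a lifecycle configuration and attaches it to an existing instance. It assumes that AWS CLI v2 is configured, the instance is currently InService, and you have a local on-start script, such as a copy of the publish-instance-metrics sample. The function name attach_on_start_script and the instance, configuration, and file names are hypothetical. The script runs on every start, including the start at the end of the function.

```shell
# Minimal sketch (bash). Assumes AWS CLI v2 is configured, the notebook instance
# is currently InService, and the script path points to your on-start script
# (for example, a local copy of the publish-instance-metrics sample).
# Function, instance, and configuration names are hypothetical.
attach_on_start_script() {
  local instance="$1" config="$2" script="$3" content
  if [ ! -r "$script" ]; then
    echo "cannot read $script" >&2
    return 1
  fi
  # The API expects base64-encoded script content; drop line wraps so the
  # value fits inside the one-line JSON argument.
  content=$(base64 < "$script" | tr -d '\n')
  aws sagemaker create-notebook-instance-lifecycle-config \
    --notebook-instance-lifecycle-config-name "$config" \
    --on-start "[{\"Content\":\"$content\"}]" || return
  # The instance must be stopped before it can be updated.
  aws sagemaker stop-notebook-instance --notebook-instance-name "$instance" || return
  aws sagemaker wait notebook-instance-stopped --notebook-instance-name "$instance" || return
  aws sagemaker update-notebook-instance \
    --notebook-instance-name "$instance" \
    --lifecycle-config-name "$config" || return
  # The update briefly moves the instance to Updating; wait for Stopped again.
  aws sagemaker wait notebook-instance-stopped --notebook-instance-name "$instance" || return
  aws sagemaker start-notebook-instance --notebook-instance-name "$instance"
}

# Usage (placeholder names):
#   attach_on_start_script my-notebook publish-metrics-lcc ./on-start.sh
```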
If CPU, memory, or disk utilization is high, then restart the notebook instance and try again. Also, check that your SageMaker AI notebook instance type has enough resources for your jobs, and change the instance type if needed.
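To change the instance type from the AWS CLI, stop the instance, update it, and then start it again. The following is a minimal bash sketch, assuming that AWS CLI v2 is configured and that the instance is currently InService. The function names and the example type ml.t3.xlarge are hypothetical choices, so select a type that fits your workload.

```shell
# Minimal sketch (bash). Assumes AWS CLI v2 is configured and the instance is
# currently InService. Function names and the example instance type are
# hypothetical choices; pick a type that fits your workload.

# Print the current ML instance type, for example ml.t3.medium.
current_instance_type() {
  aws sagemaker describe-notebook-instance \
    --notebook-instance-name "$1" \
    --query 'InstanceType' --output text
}

# Stop the instance, change its type, and start it again.
change_instance_type() {
  local name="$1" type="$2"
  aws sagemaker stop-notebook-instance --notebook-instance-name "$name" || return
  aws sagemaker wait notebook-instance-stopped --notebook-instance-name "$name" || return
  aws sagemaker update-notebook-instance \
    --notebook-instance-name "$name" --instance-type "$type" || return
  # Wait out the Updating state before starting the instance.
  aws sagemaker wait notebook-instance-stopped --notebook-instance-name "$name" || return
  aws sagemaker start-notebook-instance --notebook-instance-name "$name" || return
  aws sagemaker wait notebook-instance-in-service --notebook-instance-name "$name"
}

# Usage (placeholder values):
#   current_instance_type my-notebook
#   change_instance_type my-notebook ml.t3.xlarge
```

If the start step fails because the new instance type has no available capacity, then the function returns the AWS CLI error instead of waiting.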
Related information
How do I troubleshoot a SageMaker AI notebook instance that can't open Jupyter?
How do I resolve insufficient capacity errors when I launch my SageMaker AI resources?
Metrics collected by the CloudWatch agent on Linux and macOS instances