Why does my Amazon SageMaker notebook instance get stuck in the Pending state, and then fail?

When I create or start an Amazon SageMaker notebook instance, the instance enters the Pending state. The notebook instance appears to be stuck in this state, and then it fails.

Short description

The Pending status means that SageMaker is creating the notebook instance. If any step in the creation process fails, SageMaker tries to create the notebook again. This is why a notebook might stay in the Pending state longer than expected. If SageMaker still can't create the notebook instance, the status eventually changes to Failed.
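If you prefer to watch the status from a terminal instead of refreshing the console, a polling loop like the following might help. It is a minimal sketch, assuming the AWS CLI is installed and allowed to call sagemaker:DescribeNotebookInstance. The function names and the POLL_SECONDS and MAX_POLLS defaults are hypothetical choices for this example, not AWS limits.

```bash
#!/usr/bin/env bash
# Sketch: poll a notebook instance until it leaves the Pending state.
# Assumes a configured AWS CLI. POLL_SECONDS and MAX_POLLS are illustrative
# defaults for this sketch, not limits defined by AWS.
POLL_SECONDS="${POLL_SECONDS:-30}"
MAX_POLLS="${MAX_POLLS:-60}"

# Print the current NotebookInstanceStatus (for example Pending, InService, or Failed).
notebook_status() {
  aws sagemaker describe-notebook-instance \
    --notebook-instance-name "$1" \
    --query 'NotebookInstanceStatus' --output text
}

# Poll until the status is no longer Pending, then print it.
# Returns 0 for any non-Failed status, 1 for Failed, 2 if the API call fails,
# and 3 if the instance is still Pending after MAX_POLLS checks.
wait_while_pending() {
  local name="$1" status i
  for ((i = 0; i < MAX_POLLS; i++)); do
    status="$(notebook_status "$name")" || return 2
    if [ "$status" != "Pending" ]; then
      echo "$status"
      if [ "$status" = "Failed" ]; then
        return 1
      fi
      return 0
    fi
    sleep "$POLL_SECONDS"
  done
  echo "Pending"
  return 3
}
```

For example, `wait_while_pending my-notebook` prints Failed and returns 1 if SageMaker gives up on the instance. The AWS CLI also has a built-in waiter, `aws sagemaker wait notebook-instance-in-service --notebook-instance-name my-notebook`; the loop above only makes each status check visible.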

Resolution

Confirm the failure reason

Check the FailureReason field in the response from the DescribeNotebookInstance API. You can also find the failure reason on the SageMaker console:

  • To see a pop-up window that shows a shortened version of the failure reason, hover over Failed in the Status column.
  • To see the full failure reason, choose the name of the notebook instance. The failure reason appears at the top of the Notebook instance settings section.

Use the failure reason to troubleshoot the cause of the instance failure.
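To read the same field from the command line, you can query DescribeNotebookInstance through the AWS CLI. The sketch below assumes a configured AWS CLI; failure_reason and classify_failure are hypothetical helper names, and the match patterns are taken from the error messages listed under Common errors in this article.

```bash
#!/usr/bin/env bash
# Sketch: read FailureReason with the AWS CLI and map it to a section of this article.
# Assumes a configured AWS CLI; both function names are hypothetical.

# Print the FailureReason field. With --output text, the CLI prints "None"
# when the response has no FailureReason.
failure_reason() {
  aws sagemaker describe-notebook-instance \
    --notebook-instance-name "$1" \
    --query 'FailureReason' --output text
}

# Match the reason against the error messages listed under "Common errors".
classify_failure() {
  case "$1" in
    *"unable to access"*|*"Failed to connect to"*) echo "git-network" ;;
    *"Lifecycle Configuration failed"*)            echo "lifecycle-config" ;;
    *"temporarily unavailable"*)                   echo "instance-capacity" ;;
    *)                                             echo "other" ;;
  esac
}
```

For example, `classify_failure "$(failure_reason my-notebook)"` prints one of the category labels. Anything that doesn't match, including HTTP 500 errors, falls through to "other".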

Common errors

"fatal: unable to access 'https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/': Failed to connect to github.com port 443: Connection timed out"
This error happens when the network configuration of the notebook instance doesn't let it resolve the domain name of the external Git repository or connect to the repository.

Important: Notebook instances deployed in a Virtual Private Cloud (VPC) don't automatically inherit custom route tables, such as subnet route tables for VPC peering connections. If you need a custom route table, create a lifecycle configuration script that adds the route on startup. For more information, see Understanding Amazon SageMaker notebook instance networking configurations and advanced routing options.
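A route-adding lifecycle script could look something like the following sketch. It assumes that lifecycle configuration scripts run as root, so it doesn't use sudo. The helper name add_route_if_missing is hypothetical. The destination CIDR, gateway, and interface you pass to it are placeholders. In particular, which network interface carries your VPC traffic is an assumption you must check, for example by running `ip route` on the instance.

```bash
#!/bin/bash
# Sketch of a helper for an on-start lifecycle configuration script that adds
# a custom route. Lifecycle configuration scripts run as root, so no sudo is used.
# add_route_if_missing is hypothetical; the destination CIDR, gateway, and
# interface passed to it are placeholders that you must replace.

add_route_if_missing() {
  local dest="$1" gateway="$2" iface="$3"
  # With a bare prefix, `ip route show` lists only routes with exactly that
  # prefix, and prints nothing when none exists.
  if [ -n "$(ip route show "$dest")" ]; then
    echo "route to $dest already present"
    return 0
  fi
  ip route add "$dest" via "$gateway" dev "$iface"
}
```

Because the route must be added again every time the instance starts, call the helper from the on-start script, for example `add_route_if_missing 10.1.0.0/16 10.0.0.1 eth2` (all three values hypothetical). The existence check keeps `ip route add` from failing with a "File exists" error if the route is already there.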

To validate that the Git connection is active and that you can connect to the repository from a notebook instance, take the following steps:

  1. Create a new notebook instance without an associated Git repository. Then, open the Jupyter console and use a terminal session to run the following command, where repo_hostname is the hostname of your Git repository (for example, github.com):

    dig repo_hostname
  2. If the answer section of the output is empty, the notebook instance couldn't resolve the hostname. When resolution succeeds, the answer section for github.com looks similar to the following:

    ;; ANSWER SECTION:
    github.com.    16  IN     A   20.248.137.48
  3. If the answer section of the output contains a response, the domain name resolution works. Run the following command to test the connection to the hostname:

    curl -v https://repo_hostname:443
  4. If the connection is refused or times out, verify the VPC security group rules and route tables. If the connection succeeds, use a Git command to test your credentials. For example, the following command lists the repository's references without requiring a local clone:

    git ls-remote https://your-git-repo-url
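The checks above can be run in sequence with a small function like the following. It is a sketch, assuming an HTTPS repository URL and that dig, curl, and git are available in the notebook terminal. check_git_repo is a hypothetical name. For the credential check it uses git ls-remote, which reads the remote's references without needing a local clone.

```bash
#!/usr/bin/env bash
# Sketch: run the DNS, port 443, and Git checks in sequence for an HTTPS URL.
# Assumes dig, curl, and git are installed; check_git_repo is hypothetical.

check_git_repo() {
  local repo_url="$1" host
  # Reduce the URL to its hostname: drop the scheme, the path, and any port.
  host="${repo_url#*://}"
  host="${host%%/*}"
  host="${host%%:*}"

  # DNS: dig +short prints nothing when the name doesn't resolve.
  if [ -z "$(dig +short "$host")" ]; then
    echo "DNS: cannot resolve $host"
    return 1
  fi

  # Network: any HTTP response counts as success; a refused, timed-out, or
  # failed TLS connection makes curl exit non-zero. --max-time caps the wait.
  if ! curl -sS -o /dev/null --max-time 10 "https://$host:443"; then
    echo "NETWORK: cannot connect to $host:443 (check security groups and route tables)"
    return 2
  fi

  # Credentials: git ls-remote lists the remote's references without a local clone.
  if ! git ls-remote "$repo_url" > /dev/null; then
    echo "GIT: $host is reachable, but git could not read $repo_url (check credentials)"
    return 3
  fi

  echo "OK: $repo_url is reachable"
}
```

For example, `check_git_repo https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples.git` stops at the first check that fails, and the return code (1, 2, or 3) tells you whether to look at DNS, networking, or credentials.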

"Lifecycle Configuration failed"
If a lifecycle configuration script runs for longer than five minutes, it fails, and the notebook instance is neither created nor started. For suggestions on how to decrease script runtime, see Customize a notebook instance using a lifecycle configuration script. To troubleshoot issues with the script, check the Amazon CloudWatch logs for the lifecycle configuration:

  • Log group: /aws/sagemaker/NotebookInstances
  • Log stream: notebook-instance-name/LifecycleConfigOnStart or notebook-instance-name/LifecycleConfigOnCreate
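To read those log streams without opening the CloudWatch console, you could call GetLogEvents through the AWS CLI. This is a sketch, assuming a configured AWS CLI with permission for logs:GetLogEvents; lifecycle_log is a hypothetical helper name. It reads only the first page of events, which is usually enough for a lifecycle script.

```bash
#!/usr/bin/env bash
# Sketch: print the lifecycle configuration log for a notebook instance.
# Assumes a configured AWS CLI; lifecycle_log is a hypothetical helper.

lifecycle_log() {
  # The second argument selects the hook: LifecycleConfigOnStart (default)
  # or LifecycleConfigOnCreate.
  local name="$1" hook="${2:-LifecycleConfigOnStart}"
  # --start-from-head returns events oldest first; the [message] projection
  # prints one log message per line in text output.
  aws logs get-log-events \
    --log-group-name /aws/sagemaker/NotebookInstances \
    --log-stream-name "$name/$hook" \
    --start-from-head \
    --query 'events[*].[message]' --output text
}
```

For example, `lifecycle_log my-notebook LifecycleConfigOnCreate` shows the output of the on-create script. The last lines before the five-minute limit usually point to the step that was still running.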

"This Notebook Instance type 'ml.m4.xlarge' is temporarily unavailable. We apologize for the inconvenience. Please try again in a few minutes, or try a different instance type."
This error happens if Amazon Elastic Compute Cloud (Amazon EC2) doesn't have enough available capacity for the instance type that you selected. Capacity varies based on the demand for that instance type in that Region at that time. Try the request again later to see if capacity levels have changed. Or, choose a different instance type.
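If you create notebook instances from scripts, one way to try a different instance type is to work through a list of candidates. The sketch below is hypothetical (create_with_fallback isn't an AWS tool). It assumes a configured AWS CLI and an existing SageMaker execution role ARN. It omits VPC, lifecycle configuration, and Git options, so add them if your instance needs them. Each attempt gets its own name, and any instance that ends in Failed is left in place for you to inspect or delete.

```bash
#!/usr/bin/env bash
# Sketch: create a notebook instance, moving to the next instance type when an
# attempt ends in Failed (for example, because capacity is unavailable).
# create_with_fallback is hypothetical. Assumes a configured AWS CLI and an
# existing execution role; VPC, lifecycle, and Git options are omitted.

create_with_fallback() {
  local base="$1" role_arn="$2" type name
  shift 2
  for type in "$@"; do
    # Notebook instance names can't contain dots, so derive a name per type.
    name="$base-${type//./-}"
    echo "trying $type as $name" >&2
    aws sagemaker create-notebook-instance \
      --notebook-instance-name "$name" \
      --instance-type "$type" \
      --role-arn "$role_arn" > /dev/null || continue
    # The waiter returns non-zero if the instance reaches Failed or it times out.
    if aws sagemaker wait notebook-instance-in-service --notebook-instance-name "$name"; then
      echo "$name"
      return 0
    fi
    # Show why this attempt failed; the Failed instance is not deleted.
    aws sagemaker describe-notebook-instance --notebook-instance-name "$name" \
      --query 'FailureReason' --output text >&2
  done
  return 1
}
```

For example, `create_with_fallback my-notebook "$ROLE_ARN" ml.m4.xlarge ml.m5.xlarge` prints the name of the first instance that reaches InService. Waiting for each attempt can take several minutes, so keep the candidate list short.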

HTTP 500 internal errors
An HTTP 500 error indicates that an unexpected internal error occurred while SageMaker was creating the notebook instance. To rule out a transient issue, try creating the notebook instance again.

Related information

Associate Git repositories with SageMaker notebook instances

Common errors

AWS OFFICIAL
Updated 4 months ago