How do I troubleshoot connectivity issues with JupyterLab and Code Editor when I use SageMaker Studio in VPC-only mode?

When I use my Amazon SageMaker Studio environment in VPC-only mode, I experience connectivity issues with my JupyterLab and Code Editor spaces.

Short description

If you don't correctly configure your virtual private cloud (VPC), then the following issues might occur in SageMaker Studio:

  • The space's loading screen doesn't respond, and you get an error message in Amazon CloudWatch Logs that's similar to "Connect timeout on endpoint URL: 'https://api.sagemaker.us-east-1.amazonaws.com/'".
  • The JupyterLab or Code Editor application fails to load.
  • There's no internet connectivity, so commands time out.
  • JupyterLab or Code Editor extensions don't work as expected.

Resolution

Note: If you get errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.

Configure the security groups for SageMaker Studio

SageMaker Studio doesn't require specific port rules for basic functionality, but you must add a rule for outbound traffic from Amazon SageMaker AI. By default, SageMaker AI uses HTTPS (port 443) for API communications.

To add a rule for outbound traffic from SageMaker Studio to AWS APIs, complete the following steps:

  1. Open the Amazon Virtual Private Cloud (Amazon VPC) console.
  2. In the navigation pane, choose Security groups.
  3. Select the security group that's attached to your domain.
  4. Choose Actions, and then choose Edit outbound rules.
  5. Choose Add rule.
    For Type, choose HTTPS.
    For Destination, enter 0.0.0.0/0.
  6. Choose Save rules.
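
The console steps above can also be done from the AWS CLI. The following is a minimal sketch, assuming that the AWS CLI is configured with permission to call ec2:AuthorizeSecurityGroupEgress. The function name add_https_egress is hypothetical and isn't part of the AWS CLI.

```shell
# Hypothetical helper: add an outbound HTTPS (TCP port 443) rule to a security group.
add_https_egress() {
  # $1: security group ID
  # $2: destination CIDR (optional, defaults to all IPv4 addresses)
  cidr="${2:-0.0.0.0/0}"
  aws ec2 authorize-security-group-egress \
    --group-id "$1" \
    --ip-permissions "IpProtocol=tcp,FromPort=443,ToPort=443,IpRanges=[{CidrIp=$cidr}]"
}
```

For example, run add_https_egress sg-0000, where sg-0000 is the placeholder security group ID that this article uses. If the security group already has an identical outbound rule, then the command returns a duplicate rule error and you don't need to make any change.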

You might require additional ports and rules based on the resources that you want to access from SageMaker Studio. For example, to use resources that rely on the Network File System (NFS) protocol, your security group must allow inbound and outbound connections on port 2049.

When you access resources in your VPC from your SageMaker Studio notebook, the service account traffic flows through your elastic network interface. All apps that you create in your domain exist within your SageMaker AI service account VPC. The apps communicate with each other through the network interfaces that you attach to your VPC. The apps are part of the SageMaker Studio domain service account, but run on different Amazon Elastic Compute Cloud (Amazon EC2) instances.

To update your SageMaker Studio domain's DefaultUserSettings and DefaultSpaceSettings to use the new security group, run the update-domain AWS CLI command:

aws sagemaker update-domain --domain-id d-12345abcde \
--default-user-settings '{
    "SecurityGroups": ["sg-0000"]
  }' \
--default-space-settings '{
    "ExecutionRole": "arn:aws:iam::111122223333:role/SageMakerRole",
    "SecurityGroups": ["sg-0000"]
  }'

Note: Before you run the preceding command, you must delete all the apps with InService status from your user profiles.
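
To find the apps that still have InService status, you can filter the output of the list-apps command. This is a sketch under the assumption that the AWS CLI is configured for the account that owns the domain. The function name list_inservice_apps is hypothetical.

```shell
# Hypothetical helper: list the apps in a domain that are still InService.
list_inservice_apps() {
  # $1: SageMaker Studio domain ID
  # Prints one line per app: app type, app name, user profile name, space name.
  aws sagemaker list-apps \
    --domain-id-equals "$1" \
    --query "Apps[?Status=='InService'].[AppType,AppName,UserProfileName,SpaceName]" \
    --output text
}
```

For example, run list_inservice_apps d-12345abcde. You can then remove each listed app with the delete-app command.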

Then, recreate the domain that's attached to the necessary security groups.

To confirm that your security group updated, run the describe-domain command:

aws sagemaker describe-domain --domain-id d-12345abcde

In the output, the SecurityGroups parameter lists all the security groups for the VPC that SageMaker Studio uses for communication.
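
To narrow the describe-domain output to only the security groups, you can add a --query expression. The following sketch assumes that the security groups appear under DefaultUserSettings and DefaultSpaceSettings, as in the update-domain command in this article. The function name domain_security_groups is hypothetical.

```shell
# Hypothetical helper: show only the security groups that a domain uses.
domain_security_groups() {
  # $1: SageMaker Studio domain ID
  aws sagemaker describe-domain \
    --domain-id "$1" \
    --query '{User: DefaultUserSettings.SecurityGroups, Space: DefaultSpaceSettings.SecurityGroups}' \
    --output json
}
```

For example, run domain_security_groups d-12345abcde and check that both lists contain the security group that you expect.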

Then, launch SageMaker Studio and confirm that the applications run correctly. To test the internet connectivity, run the following command from a notebook cell:

!curl amazon.com
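
To check whether the space can reach the SageMaker API endpoint from the timeout error message, you can run a similar test from a terminal in the space. This is a sketch only. The function name check_sagemaker_api is hypothetical, and the URL pattern is taken from the error message earlier in this article.

```shell
# Hypothetical helper: test whether the SageMaker API endpoint is reachable.
check_sagemaker_api() {
  # $1: AWS Region, for example us-east-1
  # Prints the HTTP status code, or fails after 10 seconds if no connection is made.
  curl -sS -o /dev/null --connect-timeout 10 \
    -w '%{http_code}\n' "https://api.sagemaker.$1.amazonaws.com/"
}
```

For example, run check_sagemaker_api us-east-1. Any HTTP status code in the output means that the connection succeeded. A timeout suggests that a security group, VPC endpoint, or route still blocks the traffic.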

For more information, see Connect Studio notebooks in a VPC to external resources.

Verify that your subnet has the correct VPC endpoints

If your SageMaker Studio resources don't require access to the internet, then you don't need to add a NAT gateway. However, a Studio notebook requires the following endpoints to run and perform basic operations:

  • SageMaker API: com.amazonaws.your-aws-region.sagemaker.api
  • SageMaker runtime: com.amazonaws.your-aws-region.sagemaker.runtime

Note: Replace your-aws-region with your AWS Region.

To access Amazon Simple Storage Service (Amazon S3) and Amazon SageMaker Projects templates, create the following endpoints:

  • For Amazon S3: com.amazonaws.your-aws-region.s3
  • For SageMaker Project templates: com.amazonaws.your-aws-region.servicecatalog

Note: Replace your-aws-region with your Region.
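
To see which of the endpoints listed above are missing from your VPC, you can compare them with the output of the describe-vpc-endpoints command. The following is a minimal sketch, assuming that the AWS CLI is configured and that you want all four endpoints. The function name missing_vpc_endpoints is hypothetical.

```shell
# Hypothetical helper: print the endpoint service names from this article
# that don't exist in a VPC. Prints nothing when all four endpoints exist.
missing_vpc_endpoints() {
  # $1: VPC ID; $2: AWS Region
  existing="$(aws ec2 describe-vpc-endpoints \
    --region "$2" \
    --filters "Name=vpc-id,Values=$1" \
    --query 'VpcEndpoints[].ServiceName' \
    --output text)"
  for suffix in sagemaker.api sagemaker.runtime s3 servicecatalog; do
    service="com.amazonaws.$2.$suffix"
    # The text output is whitespace-separated, so split it into one name
    # per line and match whole service names only.
    printf '%s\n' $existing | grep -Fqx "$service" || printf '%s\n' "$service"
  done
}
```

For example, run missing_vpc_endpoints with your VPC ID and Region as the two arguments. Each line of output is an endpoint service name that you still need to create.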

To associate the security groups with the VPC endpoints, complete the following steps:

  1. Open the Amazon VPC console.
  2. In the navigation pane, choose Endpoints.
  3. Select the endpoint that you want to update.
  4. Choose Actions, and then choose Manage security groups.
  5. Select the security group.
  6. Choose Save.
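
The same association can be made with the modify-vpc-endpoint command. This is a sketch that assumes an interface endpoint, because gateway endpoints don't use security groups. The function name attach_sg_to_endpoint is hypothetical.

```shell
# Hypothetical helper: associate a security group with an interface VPC endpoint.
attach_sg_to_endpoint() {
  # $1: VPC endpoint ID; $2: security group ID
  aws ec2 modify-vpc-endpoint \
    --vpc-endpoint-id "$1" \
    --add-security-group-ids "$2"
}
```

For example, run attach_sg_to_endpoint with your endpoint ID and sg-0000, the placeholder security group ID from this article.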

For more information, see Give SageMaker AI training jobs access to resources in your Amazon VPC and VPC only communication with the internet.

Connect your domain to a private subnet and active NAT gateway

When your SageMaker Studio resources require access to the internet, configure your domain to connect to private subnets. Then, create a NAT gateway and allow the traffic from the NAT gateway through your private subnet's route table. For more information, see How do I set up a NAT gateway for a private subnet in Amazon VPC?

Note: In VPC-only mode, a SageMaker Studio domain that's connected to a public subnet can't connect to the internet.
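
To check whether a private subnet's route table sends internet-bound traffic to a NAT gateway, you can query the describe-route-tables command. The following sketch assumes that the subnet has an explicit route table association. A subnet that implicitly uses the VPC's main route table returns no output here. The function name nat_gateway_for_subnet is hypothetical.

```shell
# Hypothetical helper: print the NAT gateway ID that a subnet's default
# route (0.0.0.0/0) points to. Prints nothing if there is no such route.
nat_gateway_for_subnet() {
  # $1: subnet ID
  aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=$1" \
    --query "RouteTables[].Routes[] | [?DestinationCidrBlock=='0.0.0.0/0'].NatGatewayId" \
    --output text
}
```

For example, run nat_gateway_for_subnet with your private subnet ID. An empty result suggests that the default route is missing or points to a target other than a NAT gateway.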

Confirm that your VPC meets the requirements

If you launch your SageMaker Studio in VPC-only mode, then your VPC must meet the following requirements:

  • Subnets must have enough available IP addresses for the instance.
  • If you use a VPC endpoint to call SageMaker APIs, then set Enable DNS hostnames and Enable DNS support to true for your VPC. Your VPC requires both attributes to resolve and connect to the SageMaker AI API endpoint when you use SageMaker AI features.
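
Both requirements can be checked from the AWS CLI. The following is a minimal sketch, assuming that the AWS CLI is configured for the account that owns the VPC. The function names check_vpc_dns and subnet_free_ips are hypothetical.

```shell
# Hypothetical helper: report the two DNS attributes of a VPC.
# Returns a nonzero status unless both attributes are true.
check_vpc_dns() {
  # $1: VPC ID
  support="$(aws ec2 describe-vpc-attribute --vpc-id "$1" \
    --attribute enableDnsSupport \
    --query 'EnableDnsSupport.Value' --output json)"
  hostnames="$(aws ec2 describe-vpc-attribute --vpc-id "$1" \
    --attribute enableDnsHostnames \
    --query 'EnableDnsHostnames.Value' --output json)"
  printf 'enableDnsSupport=%s enableDnsHostnames=%s\n' "$support" "$hostnames"
  [ "$support" = "true" ] && [ "$hostnames" = "true" ]
}

# Hypothetical helper: print each subnet ID with its available IP address count.
subnet_free_ips() {
  # $@: one or more subnet IDs
  aws ec2 describe-subnets \
    --subnet-ids "$@" \
    --query 'Subnets[].[SubnetId,AvailableIpAddressCount]' \
    --output text
}
```

For example, run check_vpc_dns with your VPC ID, and then run subnet_free_ips with the subnet IDs that are attached to your domain.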

Troubleshoot configuration issues

If you still experience issues after you update your VPC configuration, then restart your application.

If you configured SageMaker Studio users with a different execution role, then connectivity issues might occur.

Make sure that the user's execution role permissions include the required policies to allow the role to perform the following actions:

  • CreateNetworkInterface
  • CreatePresignedDomainUrl
  • CreateSpace
  • CreateApp
  • DescribeApp
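
To test whether an execution role allows these actions, you can use the IAM policy simulator from the AWS CLI. This is a sketch under assumptions: the service prefixes (ec2 for CreateNetworkInterface and sagemaker for the other actions) are a mapping added here and aren't stated in the list above, and the function name check_role_actions is hypothetical.

```shell
# Hypothetical helper: simulate the listed actions against an execution role.
# Prints one line per action with the decision, for example "allowed".
check_role_actions() {
  # $1: ARN of the user's execution role
  aws iam simulate-principal-policy \
    --policy-source-arn "$1" \
    --action-names \
      ec2:CreateNetworkInterface \
      sagemaker:CreatePresignedDomainUrl \
      sagemaker:CreateSpace \
      sagemaker:CreateApp \
      sagemaker:DescribeApp \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
    --output text
}
```

For example, run check_role_actions with the role ARN from your user profile. By default, the simulation evaluates each action against all resources, so treat a result other than allowed as a starting point for a closer review of the role's policies.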

AWS OFFICIAL, updated 2 days ago