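How do I troubleshoot scheduled notebook jobs in SageMaker Studio?
I get an error when I run a scheduled notebook job in Amazon SageMaker Studio.
Short description
Two common errors can block scheduled notebook jobs in SageMaker Studio:
- AccessDenied errors
- UI errors when you try to update a job
Resolution
AccessDenied errors
AccessDenied errors usually involve one of the following:
- AWS Identity and Access Management (IAM) policies
- Virtual private cloud (VPC) endpoint policies
- Resource tag exceptions
IAM policy issues
AccessDenied errors are usually caused by missing permissions, so follow the best practices for the IAM role that notebook jobs require. To establish the basic trust relationship, the IAM role needs the following trust policy: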
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "sagemaker.amazonaws.com" }, "Action": "sts:AssumeRole" }, { "Effect": "Allow", "Principal": { "Service": "events.amazonaws.com" }, "Action": "sts:AssumeRole" } ] }
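To confirm that an existing role trusts both services named in the policy above, you can inspect the trust policy document offline. The following is a minimal sketch that uses only the Python standard library. It assumes you have already exported the trust policy as a JSON string; the function name `missing_trusted_services` is hypothetical and not part of any AWS SDK.

```python
import json

# Service principals taken from the trust policy shown above.
REQUIRED_SERVICES = {"sagemaker.amazonaws.com", "events.amazonaws.com"}


def missing_trusted_services(trust_policy_json: str) -> set:
    """Return the required service principals that cannot assume the role.

    Hypothetical helper: it only looks at Allow statements that grant
    sts:AssumeRole to a Service principal and ignores conditions.
    """
    policy = json.loads(trust_policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be wrapped in a list
        statements = [statements]

    trusted = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        principal = stmt.get("Principal", {})
        services = principal.get("Service", []) if isinstance(principal, dict) else []
        if isinstance(services, str):
            services = [services]
        trusted.update(services)
    return REQUIRED_SERVICES - trusted
```

An empty result means both principals are present. Any returned name is a service that the role does not yet trust.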
Also, verify that your IAM role has the following permissions:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "iam:PassRole", "Resource": "arn:aws:iam::*:role/*", "Condition": { "StringLike": { "iam:PassedToService": [ "sagemaker.amazonaws.com", "events.amazonaws.com" ] } } }, { "Effect": "Allow", "Action": [ "events:TagResource", "events:DeleteRule", "events:PutTargets", "events:DescribeRule", "events:PutRule", "events:RemoveTargets", "events:DisableRule", "events:EnableRule" ], "Resource": "*", "Condition": { "StringEquals": { "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true" } } }, { "Effect": "Allow", "Action": [ "s3:CreateBucket", "s3:PutBucketVersioning", "s3:PutEncryptionConfiguration" ], "Resource": "arn:aws:s3:::sagemaker-automated-execution-*" }, { "Effect": "Allow", "Action": [ "sagemaker:ListTags" ], "Resource": [ "arn:aws:sagemaker:*:*:user-profile/*", "arn:aws:sagemaker:*:*:space/*", "arn:aws:sagemaker:*:*:training-job/*", "arn:aws:sagemaker:*:*:pipeline/*" ] }, { "Effect": "Allow", "Action": [ "sagemaker:AddTags" ], "Resource": [ "arn:aws:sagemaker:*:*:training-job/*", "arn:aws:sagemaker:*:*:pipeline/*" ] }, { "Effect": "Allow", "Action": [ "ec2:CreateNetworkInterface", "ec2:CreateNetworkInterfacePermission", "ec2:CreateVpcEndpoint", "ec2:DeleteNetworkInterface", "ec2:DeleteNetworkInterfacePermission", "ec2:DescribeDhcpOptions", "ec2:DescribeNetworkInterfaces", "ec2:DescribeRouteTables", "ec2:DescribeSecurityGroups", "ec2:DescribeSubnets", "ec2:DescribeVpcEndpoints", "ec2:DescribeVpcs", "ecr:BatchCheckLayerAvailability", "ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer", "ecr:GetAuthorizationToken", "s3:ListBucket", "s3:GetBucketLocation", "s3:GetEncryptionConfiguration", "s3:PutObject", "s3:DeleteObject", "s3:GetObject", "sagemaker:DescribeDomain", "sagemaker:DescribeUserProfile", "sagemaker:DescribeSpace", "sagemaker:DescribeStudioLifecycleConfig", "sagemaker:DescribeImageVersion", "sagemaker:DescribeAppImageConfig", "sagemaker:CreateTrainingJob", 
"sagemaker:DescribeTrainingJob", "sagemaker:StopTrainingJob", "sagemaker:Search", "sagemaker:CreatePipeline", "sagemaker:DescribePipeline", "sagemaker:DeletePipeline", "sagemaker:StartPipelineExecution" ], "Resource": "*" } ] }
For more information, see AWS managed policies for SageMaker notebooks.
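Because the permissions policy is long, it is easy to miss a single action. The sketch below compares the actions that a policy document allows against a list of actions you expect. It is a simplified, offline check under stated assumptions: it ignores Deny statements, Resource, and Condition elements, so it cannot replace the IAM policy simulator. The function name `missing_actions` is hypothetical.

```python
import json
from fnmatch import fnmatchcase


def missing_actions(policy_json: str, required: list) -> list:
    """Return the required actions that no Allow statement in the policy covers.

    Simplification: only the Action element is compared. Wildcards such as
    "sagemaker:Describe*" are honored, and matching is case-insensitive.
    """
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]

    allowed_patterns = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        allowed_patterns.extend(a.lower() for a in actions)

    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), p) for p in allowed_patterns)
    ]
```

For example, you could pass a few actions from the policy above, such as `sagemaker:CreateTrainingJob` and `events:PutRule`, as the `required` list and review whatever the function returns.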
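VPC endpoint issues
If you launch notebook jobs through VPC endpoints, check the configuration and policy of each endpoint. Make sure that you follow the steps and best practices for the relevant service endpoints:
- Amazon Elastic Compute Cloud (Amazon EC2) VPC endpoint
- Amazon EventBridge endpoint
- SageMaker endpoint
- Amazon Simple Storage Service (Amazon S3) endpoint
For Amazon S3 VPC endpoints, the most common error involves an endpoint that's restricted to a single account. For example, the following policy restricts access to resources that belong to account ID 111122223333: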
{ "Version": "2012-10-17", "Statement": [ { "Sid": "AllowSpecificAccountsPermission", "Effect": "Allow", "Principal": { "AWS": "*" }, "Action": "s3:*", "Resource": "*", "Condition": { "StringEquals": { "s3:ResourceAccount": "111122223333" } } } ] }
In this case, you must also add a statement that allows user operations to access the following buckets:
{ "Action": [ "s3:*" ], "Resource": [ "arn:aws:s3:::sagemakerheadlessexecution-prod-*", "arn:aws:s3:::sagemakerheadlessexecution-prod-*/*" ], "Effect": "Allow", "Sid": "SCTASK14554266" }
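One way to check an exported S3 endpoint policy for the extra statement is sketched below. It assumes the policy is available as a JSON string and looks only for an explicit Allow statement that names the `sagemakerheadlessexecution-prod-*` buckets and is not itself limited by `s3:ResourceAccount`. The helper name is hypothetical, and a policy that allows all resources without conditions would not need this statement at all.

```python
import json

# Bucket ARN prefix taken from the statement shown above.
BUCKET_PREFIX = "arn:aws:s3:::sagemakerheadlessexecution-prod-"


def _as_list(value):
    """Normalize a JSON policy element that may be absent, a scalar, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def has_explicit_service_bucket_statement(endpoint_policy_json: str) -> bool:
    """Return True if an Allow statement without an s3:ResourceAccount
    condition lists the service buckets as a resource."""
    policy = json.loads(endpoint_policy_json)
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        conditions = stmt.get("Condition", {})
        restricted = any(
            key.lower() == "s3:resourceaccount"
            for operator_block in conditions.values()
            for key in operator_block
        )
        if restricted:
            continue
        if any(r.startswith(BUCKET_PREFIX) for r in _as_list(stmt.get("Resource"))):
            return True
    return False
```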
Resource tag exceptions
Make sure that your IAM policy has the following permissions:
{ "Effect": "Allow", "Action": [ "events:TagResource", "events:DeleteRule", "events:PutTargets", "events:DescribeRule", "events:PutRule", "events:RemoveTargets", "events:DisableRule", "events:EnableRule" ], "Resource": "*", "Condition": { "StringEquals": { "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true" } } }
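UI errors when you try to update a job
You might get a UI error when you try to create, describe, update, stop, or delete a notebook job. The same issue can occur with job definitions (scheduled jobs). To resolve the issue, first note the error message that appears in the UI. The message usually includes instructions or suggestions for fixing the problem.
If you can't resolve the error, complete the following steps:
- Take a screenshot of the error, and then save it as an image file.
- Create an HTTP Archive (HAR) file that captures the network traffic when the UI error occurs.
- Open the Jupyter server terminal in SageMaker Studio. Choose File, then New, and then Terminal.
- Review the logs in /var/log/apps/app_container.log for exceptions, errors, or warnings that were logged when the UI error occurred.
- Contact AWS Support through the AWS Support Center. In your request, attach the error screenshot, the app_container.log file, and the HAR file.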
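The log review step above can be sketched as a short script that you run from the Jupyter server terminal. The log path comes from the steps above; the search pattern and the function name `find_problem_lines` are assumptions, because the exact log format is not documented here. Treat this as a starting point and adjust the pattern to what you see in your own log.

```python
import re
from pathlib import Path

# Path from the troubleshooting steps above.
LOG_PATH = Path("/var/log/apps/app_container.log")

# Assumed keywords; the real log format may use other markers.
PATTERN = re.compile(r"exception|error|warn", re.IGNORECASE)


def find_problem_lines(lines):
    """Return (line_number, text) pairs for lines that mention an
    exception, error, or warning."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


if __name__ == "__main__":
    if LOG_PATH.exists():
        # errors="replace" keeps the scan going if the log has odd bytes.
        with LOG_PATH.open(errors="replace") as log_file:
            for number, text in find_problem_lines(log_file):
                print(f"{number}: {text}")
    else:
        print(f"{LOG_PATH} not found")
```

Compare the timestamps of the matching lines with the time the UI error appeared, and include the relevant excerpt in your support request.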
