How do I troubleshoot issues in SageMaker AI scheduled notebook jobs?
I want to troubleshoot issues in my Amazon SageMaker AI scheduled notebook jobs.
Resolution
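Your notebook job doesn't run at the scheduled time
Scheduled notebook jobs use Amazon EventBridge and the SageMaker AI Training and Pipelines services. If your notebook doesn't run at the scheduled time, then your AWS Identity and Access Management (IAM) role might not have the required permissions.
The IAM role that you use to create and schedule notebook jobs can be either the SageMaker AI Studio domain role or a role that's attached to an individual user profile in the domain.
To give scheduled notebook jobs permission to manage Amazon CloudWatch events, add the following policy to the IAM role that you use for your scheduled notebook job: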
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "events:TagResource",
        "events:DeleteRule",
        "events:PutTargets",
        "events:DescribeRule",
        "events:PutRule",
        "events:RemoveTargets",
        "events:DisableRule",
        "events:EnableRule"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true"
        }
      }
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::*:role/*",
      "Condition": {
        "StringLike": {
          "iam:PassedToService": "events.amazonaws.com"
        }
      }
    },
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": "sagemaker:ListTags",
      "Resource": "arn:aws:sagemaker:*:*:user-profile/*/*"
    }
  ]
}
**Note:** The preceding policy allows listing tags on user profiles to identify the notebooks that are tagged as scheduled jobs.
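To see which EventBridge actions a role's policy still lacks, a small offline check can help. The following is a minimal sketch, not an AWS tool: the helper name `missing_event_actions` is hypothetical, and the check only looks at the `Action` lists of `Allow` statements. It ignores `Deny` statements, `Resource`, and `Condition`, so treat the result as a hint rather than an authorization decision.

```python
import fnmatch

# Actions copied from the EventBridge statement in the policy above.
REQUIRED_EVENT_ACTIONS = {
    "events:TagResource",
    "events:DeleteRule",
    "events:PutTargets",
    "events:DescribeRule",
    "events:PutRule",
    "events:RemoveTargets",
    "events:DisableRule",
    "events:EnableRule",
}


def _as_list(value):
    """IAM accepts a single string or a list; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def missing_event_actions(policy):
    """Return the required actions that no Allow statement in `policy` lists.

    Simplification (assumption): Deny, Resource, and Condition are ignored.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]

    # Collect every action pattern from Allow statements, e.g. "events:*".
    patterns = []
    for statement in statements:
        if statement.get("Effect") == "Allow" and "Action" in statement:
            patterns.extend(_as_list(statement["Action"]))

    missing = []
    for action in REQUIRED_EVENT_ACTIONS:
        # Compare in lowercase because IAM action names are case-insensitive.
        covered = any(
            fnmatch.fnmatchcase(action.lower(), pattern.lower())
            for pattern in patterns
        )
        if not covered:
            missing.append(action)
    return sorted(missing)


# Hypothetical policy that grants only part of what the article lists.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["events:PutRule", "events:Describe*"],
            "Resource": "*",
        }
    ],
}
print(missing_event_actions(example_policy))
```

An empty list means that every action from the policy above appears in at least one `Allow` statement.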
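The "Create" button is grayed out when you try to create a scheduled notebook job
If the Create button is grayed out and Additional options shows ! when you create a notebook job, then retrieve a HAR file. Review the network capture from the time when the issue occurred to determine the cause. Check for an Amazon Virtual Private Cloud (Amazon VPC) misconfiguration or runtime role permission issues.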
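A HAR file is JSON, so you can scan it for failed requests with a short script. The following is a minimal sketch under stated assumptions: the helper names `load_har` and `failed_requests` and the sample URLs are hypothetical, and the script relies only on the standard HAR fields `log.entries[].request` and `log.entries[].response.status`.

```python
import json


def load_har(path):
    """Load a HAR capture that was exported from the browser developer tools."""
    # "utf-8-sig" tolerates a byte-order mark that some exports include.
    with open(path, encoding="utf-8-sig") as handle:
        return json.load(handle)


def failed_requests(har):
    """Return (status, method, url) for each entry that did not succeed.

    Status 0 usually means that the browser received no response (for example,
    a blocked or timed-out request). Status 4xx and 5xx are HTTP errors.
    """
    failures = []
    for entry in har.get("log", {}).get("entries", []):
        status = entry.get("response", {}).get("status", 0)
        if status == 0 or status >= 400:
            request = entry.get("request", {})
            failures.append(
                (status, request.get("method", "?"), request.get("url", "?"))
            )
    return failures


# Hypothetical capture with one successful, one denied, and one blocked call.
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"method": "GET", "url": "https://example.com/ok"},
                "response": {"status": 200},
            },
            {
                "request": {"method": "POST", "url": "https://example.com/denied"},
                "response": {"status": 403},
            },
            {
                "request": {"method": "GET", "url": "https://example.com/blocked"},
                "response": {"status": 0},
            },
        ]
    }
}

for status, method, url in failed_requests(sample_har):
    print(status, method, url)
```

In a real capture, a 403 response often points to a permissions issue, and requests without a response can point to a missing VPC endpoint. To confirm the cause, read the response body of each failed entry.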
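If you use Amazon VPC to run notebook jobs, then make sure that you have the following Amazon VPC endpoints:
- SageMaker AI: To connect to SageMaker AI, see Connect to SageMaker AI through an Amazon VPC interface endpoint.
- Amazon Simple Storage Service (Amazon S3): To connect to Amazon S3, see Gateway endpoints for Amazon S3.
- Amazon Elastic Compute Cloud (Amazon EC2): To connect to Amazon EC2, see Access Amazon EC2 using an interface VPC endpoint.
- EventBridge: Use the EventBridge endpoint only when you set up scheduled notebooks. To connect to EventBridge, see Using Amazon EventBridge with interface Amazon VPC endpoints.
Specify at least one private subnet and security group. If you don't use private subnets, then use another configuration option. For more information, see Requirements to use VPC only mode.
If you configured the preceding endpoints or you don't use Amazon VPC to run notebook jobs, then configure the IAM permissions and the runtime role permissions.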
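The endpoint checklist above can be turned into a quick script. The following is a minimal sketch under stated assumptions. The helper names are hypothetical, and the service name suffixes (`sagemaker.api`, `s3`, `ec2`, `events`) follow the usual `com.amazonaws.<region>.<service>` naming pattern, so verify them against the endpoint services available in your Region. The boto3 lookup is optional and needs AWS credentials. The comparison itself is plain Python.

```python
# Assumed service name suffixes for the four endpoints listed above.
REQUIRED_SERVICE_SUFFIXES = ("sagemaker.api", "s3", "ec2", "events")


def missing_endpoint_services(service_names, region):
    """Return the required suffixes that have no matching endpoint service name."""
    present = set(service_names)
    return [
        suffix
        for suffix in REQUIRED_SERVICE_SUFFIXES
        if f"com.amazonaws.{region}.{suffix}" not in present
    ]


def list_endpoint_services(vpc_id, region):
    """Return the service names of all endpoints in the VPC (calls AWS)."""
    import boto3  # third-party; imported lazily so the check above runs offline

    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_vpc_endpoints")
    names = []
    for page in paginator.paginate(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    ):
        names.extend(endpoint["ServiceName"] for endpoint in page["VpcEndpoints"])
    return names


# Offline example with hypothetical data: only two of the four endpoints exist.
existing = [
    "com.amazonaws.us-east-1.s3",
    "com.amazonaws.us-east-1.sagemaker.api",
]
print(missing_endpoint_services(existing, "us-east-1"))
```

To check a real VPC, pass the result of `list_endpoint_services("<your-vpc-id>", "<your-region>")` as the first argument. The script doesn't verify the endpoint state, subnets, or security groups.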
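To configure IAM permissions, complete the following steps:
- Open the IAM console.
- In the navigation pane, choose Users.
- Select the IAM user that's associated with your notebook job.
- In the dropdown menu, choose Add permissions, and then choose Create inline policy.
- Choose the JSON tab, and then add the following policy: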
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EventBridgeSchedule",
      "Effect": "Allow",
      "Action": [
        "events:TagResource",
        "events:DeleteRule",
        "events:PutTargets",
        "events:DescribeRule",
        "events:EnableRule",
        "events:PutRule",
        "events:RemoveTargets",
        "events:DisableRule"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true"
        }
      }
    },
    {
      "Sid": "IAMPassrole",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::*:role/*",
      "Condition": {
        "StringLike": {
          "iam:PassedToService": [
            "sagemaker.amazonaws.com",
            "events.amazonaws.com"
          ]
        }
      }
    },
    {
      "Sid": "IAMListRoles",
      "Effect": "Allow",
      "Action": "iam:ListRoles",
      "Resource": "*"
    },
    {
      "Sid": "S3ArtifactsAccess",
      "Effect": "Allow",
      "Action": [
        "s3:PutEncryptionConfiguration",
        "s3:CreateBucket",
        "s3:PutBucketVersioning",
        "s3:ListBucket",
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetEncryptionConfiguration",
        "s3:DeleteObject",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::sagemaker-automated-execution-*"
      ]
    },
    {
      "Sid": "S3DriverAccess",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::sagemakerheadlessexecution-*"
      ]
    },
    {
      "Sid": "SagemakerJobs",
      "Effect": "Allow",
      "Action": [
        "sagemaker:DescribeTrainingJob",
        "sagemaker:StopTrainingJob",
        "sagemaker:DescribePipeline",
        "sagemaker:CreateTrainingJob",
        "sagemaker:DeletePipeline",
        "sagemaker:CreatePipeline"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true"
        }
      }
    },
    {
      "Sid": "AllowSearch",
      "Effect": "Allow",
      "Action": "sagemaker:Search",
      "Resource": "*"
    },
    {
      "Sid": "SagemakerTags",
      "Effect": "Allow",
      "Action": [
        "sagemaker:ListTags",
        "sagemaker:AddTags"
      ],
      "Resource": [
        "arn:aws:sagemaker:*:*:pipeline/*",
        "arn:aws:sagemaker:*:*:space/*",
        "arn:aws:sagemaker:*:*:training-job/*",
        "arn:aws:sagemaker:*:*:user-profile/*"
      ]
    },
    {
      "Sid": "ECRImage",
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchGetImage"
      ],
      "Resource": "*"
    }
  ]
}
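To configure runtime role permissions, complete the following steps:
- Open the IAM console.
- In the navigation pane, choose Roles.
- Select the job runtime role that's associated with your notebook job.
- Choose the Trust relationships tab.
- Choose Edit trust policy, and then add the following policy: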
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "sagemaker.amazonaws.com",
          "events.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
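To confirm that a role's trust policy names both service principals, you can compare the policy document against the expected set. The following is a minimal sketch, and the helper name `missing_trusted_services` is hypothetical. It only checks `Allow` statements that include `sts:AssumeRole` and ignores any `Condition` blocks.

```python
# Service principals from the trust policy above.
REQUIRED_PRINCIPALS = {"sagemaker.amazonaws.com", "events.amazonaws.com"}


def _as_list(value):
    """IAM accepts a single string or a list; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def missing_trusted_services(trust_policy):
    """Return the required service principals that the trust policy lacks."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]

    trusted = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(statement.get("Action", [])):
            continue
        principal = statement.get("Principal", {})
        if isinstance(principal, dict):  # skip forms such as "*"
            trusted.update(_as_list(principal.get("Service", [])))
    return sorted(REQUIRED_PRINCIPALS - trusted)


# Hypothetical role that trusts only SageMaker AI.
incomplete_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}
print(missing_trusted_services(incomplete_trust_policy))
```

With boto3, you can read the document to check from `boto3.client("iam").get_role(RoleName=...)["Role"]["AssumeRolePolicyDocument"]`. The call requires AWS credentials and the `iam:GetRole` permission.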
To add permissions to access resources, complete the following steps:
- Open the IAM console.
- In the left navigation pane, choose Roles.
- Select the job runtime role that's associated with your notebook job.
- In the dropdown menu, choose Add permissions, and then choose Create inline policy.
- Choose the JSON tab, and then add the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PassroleForJobCreation",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::*:role/*",
      "Condition": {
        "StringLike": {
          "iam:PassedToService": "sagemaker.amazonaws.com"
        }
      }
    },
    {
      "Sid": "S3ForStoringArtifacts",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::sagemaker-automated-execution-*"
    },
    {
      "Sid": "S3DriverAccess",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::sagemakerheadlessexecution-*"
      ]
    },
    {
      "Sid": "SagemakerJobs",
      "Effect": "Allow",
      "Action": [
        "sagemaker:StartPipelineExecution",
        "sagemaker:CreateTrainingJob"
      ],
      "Resource": "*"
    },
    {
      "Sid": "ECRImage",
      "Effect": "Allow",
      "Action": [
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability"
      ],
      "Resource": "*"
    }
  ]
}
**Note:** In the preceding policy, you can add permissions for other resources that your notebook job must access.
- Choose Review policy. Then, enter a name for your policy.
- Choose Create policy.
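If you prefer to script the console steps above, the IAM `PutRolePolicy` API attaches an inline policy to a role. The following is a minimal sketch under stated assumptions: the wrapper `put_inline_role_policy`, the role name, and the policy name are hypothetical, and `EXAMPLE_POLICY` holds only one statement from the policy above to keep the example short. The function takes the IAM client as a parameter, so you can pass `boto3.client("iam")` in real use.

```python
import json

# One statement copied from the policy above; add the others as needed.
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3DriverAccess",
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetObject", "s3:GetBucketLocation"],
            "Resource": ["arn:aws:s3:::sagemakerheadlessexecution-*"],
        }
    ],
}


def put_inline_role_policy(iam_client, role_name, policy_name, policy):
    """Attach `policy` (a dict) to the role as an inline policy.

    Returns the JSON document that was sent, which is handy for logging.
    """
    document = json.dumps(policy)
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=document,
    )
    return document


# Real use (requires boto3, AWS credentials, and iam:PutRolePolicy):
#   import boto3
#   put_inline_role_policy(
#       boto3.client("iam"),
#       "my-notebook-job-runtime-role",  # hypothetical role name
#       "notebook-job-access",           # hypothetical policy name
#       EXAMPLE_POLICY,
#   )
```

`PutRolePolicy` creates the inline policy or overwrites an existing one with the same name, so choose the policy name with care.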
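You receive an "Unable to find metadata for image" error
You receive the "Unable to find metadata for image arn in region: example-region" error message.
The preceding error occurs when you don't associate or select a user image when you create a notebook job. You might also receive this error when you try to attach a user image to a notebook job.
To resolve this issue, reschedule your notebook job at a later time. If the error persists, then contact AWS Support.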
