Short description
Two common errors can prevent a scheduled notebook job from running in SageMaker Studio:
- AccessDenied errors
- UI errors when you try to update a job
Resolution
AccessDenied errors
AccessDenied errors most commonly involve the following issues:
- AWS Identity and Access Management (IAM) policies
- Virtual private cloud (VPC) endpoint policies
- Resource tag exceptions
IAM policy issues
AccessDenied errors are most commonly caused by missing permissions. Therefore, follow the best practices for the IAM role that the notebook job uses. The role needs the following base trust relationship:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sagemaker.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "events.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
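The trust relationship above can be checked offline before you attach it to a role. The following is a minimal Python sketch, not an AWS tool: the function name missing_trusted_services is hypothetical, and the check only looks at Allow statements for sts:AssumeRole. It ignores any Condition block and any Deny statement, so treat a clean result as a hint rather than proof.

```python
import json

# Service principals that the trust relationship in this article allows.
REQUIRED_SERVICES = {"sagemaker.amazonaws.com", "events.amazonaws.com"}


def _as_list(value):
    """IAM accepts a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def missing_trusted_services(trust_policy, required=REQUIRED_SERVICES):
    """Return the required service principals that cannot assume the role.

    Hypothetical helper: only inspects Allow statements that grant
    sts:AssumeRole and ignores Condition and Deny.
    """
    trusted = set()
    for statement in _as_list(trust_policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(statement.get("Action")):
            continue
        principal = statement.get("Principal", {})
        if isinstance(principal, dict):
            trusted.update(_as_list(principal.get("Service")))
    return set(required) - trusted


# Example: a trust policy that forgot the EventBridge principal.
example = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "sagemaker.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }
  ]
}
""")
print(sorted(missing_trusted_services(example)))  # ['events.amazonaws.com']
```

An empty result means that both service principals appear in the trust relationship.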
Also, verify that your IAM role has the following permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::*:role/*",
      "Condition": {
        "StringLike": {
          "iam:PassedToService": [
            "sagemaker.amazonaws.com",
            "events.amazonaws.com"
          ]
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "events:TagResource",
        "events:DeleteRule",
        "events:PutTargets",
        "events:DescribeRule",
        "events:PutRule",
        "events:RemoveTargets",
        "events:DisableRule",
        "events:EnableRule"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:PutBucketVersioning",
        "s3:PutEncryptionConfiguration"
      ],
      "Resource": "arn:aws:s3:::sagemaker-automated-execution-*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:ListTags"
      ],
      "Resource": [
        "arn:aws:sagemaker:*:*:user-profile/*",
        "arn:aws:sagemaker:*:*:space/*",
        "arn:aws:sagemaker:*:*:training-job/*",
        "arn:aws:sagemaker:*:*:pipeline/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:AddTags"
      ],
      "Resource": [
        "arn:aws:sagemaker:*:*:training-job/*",
        "arn:aws:sagemaker:*:*:pipeline/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateNetworkInterface",
        "ec2:CreateNetworkInterfacePermission",
        "ec2:CreateVpcEndpoint",
        "ec2:DeleteNetworkInterface",
        "ec2:DeleteNetworkInterfacePermission",
        "ec2:DescribeDhcpOptions",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcEndpoints",
        "ec2:DescribeVpcs",
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer",
        "ecr:GetAuthorizationToken",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:GetEncryptionConfiguration",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:GetObject",
        "sagemaker:DescribeDomain",
        "sagemaker:DescribeUserProfile",
        "sagemaker:DescribeSpace",
        "sagemaker:DescribeStudioLifecycleConfig",
        "sagemaker:DescribeImageVersion",
        "sagemaker:DescribeAppImageConfig",
        "sagemaker:CreateTrainingJob",
        "sagemaker:DescribeTrainingJob",
        "sagemaker:StopTrainingJob",
        "sagemaker:Search",
        "sagemaker:CreatePipeline",
        "sagemaker:DescribePipeline",
        "sagemaker:DeletePipeline",
        "sagemaker:StartPipelineExecution"
      ],
      "Resource": "*"
    }
  ]
}
For more information, see AWS managed policies for SageMaker notebooks.
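To spot a missing permission quickly, you can compare the actions that a policy document allows against the actions that you expect. The following Python sketch is an assumption-laden illustration: missing_actions is a hypothetical helper, it matches action names case-insensitively with * and ? wildcards, and it deliberately ignores Resource, Condition, and Deny. It can tell you that an action is absent, but it can't confirm that a request will be authorized.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM accepts a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def allowed_action_patterns(policy):
    """Collect the Action entries of every Allow statement."""
    patterns = []
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") == "Allow":
            patterns.extend(_as_list(statement.get("Action")))
    return patterns


def missing_actions(policy, required_actions):
    """Return required actions that no Allow statement mentions.

    Hypothetical helper: ignores Resource, Condition, and Deny.
    """
    patterns = [pattern.lower() for pattern in allowed_action_patterns(policy)]
    return sorted(
        action
        for action in required_actions
        if not any(fnmatchcase(action.lower(), pattern) for pattern in patterns)
    )


# Example: the policy allows S3 reads but not writes.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:Get*", "sagemaker:CreateTrainingJob"],
            "Resource": "*",
        }
    ],
}
required = ["s3:GetObject", "s3:PutObject", "sagemaker:CreateTrainingJob"]
print(missing_actions(policy, required))  # ['s3:PutObject']
```

To check a real role, pass the parsed policy document and the list of actions from the policy in this article as required_actions.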
VPC endpoint issues
If you initiate the notebook job through a VPC endpoint, then check the endpoint's configuration and policy. Make sure that you follow the steps and best practices for the relevant service endpoint.
For Amazon S3 VPC endpoints, the most common error relates to an endpoint policy that's restricted to a single account. For example, the following policy allows access only to resources that are owned by the account with the ID 111122223333:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSpecificAccountsPermission",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "s3:*",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "s3:ResourceAccount": "111122223333"
        }
      }
    }
  ]
}
In this case, you must also add a statement to the endpoint policy that allows access to the following buckets so that the user's actions succeed:
{
  "Action": [
    "s3:*"
  ],
  "Resource": [
    "arn:aws:s3:::sagemakerheadlessexecution-prod-*",
    "arn:aws:s3:::sagemakerheadlessexecution-prod-*/*"
  ],
  "Effect": "Allow",
  "Sid": "SCTASK14554266"
}
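The endpoint policy check described above can be approximated in a few lines of Python. This is a rough sketch under stated assumptions: allows_service_buckets is a hypothetical helper, the probe bucket name is invented only to test the wildcard patterns, and any Allow statement that carries an s3:ResourceAccount condition is treated as not covering the service buckets. The sketch doesn't evaluate the Action or Principal elements.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policy elements may be a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _restricts_resource_account(statement):
    """True if any condition operator uses the s3:ResourceAccount key."""
    for keys in statement.get("Condition", {}).values():
        if any(key.lower() == "s3:resourceaccount" for key in keys):
            return True
    return False


def allows_service_buckets(endpoint_policy,
                           bucket_prefix="sagemakerheadlessexecution-prod-"):
    """Check that unrestricted Allow statements cover bucket and object ARNs.

    Hypothetical helper: "probe" is a made-up bucket suffix used only to
    exercise the wildcard patterns.
    """
    probes = {
        f"arn:aws:s3:::{bucket_prefix}probe",      # bucket-level ARN
        f"arn:aws:s3:::{bucket_prefix}probe/key",  # object-level ARN
    }
    covered = set()
    for statement in _as_list(endpoint_policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        if _restricts_resource_account(statement):
            continue
        for pattern in _as_list(statement.get("Resource")):
            covered.update(p for p in probes if fnmatchcase(p, pattern))
    return covered == probes


restricted = {
    "Effect": "Allow",
    "Principal": {"AWS": "*"},
    "Action": "s3:*",
    "Resource": "*",
    "Condition": {"StringEquals": {"s3:ResourceAccount": "111122223333"}},
}
service_buckets = {
    "Effect": "Allow",
    "Action": ["s3:*"],
    "Resource": [
        "arn:aws:s3:::sagemakerheadlessexecution-prod-*",
        "arn:aws:s3:::sagemakerheadlessexecution-prod-*/*",
    ],
}
print(allows_service_buckets({"Statement": [restricted]}))                   # False
print(allows_service_buckets({"Statement": [restricted, service_buckets]}))  # True
```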
Resource tag exceptions
Make sure that your IAM policy has the following permissions:
{
  "Effect": "Allow",
  "Action": [
    "events:TagResource",
    "events:DeleteRule",
    "events:PutTargets",
    "events:DescribeRule",
    "events:PutRule",
    "events:RemoveTargets",
    "events:DisableRule",
    "events:EnableRule"
  ],
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true"
    }
  }
}
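Because the statement above is conditioned on a resource tag, the EventBridge actions are allowed only on rules that carry that tag with the value "true". The following Python sketch shows one way to test a rule's tags. The helper name has_scheduling_tag is hypothetical, and the input is assumed to be a list of Key/Value dictionaries, which is the shape that the EventBridge ListTagsForResource operation returns.

```python
# Tag key and value taken from the policy condition in this article.
REQUIRED_TAG_KEY = "sagemaker:is-scheduling-notebook-job"
REQUIRED_TAG_VALUE = "true"


def has_scheduling_tag(tags):
    """Return True if the tag list satisfies the policy's StringEquals condition.

    Hypothetical helper. `tags` is assumed to look like
    [{"Key": "...", "Value": "..."}, ...]. StringEquals is case-sensitive,
    so "True" does not match.
    """
    return any(
        tag.get("Key") == REQUIRED_TAG_KEY and tag.get("Value") == REQUIRED_TAG_VALUE
        for tag in tags
    )


print(has_scheduling_tag([{"Key": REQUIRED_TAG_KEY, "Value": "true"}]))  # True
print(has_scheduling_tag([{"Key": REQUIRED_TAG_KEY, "Value": "True"}]))  # False
print(has_scheduling_tag([]))                                            # False
```

If the function returns False for a rule that the notebook job manages, then the tag condition is a likely cause of the AccessDenied error.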
UI errors when you try to update a job
You might encounter a UI error when you try to create, describe, update, stop, or delete a notebook job. You might also encounter this issue with job definitions (scheduled jobs). To troubleshoot the issue, first note the error message that appears in the UI. The message often contains directions or suggested actions to resolve the issue.
If you can't resolve the error, then complete the following steps:
- Take a screenshot of the error, and then save it as an image file.
- Create an HTTP Archive (HAR) file that captures the network traffic when the UI error occurs.
- Open a terminal on the SageMaker Studio Jupyter server. To do this, choose File, New, Terminal.
- Check the logs in /var/log/apps/app_container.log for exceptions, errors, or warnings at the time of the UI error.
- Contact AWS Support through the AWS Support Center. In your request, attach the error screenshot, the app_container.log, and the HAR file.
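Before you contact AWS Support, you can narrow down the log and HAR file yourself. The following Python sketch is one possible approach: both function names are hypothetical, the log filter is a plain keyword match because the log's line format isn't documented here, and the HAR reader assumes the standard HAR layout (log.entries, each with request.url and response.status).

```python
import re

# Keywords from the troubleshooting step: exceptions, errors, or warnings.
LOG_PATTERN = re.compile(r"exception|error|warn", re.IGNORECASE)


def suspicious_log_lines(lines):
    """Return (line_number, text) pairs for lines that mention a problem."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if LOG_PATTERN.search(line)
    ]


def failed_har_requests(har):
    """Return (status, url) pairs for HAR entries with an HTTP status >= 400."""
    failures = []
    for entry in har.get("log", {}).get("entries", []):
        status = entry.get("response", {}).get("status", 0)
        if status >= 400:
            url = entry.get("request", {}).get("url", "")
            failures.append((status, url))
    return failures


# Usage (run in the Studio terminal; the HAR file name is a placeholder):
#
#   import json
#   with open("/var/log/apps/app_container.log") as log_file:
#       for number, text in suspicious_log_lines(log_file):
#           print(number, text)
#   with open("studio-ui-error.har") as har_file:
#       print(failed_har_requests(json.load(har_file)))
```

Compare the time of the matching log lines with the time of the failed requests in the HAR file to find the entries that relate to the UI error.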