AWS Batch - Error while running a job

Hi there!

I'm trying to run a job in AWS Batch, but I receive this error: 'ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.ap-south-1.amazonaws.com/": dial tcp 13.234.9.96:443: i/o timeout. Please check your task network configuration.'

At a high level, I have a Python script that generates 1,000 sample records, saves them to an Excel file, and uploads the file to S3. I built the Docker image and pushed it to ECR, but the job fails with the error above. I looked through questions from others facing the same issue but could not find a relevant solution.

Can anyone please provide required assistance on this one? Thanks in advance!

Swami S
asked 1 month ago · 85 views
2 Answers

Hi

When you define a job in AWS Batch with Fargate, you have two roles. See this working example of mine, which sets both ExecutionRoleArn and JobRoleArn:

JobDefinition1:
    Type: 'AWS::Batch::JobDefinition'
    Properties:
      JobDefinitionName: !Join
          - '-'
          - - !Ref NamePrefix
            - '<xyz>'
      Type: 'container'
      PlatformCapabilities:
        - 'FARGATE'
      ContainerProperties:
        Command:
          - 'uname --all'
          - 'pwd'
          - 'ls -l'
        ExecutionRoleArn: !GetAtt BatchRole.Arn
        Image: !Join
          - ':'
          - - !GetAtt EcrRepository.RepositoryUri
            - !Ref ContainerImageTag
        JobRoleArn: !GetAtt ServiceRole.Arn
        NetworkConfiguration:
          AssignPublicIp: 'ENABLED'
        ResourceRequirements:
          - Type: 'VCPU'
            Value: '1'
          - Type: 'MEMORY'
            Value: '4096'

For execution role ARN: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-batch-jobdefinition-containerproperties.html#cfn-batch-jobdefinition-containerproperties-executionrolearn

For JobRoleArn, see https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-batch-jobdefinition-containerproperties.html#cfn-batch-jobdefinition-containerproperties-jobrolearn

You have to grant ECR access permissions to the AWS Batch execution IAM role.
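As a rough illustration of what such a grant could look like, here is a minimal Python sketch that builds an execution-role policy and its trust policy as plain dictionaries. The action list mirrors the AWS managed policy AmazonECSTaskExecutionRolePolicy. The function names are hypothetical and "Resource": "*" is an assumption you would normally narrow. Nothing here calls AWS; it only prints JSON you could paste into IAM.

```python
import json

# Actions mirrored from the AWS managed policy AmazonECSTaskExecutionRolePolicy:
# pull images from private ECR and write container logs to CloudWatch Logs.
EXECUTION_ROLE_ACTIONS = [
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage",
    "logs:CreateLogStream",
    "logs:PutLogEvents",
]


def build_execution_role_policy(resource="*"):
    """Hypothetical helper: permissions policy for the execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(EXECUTION_ROLE_ACTIONS),
                "Resource": resource,  # assumption: "*" for brevity
            }
        ],
    }


def build_trust_policy():
    """Hypothetical helper: lets Fargate tasks assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_execution_role_policy(), indent=4))
    print(json.dumps(build_trust_policy(), indent=4))
```

The job role (JobRoleArn) is separate: it carries whatever the code inside the container needs, for example S3 write access for an upload.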

Best,

Didier

AWS Expert
answered 1 month ago

Hi Didier,

Thank you for your response! I created a new role and attached it as the execution role, with the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr-public:*",
                "sts:GetServiceBearerToken"
            ],
            "Resource": "*"
        }
    ]
}

This is the policy I attached to the job role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}

Now I'm receiving this error - 'CannotPullContainerError: pull image manifest has been retried 5 time(s): failed to resolve ref public.ecr.aws/o4v0t6s8/swamibatchdemo:latest: failed to do request: Head "https://public.ecr.aws/v2/o4v0t6s8/swamibatchdemo/manifests/latest": dial tcp 75.2.101.78:443: i/o timeout'

I can't work out what I'm doing wrong, or where. Please help and advise!

Appreciate your response

Swami S
answered 1 month ago
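Both errors in this thread end in "dial tcp ...:443: i/o timeout", and the first one says to check the task network configuration, so it may be worth ruling out the network path before changing IAM policies further. The sketch below is one way to do that, under stated assumptions: it classifies a route table (shaped like an entry of the EC2 DescribeRouteTables response) by its default route, then applies the simplified rule that a Fargate task reaches a registry over the internet either through a NAT gateway, or through an internet gateway when it has a public IP (as with AssignPublicIp: 'ENABLED' in the job definition above). The function names and the subnet ID are hypothetical, and the check ignores VPC endpoints, security groups, and network ACLs.

```python
def classify_default_route(route_table):
    """Return 'internet-gateway', 'nat-gateway', or 'none' for one route table.

    `route_table` is a dict shaped like one item of
    EC2 DescribeRouteTables -> RouteTables.
    """
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue  # only the IPv4 default route matters here
        if route.get("State", "active") != "active":
            continue  # skip blackhole routes
        if (route.get("GatewayId") or "").startswith("igw-"):
            return "internet-gateway"
        if route.get("NatGatewayId"):
            return "nat-gateway"
    return "none"


def can_reach_registry(route_kind, assign_public_ip):
    """Simplified rule (assumption): ignores VPC endpoints, SGs, and NACLs."""
    if route_kind == "nat-gateway":
        return True
    if route_kind == "internet-gateway":
        return assign_public_ip  # without a public IP, the IGW route is unusable
    return False


def check_subnet(subnet_id, assign_public_ip, region="ap-south-1"):
    """Look up the subnet's explicitly associated route table with boto3.

    Returns None when the subnet has no explicit association (it then uses
    the VPC main route table, which this sketch does not look up).
    """
    import boto3  # imported lazily so the pure helpers work without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
    )
    tables = response["RouteTables"]
    if not tables:
        return None
    kind = classify_default_route(tables[0])
    return can_reach_registry(kind, assign_public_ip)


if __name__ == "__main__":
    # "subnet-0123456789abcdef0" is a placeholder, not a real subnet.
    print(check_subnet("subnet-0123456789abcdef0", assign_public_ip=False))
```

If this returns False for the subnets of the compute environment, the usual options are enabling a public IP for tasks in a public subnet, or routing a private subnet through a NAT gateway.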
