AWS Batch - Error while running a job

Hi there!

I'm trying to run a job in AWS Batch but receiving this error - 'ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.ap-south-1.amazonaws.com/": dial tcp 13.234.9.96:443: i/o timeout. Please check your task network configuration.'
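The message above is a TCP connect timeout to the regional ECR API endpoint, which is why it points at the task network configuration. A minimal sketch of a reachability check, assuming it is run from a host in the same subnet and security group as the job; `can_reach` and its `connect` parameter are hypothetical names introduced here for illustration:

```python
import socket


def can_reach(host, port=443, timeout=5.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    `connect` is injectable only so the function can be exercised without a network.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with connect((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connect timeouts, refused connections, and DNS failures.
        return False
```

Calling `can_reach("api.ecr.ap-south-1.amazonaws.com")` and getting `False` would suggest the subnet has no route to the endpoint, independent of any IAM settings.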

At a high level, I have a Python script that generates 1000 sample records, saves them to an Excel file, and uploads the file to S3. I've built the Docker image and pushed it to ECR, but I get this error when the job runs. I looked through questions from others facing the same issue but could not find a relevant solution.

Can anyone please help with this? Thanks in advance!

Swami S
asked 15 days ago, 54 views
2 Answers

Hi

When you define a job in AWS Batch with Fargate, you have two roles. See this working example of mine, which sets both ExecutionRoleArn and JobRoleArn:

JobDefinition1:
    Type: 'AWS::Batch::JobDefinition'
    Properties:
      JobDefinitionName: !Join
          - '-'
          - - !Ref NamePrefix
            - '<xyz>'
      Type: 'container'
      PlatformCapabilities:
        - 'FARGATE'
      ContainerProperties:
        Command:
          - 'uname --all'
          - 'pwd'
          - 'ls -l'
        ExecutionRoleArn: !GetAtt BatchRole.Arn
        Image: !Join
          - ':'
          - - !GetAtt EcrRepository.RepositoryUri
            - !Ref ContainerImageTag
        JobRoleArn: !GetAtt ServiceRole.Arn
        NetworkConfiguration:
          AssignPublicIp: 'ENABLED'
        ResourceRequirements:
          - Type: 'VCPU'
            Value: '1'
          - Type: 'MEMORY'
            Value: '4096'

For ExecutionRoleArn, see https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-batch-jobdefinition-containerproperties.html#cfn-batch-jobdefinition-containerproperties-executionrolearn

For JobRoleArn, see https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-batch-jobdefinition-containerproperties.html#cfn-batch-jobdefinition-containerproperties-jobrolearn

You have to grant permissions for ECR access to the execution IAM role that AWS Batch uses for the job.
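As a rough sketch of what that grant could look like, the Python below builds a permissions policy and a trust policy for the execution role. The action list mirrors the AWS managed policy AmazonECSTaskExecutionRolePolicy; the function names are hypothetical, and `"Resource": "*"` is an assumption that can be narrowed to a specific repository.

```python
import json

# Actions the Fargate agent needs to pull a private ECR image and write logs.
PULL_AND_LOG_ACTIONS = [
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage",
    "logs:CreateLogStream",
    "logs:PutLogEvents",
]


def execution_role_policy():
    """Permissions policy to attach to the role referenced by ExecutionRoleArn."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(PULL_AND_LOG_ACTIONS),
                "Resource": "*",  # assumption: broad scope, kept simple for the sketch
            }
        ],
    }


def execution_role_trust_policy():
    """Trust policy letting ECS tasks (which run Batch Fargate jobs) assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


print(json.dumps(execution_role_policy(), indent=2))
```

The JobRoleArn role is separate: it carries whatever the code inside the container needs, such as S3 access, rather than the image pull permissions.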

Best,

Didier

AWS EXPERT
answered 14 days ago

Hi Didier,

Thank you for your response! I created a new role and attached it as the execution role (its policy JSON is below):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr-public:*",
                "sts:GetServiceBearerToken"
            ],
            "Resource": "*"
        }
    ]
}

This is what I've attached to the job role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}

Now I'm receiving this error - 'CannotPullContainerError: pull image manifest has been retried 5 time(s): failed to resolve ref public.ecr.aws/o4v0t6s8/swamibatchdemo:latest: failed to do request: Head "https://public.ecr.aws/v2/o4v0t6s8/swamibatchdemo/manifests/latest": dial tcp 75.2.101.78:443: i/o timeout'

I can't work out what I'm doing wrong or where. Please help and advise!

I appreciate your response.

Swami S
answered 13 days ago
