AWS Batch - Error while running a job

Hi there!

I'm trying to run a job in AWS Batch but receiving this error - 'ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.ap-south-1.amazonaws.com/": dial tcp 13.234.9.96:443: i/o timeout. Please check your task network configuration.'

At a high level, I have a Python script that generates 1000 sample records, saves them to an Excel file, and uploads the file to S3. I've built the Docker image and pushed it to ECR, but I get this error when the job runs. I looked at questions from others facing the same issue but could not find a relevant solution.

Can anyone please help with this? Thanks in advance!

Swami S
asked a month ago, 84 views
2 Answers

Hi

When you define a job in AWS Batch with Fargate, you need two roles: an execution role and a job role. See this working example of mine, which sets both ExecutionRoleArn and JobRoleArn:

JobDefinition1:
    Type: 'AWS::Batch::JobDefinition'
    Properties:
      JobDefinitionName: !Join
          - '-'
          - - !Ref NamePrefix
            - '<xyz>'
      Type: 'container'
      PlatformCapabilities:
        - 'FARGATE'
      ContainerProperties:
        Command:
          - 'uname --all'
          - 'pwd'
          - 'ls -l'
        ExecutionRoleArn: !GetAtt BatchRole.Arn
        Image: !Join
          - ':'
          - - !GetAtt EcrRepository.RepositoryUri
            - !Ref ContainerImageTag
        JobRoleArn: !GetAtt ServiceRole.Arn
        NetworkConfiguration:
          AssignPublicIp: 'ENABLED'
        ResourceRequirements:
          - Type: 'VCPU'
            Value: '1'
          - Type: 'MEMORY'
            Value: '4096'

For ExecutionRoleArn, see https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-batch-jobdefinition-containerproperties.html#cfn-batch-jobdefinition-containerproperties-executionrolearn

For JobRoleArn, see https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-batch-jobdefinition-containerproperties.html#cfn-batch-jobdefinition-containerproperties-jobrolearn

You have to grant ECR access permissions to the AWS Batch execution IAM role, because that is the role Fargate uses to pull the image.
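
As a rough sketch of what the two roles referenced in the job definition could look like (an illustration, not a copy of a real template): `BatchRole` is the execution role used to pull the image from ECR and write logs, and `ServiceRole` is the job role that the code inside the container uses. The S3 statement and the `OutputBucketName` parameter are assumptions based on the script described in the question, so adjust them to your own setup.

```yaml
# Execution role: assumed by ECS/Fargate to pull the image and send logs.
BatchRole:
  Type: 'AWS::IAM::Role'
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: 'Allow'
          Principal:
            Service: 'ecs-tasks.amazonaws.com'
          Action: 'sts:AssumeRole'
    ManagedPolicyArns:
      # AWS managed policy covering ECR pull and CloudWatch Logs permissions.
      - 'arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy'

# Job role: used by the application code running in the container.
ServiceRole:
  Type: 'AWS::IAM::Role'
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: 'Allow'
          Principal:
            Service: 'ecs-tasks.amazonaws.com'
          Action: 'sts:AssumeRole'
    Policies:
      - PolicyName: 'upload-results'  # hypothetical policy name
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: 'Allow'
              Action: 's3:PutObject'
              # OutputBucketName is a hypothetical template parameter.
              Resource: !Sub 'arn:aws:s3:::${OutputBucketName}/*'
```

Note that an i/o timeout is a connection failure rather than an access-denied response, so the task also needs a network path to ECR: a public subnet with AssignPublicIp enabled as in the job definition, a NAT gateway, or VPC endpoints.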

Best,

Didier

AWS EXPERT
answered a month ago

Hi Didier,

Thank you for your response! I created a new role and attached it as the execution role (the policy JSON is below):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr-public:*",
                "sts:GetServiceBearerToken"
            ],
            "Resource": "*"
        }
    ]
}

This is what I've attached to the job role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}

Now I'm receiving this error - 'CannotPullContainerError: pull image manifest has been retried 5 time(s): failed to resolve ref public.ecr.aws/o4v0t6s8/swamibatchdemo:latest: failed to do request: Head "https://public.ecr.aws/v2/o4v0t6s8/swamibatchdemo/manifests/latest": dial tcp 75.2.101.78:443: i/o timeout'

I can't work out what I'm doing wrong or where. Please help and advise!

Appreciate your response

Swami S
answered a month ago
