Using endpoints to access ECR for ECS Fargate tasks on a private subnet

0

I am trying to configure ECS Fargate to be able to pull images from a private repository on ECR.

I want to use the VPC endpoints to do this, because using a NAT gateway vastly increases the cost of resources.

As far as I can tell, I have everything set up as per the Amazon ECR interface VPC endpoints (AWS PrivateLink) document.

I'm getting this error in the ECS Fargate task:

CannotPullContainerError: containerd: pull command failed: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x8919d0] goroutine 1 [running]: main.(*puller).pullWithClient(0x400063dbb8, {0xd844b8, 0x40005cf860}, {0xd7f180, 0x40003e2000}, 0x4000388f70, {0xd7f130, 0x400041d6a0}) /root/go/src/github.com/aws/two/puller/pull.go:198 +0x3b0 main.(*puller).Pull(0x400063dbb8, {0xd844b8, 0x4000395cb0}, 0x4000388f70, {0xd7f130, 0x400041d6a0}) /root/go/src/github.com/aws/two/puller/pull.go:147 +0x1f0 main.(*puller).pullImage(0xd844b8?, {0xd844b8, 0x4000395cb0}, 0x4000388f70, {0xd7f130?, 0x400041d6a0?}) /root/go/src/github.com/aws/two/puller/pull.go:350 +0x34 main.main() /root/go/src/github.com/aws/two/puller/main.go:75 +0x4b4 : exit status 2

My setup details are:

  • I am using Fargate platform version 1.4.0.
  • It is working when I use a public subnet that has an internet gateway, so I know the problem is not with my image or memory config.
  • I have a VPC with 2 subnets (private_a, private_b), which are in AZs us-east-1a and us-east-1b
  • My ECR repo is private
  • I have the following endpoints: com.amazonaws.us-east-1.logs (interface), com.amazonaws.us-east-1.ecr.dkr (interface), com.amazonaws.us-east-1.ecr.api (interface), com.amazonaws.us-east-1.s3 (gateway)
  • The docs mentioned above say I need to use a Gateway endpoint for S3; however, I've also tried creating com.amazonaws.us-east-1.s3 as an interface endpoint, and it didn't work either
  • I opened up all outbound and inbound access to 0.0.0.0/0 and it doesn't work (my plan is to tighten it up once I get it working)
  • I have Private DNS Names enabled on all interface endpoints (it's not supported on gateway)
  • I have configured routes out to my s3 gateway on my private subnet route tables
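For anyone comparing their own setup against the list above, here is a minimal sketch that checks a set of configured endpoints against what a Fargate 1.4.0 task pulling from ECR is generally expected to need. The helper names `required_endpoints` and `missing_endpoints` are hypothetical, and the `logs` endpoint is only assumed to be required because the task uses the `awslogs` log driver.

```python
def required_endpoints(region, uses_awslogs=True):
    """Return {service_name: endpoint_type} assumed necessary for a private Fargate 1.4.0 task."""
    needed = {
        f"com.amazonaws.{region}.ecr.api": "Interface",
        f"com.amazonaws.{region}.ecr.dkr": "Interface",
        f"com.amazonaws.{region}.s3": "Gateway",  # image layers are served from S3
    }
    if uses_awslogs:
        # Assumption: only needed when the task definition uses the awslogs driver.
        needed[f"com.amazonaws.{region}.logs"] = "Interface"
    return needed


def missing_endpoints(region, configured, uses_awslogs=True):
    """List required service names that are absent from `configured` ({service_name: type})."""
    needed = required_endpoints(region, uses_awslogs)
    return sorted(name for name in needed if name not in configured)


# The endpoint set described in the question.
configured = {
    "com.amazonaws.us-east-1.logs": "Interface",
    "com.amazonaws.us-east-1.ecr.dkr": "Interface",
    "com.amazonaws.us-east-1.ecr.api": "Interface",
    "com.amazonaws.us-east-1.s3": "Gateway",
}
print(missing_endpoints("us-east-1", configured))  # prints []
```

An empty result only says the endpoints exist. It says nothing about whether their route tables, security groups, or DNS settings are correct.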

I'm at a loss as to what the issue could be. I originally thought it could be related to the build platform on Docker because the issue was "memory panic", however I ruled this out by getting it working when the ECS task is on a public subnet (I used linux/arm64 to build my image and on the Fargate task config).

Is the issue that I need an internet gateway on my private subnet in order to route traffic out from my S3 gateway endpoint? If so, I'm a bit confused, because I thought the whole point was that the gateway endpoint works as an alternative to an internet gateway.

Appreciate any thoughts or suggestions.

  • Can you confirm your "IMAGE" is correctly defined in your Task Def?

5 Answers
2

Hello.

Make sure the ECR repository policy allows ECS tasks to pull the image. This often is a simple oversight.

Ensure that your private subnets have routes pointing to the VPC Endpoints. For Gateway Endpoints (like S3), it's an explicit route in the route table. For Interface Endpoints, it's implicit and does not appear in the route table, but ensuring connectivity to these endpoints is essential.

Ensure that both 'Enable DNS resolution' and 'Enable DNS hostnames' are set to 'Yes' in your VPC settings. Both are needed for the private DNS names associated with the interface VPC Endpoints to resolve correctly.
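One way to check the DNS side is to resolve the ECR hostnames from a machine inside the VPC and see whether they come back as private addresses. This is only a sketch: `resolve`, `all_private`, and `report` are hypothetical helper names, and the account ID in the usage note is a placeholder.

```python
import ipaddress
import socket


def resolve(hostname):
    """Return the sorted set of IP addresses that `hostname` resolves to."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """True if there is at least one address and every address is in a private range."""
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )


def report(hostnames):
    """Print, for each hostname, what it resolves to and whether that looks private."""
    for host in hostnames:
        try:
            addrs = resolve(host)
        except socket.gaierror as exc:
            print(host, "did not resolve:", exc)
            continue
        print(host, addrs, "private" if all_private(addrs) else "NOT private")
```

From inside the VPC you could call `report(["api.ecr.us-east-1.amazonaws.com", "123456789012.dkr.ecr.us-east-1.amazonaws.com"])`, replacing the placeholder account ID with your own. This check applies to interface endpoints only. A gateway endpoint for S3 does not change DNS, so S3 hostnames still resolve to public addresses and are reached through the route table instead.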

Regards, Andrii

EXPERT
answered 8 months ago
EXPERT
reviewed 10 days ago
  • Thanks a lot for the response. I can confirm that my ECS task has permission to pull, as the execution role has ecr:* and s3:*. The exact same ECS configuration is also able to pull and run an image from ECR if I launch ECS on a public subnet (that has an Internet Gateway). I also confirmed that DNS Resolution is set to yes on all interface endpoints and on my VPC. Unfortunately none of this helped. I'm also a bit confused why the error seems to reference "memory".

1
Accepted Answer

The issue was with the routes applied to my s3 gateway endpoint.

The solution was to add the VPC default route table to the route tables configured on the endpoint. I believe this is required to allow connectivity from the endpoint to the VPC.
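To illustrate the kind of check that would have caught this, here is a rough sketch that works out which route table each subnet actually uses (its explicit association, or the VPC's main table if it has none) and whether that table has a route to the gateway endpoint. The function names and all IDs are hypothetical. The dictionaries follow the shape returned by EC2 `describe_route_tables`, and the list is assumed to contain the route tables of a single VPC.

```python
def effective_route_table(route_tables, subnet_id):
    """Return the table a subnet uses: its explicit association, else the VPC main table."""
    main = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                return table
            if assoc.get("Main"):
                main = table
    return main


def has_endpoint_route(table, endpoint_id):
    """True if the table contains a route whose target is the given gateway endpoint."""
    return table is not None and any(
        route.get("GatewayId") == endpoint_id for route in table.get("Routes", [])
    )


def subnets_without_route(route_tables, subnet_ids, endpoint_id):
    """List the subnets whose effective route table has no route to the endpoint."""
    return [
        subnet for subnet in subnet_ids
        if not has_endpoint_route(effective_route_table(route_tables, subnet), endpoint_id)
    ]


# Hypothetical data: subnet-a has no explicit association, so it falls back to the main table.
sample = [
    {"RouteTableId": "rtb-main", "Associations": [{"Main": True}],
     "Routes": [{"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"}]},
    {"RouteTableId": "rtb-custom", "Associations": [{"Main": False, "SubnetId": "subnet-b"}],
     "Routes": [{"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
                {"DestinationPrefixListId": "pl-example", "GatewayId": "vpce-example"}]},
]
print(subnets_without_route(sample, ["subnet-a", "subnet-b"], "vpce-example"))  # prints ['subnet-a']
```

With boto3, the input list could come from `describe_route_tables` filtered by `vpc-id`, and a missing table could be attached with `modify_vpc_endpoint` and its `AddRouteTableIds` parameter.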

Mark
answered 8 months ago
EXPERT
reviewed 10 days ago
1

Hi,

I'd suggest starting an EC2 instance in your VPC with Docker installed on it and trying a docker pull of your container image from this instance (to which you connect via SSH / Instance Connect). Then you should be able to figure out more easily what's going on and test routing and connectivity to ECR.
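If Docker is not available on the test instance, a plain TCP check on port 443 can serve as a first connectivity test. This is a minimal sketch, and `can_connect` is a hypothetical helper name.

```python
import socket


def can_connect(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `can_connect("api.ecr.us-east-1.amazonaws.com")` run from the private subnet indicates whether the ECR API endpoint is reachable at all. A successful connection does not prove the pull will work, since image layers are fetched from S3 separately.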

Best,

Didier

AWS
EXPERT
answered 8 months ago
  • Thanks for the suggestion, however I figured out the issue was that I wasn't adding the route table for the default vpc to the endpoint I created. It's still a pretty obscure error for this problem in AWS, but this is what fixed it.

  • I have the same issue. The only difference is that I don't have an S3 endpoint in place (it is not needed in my case). I tried your suggestion, but I don't get any error: I can pull the image and run it on EC2 instances. Any help or suggestions would be appreciated.

0

I have the same issue. I can confirm that the image works fine on EC2 instances and on my local machine. I'm not sure why the error refers to memory, though. Error: CannotPullContainerError: containerd: pull command failed: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x1369f41] goroutine 1 [running]: main.(*puller).pullWithClient(0xc0006e5bc0, {0x18dc8a0, 0xc00069e6c0}, {0x18d6988, 0xc0003de000}, 0xc000388ea0, {0x18d6938, 0xc000419820}) /root/go/src/github.com/aws/two/puller/pull.go:198 +0x501 main.(*puller).Pull(0xc0006e5bc0, {0x18dc8a0, 0xc000397d10}, 0xc000388ea0, {0x18d6938, 0xc000419820}) /root/go/src/github.com/aws/two/puller/pull.go:147 +0x2a7 main.(*puller).pullImage(0x18dc8a0?, {0x18dc8a0, 0xc000397d10}, 0xc000388ea0, {0x18d6938?, 0xc000419820?}) /root/go/src/github.com/aws/two/puller/pull.go:350 +0x47 main.main() /root/go/src/github.com/aws/two/puller/main.go:75 +0x587 : exit status 2

iliya
answered 8 months ago
0

This may be useful for newcomers:

It is indeed a network block. In my case, I had entered the wrong VPC endpoint prefix list ID in a security group rule, which blocked pulls from ECR. After I fixed the prefix list ID, everything worked.
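A quick way to spot this kind of mistake is to list the prefix list IDs that a security group's egress rules reference and compare them with the one you expect. The sketch below assumes the dictionary shape returned by EC2 `describe_security_groups`. The function names and the `pl-...` values are hypothetical.

```python
def egress_prefix_lists(security_group):
    """Collect every prefix list ID referenced by the group's egress rules."""
    return {
        entry["PrefixListId"]
        for rule in security_group.get("IpPermissionsEgress", [])
        for entry in rule.get("PrefixListIds", [])
    }


def references_prefix_list(security_group, prefix_list_id):
    """True if any egress rule of the group references the given prefix list."""
    return prefix_list_id in egress_prefix_lists(security_group)


# Hypothetical group whose HTTPS egress rule points at the wrong prefix list.
group = {
    "GroupId": "sg-example",
    "IpPermissionsEgress": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [], "PrefixListIds": [{"PrefixListId": "pl-wrong"}]},
    ],
}
print(references_prefix_list(group, "pl-expected"))  # prints False
```

This only looks at prefix list references. A group that allows egress to 0.0.0.0/0 would also permit the traffic without referencing any prefix list.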

AWS
answered 7 months ago
