Intermittently unable to resolve DNS address within a VPC.

I have an application running on ECS (Fargate) within a VPC. The application is packaged as a Docker container. The ECS service uses 2 subnets, each connected to an Internet gateway, and the routing table looks correct: https://i.imgur.com/zu6uQKu.png

The application is supposed to talk to LetsEncrypt to issue SSL certificates.

Now, quite often, I'm seeing bursts of log entries that look like this:

(ConnectionFailure Network.Socket.getAddrInfo (called with preferred socket type/protocol: AddrInfo {addrFlags = [AI_ADDRCONFIG], addrFamily = AF_UNSPEC, addrSocketType = Stream, addrProtocol = 0, addrAddress = <assumed to be undefined>, addrCanonName = <assumed to be undefined>}, host name: Just \\\"acme-v02.api.letsencrypt.org\\\", service name: Just \\\"443\\\"): does not exist (No address associated with hostname))\")\nReq:Nothing\n"}

This looks like a DNS issue to me.

This is a fairly high-traffic service, and these errors seem to come and go without warning. The ECS service is also set to auto-scale, so some containers may be in different subnets (each with the same routing table).

What problems should I be looking at? Does this seem like something specific to this service, or could it just be a temporary DNS outage?

2 Answers

A Fargate task cannot reach the internet by default. It needs either a subnet with a 0.0.0.0/0 route to a NAT Gateway, or a public IP address in a subnet whose default route points to an Internet Gateway (IGW). It's not clear from your image whether the Fargate task is attached to that subnet.

It's worth checking which subnets your ECS tasks are attached to.

Fargate tasks use the Route 53 Resolver for DNS resolution, which can resolve internet addresses.

A lack of internet connectivity could cause this issue. If the task doesn't have a public IP, I would first move it to a subnet that has a route to a NAT Gateway.
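The rule above can be written down as a small check. This is only a sketch: the function name `has_internet_path` is made up for illustration, and it assumes each route is a dict shaped like the `Routes` entries returned by the EC2 `DescribeRouteTables` API (keys such as `DestinationCidrBlock`, `GatewayId`, `NatGatewayId`, `State`). The gateway IDs in the example are placeholders.

```python
def has_internet_path(routes, has_public_ip):
    """Return True if a task in a subnet with these routes can reach the internet.

    routes: list of dicts shaped like EC2 DescribeRouteTables "Routes" entries
            (assumed shape, see the note above).
    has_public_ip: whether the task's network interface has a public IP.
    """
    for route in routes:
        # Only the default IPv4 route matters for general internet access.
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        # A blackholed route does not forward traffic.
        if route.get("State", "active") != "active":
            continue
        # A NAT Gateway works for tasks without a public IP.
        if route.get("NatGatewayId"):
            return True
        # An Internet Gateway only works if the task has a public IP.
        if route.get("GatewayId", "").startswith("igw-") and has_public_ip:
            return True
    return False


if __name__ == "__main__":
    # Placeholder route table: local route plus a default route to an IGW.
    example_routes = [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
        {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0example", "State": "active"},
    ]
    print(has_internet_path(example_routes, has_public_ip=False))
```

With an IGW default route and no public IP the check fails, which is the situation described above: the route table looks correct, but the task still has no path out.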

If this helps, please be sure to accept the answer to help others and me.

EXPERT
answered a year ago

Based on the error message you provided, it seems that the application is unable to resolve the DNS address of LetsEncrypt intermittently. There are several potential causes for this issue, including:

  1. DNS resolution issues: This is the most likely cause of the problem. DNS resolution may fail due to various reasons such as network connectivity issues, DNS server problems, or misconfiguration. You can check if the DNS resolution is working correctly by performing a DNS lookup on the LetsEncrypt domain from within the VPC.

  2. Firewall issues: It is possible that a firewall or security group is blocking outbound traffic from the ECS service to the LetsEncrypt domain. Check your security groups and network ACLs to ensure that the necessary ports are open for outbound traffic.

  3. Load balancer issues: If the application is using a load balancer, it could be experiencing issues that are causing intermittent DNS resolution problems. Check the load balancer logs to see if there are any errors or issues that could be causing the problem.

It is also possible that the issue is temporary and could be caused by a DNS outage or other temporary network issue. You can check with your DNS provider or network administrator to see if there are any known issues or outages.
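To check whether resolution really fails intermittently, you could run a repeated lookup from inside the VPC (for example from a task in the same subnets). The following is a rough sketch, not a diagnostic tool: `probe_dns` and its parameters are names invented here, and it relies only on Python's standard `socket.getaddrinfo`, which is comparable to the call failing in the log above.

```python
import socket
import time


def probe_dns(host, port=443, attempts=5, delay=1.0,
              resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve `host` several times and record each outcome.

    Returns a list of (ok, detail) tuples: detail is a sorted list of
    addresses on success, or the error text on failure.
    `resolver` and `sleep` are injectable so the function can be tested offline.
    """
    results = []
    for i in range(attempts):
        try:
            infos = resolver(host, port, type=socket.SOCK_STREAM)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address.
            addresses = sorted({info[4][0] for info in infos})
            results.append((True, addresses))
        except socket.gaierror as exc:
            results.append((False, str(exc)))
        if i < attempts - 1:
            sleep(delay)
    return results


if __name__ == "__main__":
    outcomes = probe_dns("acme-v02.api.letsencrypt.org", attempts=10, delay=2.0)
    failures = [detail for ok, detail in outcomes if not ok]
    print(f"{len(failures)} of {len(outcomes)} lookups failed")
    for detail in failures:
        print("  ", detail)
```

If some lookups fail while others succeed from the same task, the problem is more likely at the resolver or network level than a plain misconfiguration, which would usually fail every time.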

Hope that helps!

AWS
answered a year ago
