How do I troubleshoot partial or intermittent DNS failures related to Amazon VPC?

I want to troubleshoot partial or intermittent DNS failures that are related to my Amazon Virtual Private Cloud (Amazon VPC) environments.

Resolution

The name servers are incorrect at the registrar

To correctly configure your name servers at the registrar, take the following actions.

Find your registered name servers

Complete the following steps:

  1. Log in to your domain registrar account.
  2. Find the listed name servers for your domain on your control panel or account dashboard.
  3. Use the ICANN Lookup tool to find the registrar information for your domain. For more information, see Use ICANN Lookup on the Google Cloud website. Or, run the whois example.com command.
    Note: Replace example.com with your domain name.

You can also use command-line tools to find your registered name server. For example, on a Windows or macOS system, use the nslookup or dig -t NS commands to view the name servers. Make sure to look up the name servers that are associated with the domain name.

Compare name servers

Log in to your web hosting account to find the name servers from your hosting provider. Then, compare your domain's name servers with your hosting provider's name servers to make sure that they match.
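
To script the comparison, it can help to normalize both lists first, because dig prints host names with a trailing dot and control panels often show them in mixed case. The following is a minimal sketch, not an official tool. The function names normalize_ns and ns_mismatch and the two input files are hypothetical.

```shell
# normalize_ns: read name server host names on stdin, one per line, and
# print them lowercased, without a trailing dot, sorted and de-duplicated.
normalize_ns() {
  tr '[:upper:]' '[:lower:]' |
    sed -e 's/[[:space:]]*$//' -e 's/\.$//' -e '/^$/d' |
    LC_ALL=C sort -u
}

# ns_mismatch: print the name servers that appear in only one of two lists.
# $1 and $2 are files that contain one name server per line.
ns_mismatch() {
  first=$(mktemp) || return 2
  second=$(mktemp) || { rm -f "$first"; return 2; }
  normalize_ns < "$1" > "$first"
  normalize_ns < "$2" > "$second"
  LC_ALL=C comm -23 "$first" "$second" | sed 's/^/only in first list: /'
  LC_ALL=C comm -13 "$first" "$second" | sed 's/^/only in second list: /'
  rm -f "$first" "$second"
}
```

For example, save the output of dig +short NS example.com to one file and your hosting provider's name servers to another file, and then pass both files to ns_mismatch. No output means that the two lists match.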

Use online verification tools

In an online tool, enter your domain name to check whether the registered name servers are publicly resolving. For example, you can use DNS Checker on the DNS Checker website or DNS Lookup on the MxToolbox website.

Update incorrect settings

Resolve inconsistencies between your registrar's settings and your hosting provider's information on the registrar's platform.

The name servers are incorrect in the hosted zone

A partial DNS failure can occur when the resolver uses an incorrect name server to resolve the domain. This can happen after you update or add a name server in the hosted zone's NS record.

To view your domain's active name servers, complete the following steps:

  1. Open your terminal.
  2. Run the dig -t NS yourdomain.com command.
    Note: Replace yourdomain.com with your domain name.
  3. Press Enter.

You can also use your cloud provider's web interface to view your hosted zone. For example, you can use Google Cloud DNS or Amazon Route 53.

To use a web interface, complete the following steps:

  1. Log in to your cloud provider's web interface.
  2. In the section for your domain's hosted zone, find the list of NS records.
  3. Check the records against the name servers that manage your domain.

For more information, see Update your domain's name servers on the Google Cloud website.

The resolvers in your configuration file are incorrect

If you don't correctly set the resolvers in your configuration file, then you can experience DNS issues. For example, an Amazon Elastic Compute Cloud (Amazon EC2) instance in an Amazon VPC uses the name servers that you defined in your configuration file. The EC2 instances use these name servers to resolve the domain.

To resolve this issue, update your configuration file with the correct resolvers. For Linux, the configuration file is /etc/resolv.conf. For Windows, it's %SystemRoot%\system32\dns.
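
On Linux, a quick way to see which resolvers the instance uses is to print the nameserver entries from the configuration file. The following is a minimal sketch. The function name list_resolvers is hypothetical. In a VPC, the Amazon provided DNS server is reachable at the VPC's base network address plus two and at 169.254.169.253, so one of those addresses is what you would usually expect to see when you rely on the Amazon provided DNS server.

```shell
# list_resolvers: print the IP address from each active "nameserver" line
# of a resolv.conf-style file. Defaults to /etc/resolv.conf.
# Commented-out lines are skipped because their first field isn't "nameserver".
list_resolvers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Run list_resolvers without arguments on the instance, or pass the path of a copied configuration file.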

An Amazon provided DNS server is throttling the DNS queries

Amazon provided DNS servers reject traffic that exceeds the quota of 1024 packets per second for each elastic network interface. When a network interface exceeds the quota, its DNS queries are throttled and intermittently time out.

To resolve this issue, take one of the following actions:

To further troubleshoot, see How do I determine whether my DNS queries to the Amazon DNS server fail because of VPC DNS throttling?
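
As a rough first check, you can count the DNS packets that an instance sends each second and compare the count to the quota. The following is a minimal sketch that assumes you captured traffic with tcpdump and its -tt option, which prints an epoch timestamp as the first field of each line. The function name dns_pps and the interface name eth0 are assumptions. The sketch counts only the packets in the capture, so treat the result as an indicator and not as proof of throttling.

```shell
# dns_pps: read "tcpdump -tt" lines on stdin and print "second count" for
# each one-second bucket with more packets than the limit in $1
# (default 1024, the per-interface quota).
dns_pps() {
  awk -v limit="${1:-1024}" '
    { bucket = int($1); count[bucket]++ }
    END { for (b in count) if (count[b] > limit) print b, count[b] }
  ' | sort -n
}

# Example capture (assumes the interface is eth0; stop with Ctrl+C):
#   sudo tcpdump -tt -n -i eth0 dst port 53 > dns_capture.txt
#   dns_pps < dns_capture.txt
```

No output means that no one-second window in the capture exceeded the limit.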

CoreDNS is throttling the DNS queries for Amazon EKS

The Amazon VPC resolver supports up to 1024 packets per second for each network interface.

To remain within the quota, use inter-pod anti-affinity with CoreDNS in Amazon Elastic Kubernetes Service (Amazon EKS). Inter-pod anti-affinity routes traffic through different network interfaces to reduce DNS queries for each interface.

Schedule CoreDNS Pods on separate nodes. To use inter-pod anti-affinity rules to schedule CoreDNS Pods on separate instances, add the following options to the CoreDNS deployment:

podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - podAffinityTerm:
      labelSelector:
        matchExpressions:
        - key: k8s-app
          operator: In
          values:
          - kube-dns
      topologyKey: kubernetes.io/hostname
    weight: 100

Note: For more information about inter-pod anti-affinity, see Inter-pod affinity and anti-affinity on the Kubernetes website.

If queries still throttle, then increase CoreDNS replicas or use NodeLocal DNSCache. For more information, see Using NodeLocal DNSCache in Kubernetes clusters on the Kubernetes website.
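
To confirm that the CoreDNS Pods are spread across nodes, you can list the Pods with their nodes and look for nodes that run more than one Pod. The following is a minimal sketch. The function name crowded_nodes is hypothetical, and the example assumes that CoreDNS runs in the kube-system namespace with the k8s-app=kube-dns label that the preceding rule selects.

```shell
# crowded_nodes: read "pod node" pairs on stdin and print each node that
# runs more than one of the listed Pods, followed by the Pod count.
crowded_nodes() {
  awk '{ n[$2]++ } END { for (k in n) if (n[k] > 1) print k, n[k] }' | sort
}

# Example (requires kubectl access to the cluster):
#   kubectl get pods -n kube-system -l k8s-app=kube-dns --no-headers \
#     -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName | crowded_nodes
```

No output means that each listed Pod runs on its own node. Because the rule is a preference and not a requirement, the scheduler can still place two Pods on one node when no other node is available.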

The domain URL resolves from the internet but not from the EC2 instance

DNS queries for your domain always resolve from the private hosted zone in the following scenarios:

If the queried record for your domain isn't in the private hosted zone, then the DNS query fails and you receive an NXDOMAIN error. Also, the DNS resolver doesn't forward the DNS query to the public domain. Because the DNS record is in the public domain zone, it resolves from the internet. For more information, see Supported routing policies for records in a private hosted zone.

To resolve this issue, delete your private hosted zone or disassociate the VPC from your private hosted zone.
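
Before you change the hosted zone, you can confirm the symptom by comparing the answer from the instance's default resolver with the answer from a public resolver. The following is a minimal sketch. The function name compare_resolvers is hypothetical, and the sketch assumes that the instance can reach the public resolver (8.8.8.8 by default) over the internet.

```shell
# compare_resolvers: look up $1 with the default resolver and with the
# public resolver in $2 (default 8.8.8.8), and then print both answers.
compare_resolvers() {
  name=$1
  public=${2:-8.8.8.8}
  default_answer=$(dig +short "$name" | sort | paste -s -d ' ' -)
  public_answer=$(dig +short "$name" "@$public" | sort | paste -s -d ' ' -)
  echo "default resolver: ${default_answer:-no answer}"
  echo "public resolver:  ${public_answer:-no answer}"
  if [ -z "$default_answer" ] && [ -n "$public_answer" ]; then
    echo "$name resolves publicly only; check for a private hosted zone that covers it."
  fi
}
```

For example, run compare_resolvers www.example.com from the EC2 instance, and replace www.example.com with the record that fails.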

The DNS firewall rule in Route 53 is incorrect

If you incorrectly configured the DNS firewall rule in Route 53, then your domain doesn't resolve from a virtual private server (VPS). Instead, it resolves on the internet through a public resolver with 1.1.1.1 or 8.8.8.8 as the resolver IP address.

To correctly configure the DNS firewall rule, use Route 53 Resolver DNS Firewall to filter outbound DNS traffic for your domain.

The Route 53 resolver endpoints are incorrect

To resolve this issue, take the following actions:

  • Use resolver query logging to view the DNS queries that your endpoints handle. You can send the logs to Amazon CloudWatch Logs or an Amazon Simple Storage Service (Amazon S3) bucket.
  • Verify network connectivity from the outbound endpoint to the destination DNS server's IP address. To test the connection, use dig or nslookup from a temporary EC2 instance in the same subnet as the endpoint.
  • Check that the subnet's route table contains a route to the on-premises DNS server's IP address. The route is typically through a VPN or AWS Direct Connect connection.
  • Confirm that the outbound rules of the endpoint's security group allow TCP and UDP traffic on port 53 to the on-premises DNS server's IP address.
  • Make sure that the network access control list (network ACL) that's associated with the endpoint subnets allows the correct outbound traffic. You must allow outbound TCP and UDP traffic to the on-premises DNS server on port 53. Also, the network ACL must allow inbound TCP and UDP traffic on the ephemeral port range (1024-65535) for the response.
  • Verify that the outbound resolver rule has the correct IP address for the on-premises DNS server. An incorrect IP address causes queries to fail.
  • Associate the resolver rule with the VPC that makes DNS queries.
  • Make sure that your on-premises DNS server has a conditional forwarding rule. Configure the rule to send DNS queries for the domain to the inbound endpoint's IP addresses.
  • Confirm that your on-premises DNS server sends recursive queries. The inbound endpoint doesn't support iterative queries.
  • For private domain resolution, you must associate the private hosted zone with the VPC where the inbound endpoints are located.
  • Activate the DNS resolution and DNS hostnames attributes for your VPC.
  • Monitor the InboundQueryVolume and OutboundQueryVolume Route 53 Resolver metrics in CloudWatch to make sure that you don't exceed the queries per second (QPS) quotas. Each endpoint IP address can process up to 10,000 QPS over UDP.

Also, if you have multiple rules for the same domain or a conflict between a resolver rule and a private hosted zone, then the resolver rule takes precedence. Unexpected behavior can occur.

For more information, see How do I troubleshoot DNS resolution issues with Route 53 Resolver endpoints?
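
The connectivity checks in the preceding list cover both TCP and UDP on port 53, so it helps to test each transport separately from a temporary EC2 instance in the endpoint's subnet. The following is a minimal sketch. The function name check_dns_transport is hypothetical. The sketch relies on dig returning a nonzero exit status when it gets no reply from the server.

```shell
# check_dns_transport: query DNS server $2 for name $1 over UDP and then
# over TCP, and print whether each transport returned a response.
check_dns_transport() {
  name=$1
  server=$2
  if dig +notcp +time=3 +tries=1 "$name" "@$server" > /dev/null 2>&1; then
    echo "UDP 53: response received"
  else
    echo "UDP 53: no response"
  fi
  if dig +tcp +time=3 +tries=1 "$name" "@$server" > /dev/null 2>&1; then
    echo "TCP 53: response received"
  else
    echo "TCP 53: no response"
  fi
}
```

For example, run check_dns_transport with a domain from your resolver rule and the IP address of the on-premises DNS server. A response on one transport but not the other points to a security group, network ACL, or firewall rule that allows only one protocol.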

DNS failures on Linux

For a Linux-based operating system (OS), take the following actions to troubleshoot DNS failures.

To perform a lookup against the client DNS server that's configured in the host's /etc/resolv.conf file, run the following dig command:

dig www.amazon.com
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13150
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.amazon.com.    IN    A
;; ANSWER SECTION:
www.amazon.com.        41    IN    A    54.239.17.6
;; Query time: 1 msec
;; SERVER: 10.108.0.2#53(10.108.0.2)
;; WHEN: Fri Oct 21 21:43:11 2016
;; MSG SIZE rcvd: 48

In the preceding example, the answer section shows that 54.239.17.6 is the IP address of the HTTP server for www.amazon.com.

To follow the full resolution path of a DNS record from the root name servers, run the dig command with the +trace option:

dig +trace www.amazon.com
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.49.amzn1 <<>> +trace www.amazon.com
;; global options: +cmd
.        518400    IN    NS    J.ROOT-SERVERS.NET.
.        518400    IN    NS    K.ROOT-SERVERS.NET.
.        518400    IN    NS    L.ROOT-SERVERS.NET.
...
;; Received 508 bytes from 10.108.0.2#53(10.108.0.2) in 31 ms

com.        172800    IN    NS    a.gtld-servers.net.
com.        172800    IN    NS    b.gtld-servers.net.
com.        172800    IN    NS    c.gtld-servers.net.
...
;; Received 492 bytes from 193.0.14.129#53(193.0.14.129) in 93 ms
amazon.com.        172800    IN    NS    pdns1.ultradns.net.
amazon.com.        172800    IN    NS    pdns6.ultradns.co.uk.
...
;; Received 289 bytes from 192.33.14.30#53(192.33.14.30) in 201 ms
www.amazon.com.    900    IN    NS    ns-1019.awsdns-63.net.
www.amazon.com.    900    IN    NS    ns-1568.awsdns-04.co.uk.
www.amazon.com.    900    IN    NS    ns-277.awsdns-34.com.
...
;; Received 170 bytes from 204.74.108.1#53(204.74.108.1) in 87 ms

www.amazon.com.    60     IN    A    54.239.26.128
www.amazon.com.    1800   IN    NS   ns-1019.awsdns-63.net.
www.amazon.com.    1800   IN    NS   ns-1178.awsdns-19.org.
...
;; Received 186 bytes from 205.251.195.251#53(205.251.195.251) in 7 ms

To perform a query that returns only the name servers, run the following dig command:

dig -t NS www.amazon.com
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.49.amzn1 <<>> -t NS www.amazon.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48631
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.amazon.com.        IN    NS

;; ANSWER SECTION:
www.amazon.com.        490    IN    NS    ns-1019.awsdns-63.net.
www.amazon.com.        490    IN    NS    ns-1178.awsdns-19.org.
www.amazon.com.        490    IN    NS    ns-1568.awsdns-04.co.uk.
www.amazon.com.        490    IN    NS    ns-277.awsdns-34.com.

;; Query time: 0 msec
;; SERVER: 10.108.0.2#53(10.108.0.2)
;; WHEN: Fri Oct 21 21:48:20 2016
;; MSG SIZE rcvd: 170

In the preceding example, www.amazon.com has the following authoritative name servers:

  • ns-1019.awsdns-63.net
  • ns-1178.awsdns-19.org
  • ns-1568.awsdns-04.co.uk
  • ns-277.awsdns-34.com

To get an authoritative answer for the www.amazon.com host name, point the dig command directly at one of its authoritative name servers. Then, check whether every authoritative name server for the domain answers correctly.

The following example output is for a query to www.amazon.com to one of its authoritative name servers ns-1019.awsdns-63.net. The server response shows that www.amazon.com is available on 54.239.25.192:

dig www.amazon.com @ns-1019.awsdns-63.net.
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.49.amzn1 <<>> www.amazon.com @ns-1019.awsdns-63.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31712
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;www.amazon.com.    IN    A

;; ANSWER SECTION:
www.amazon.com.        60    IN    A    54.239.25.192

;; AUTHORITY SECTION:
www.amazon.com.        1800    IN    NS    ns-1019.awsdns-63.net.
www.amazon.com.        1800    IN    NS    ns-1178.awsdns-19.org.
www.amazon.com.        1800    IN    NS    ns-1568.awsdns-04.co.uk.
...

;; Query time: 7 msec
;; SERVER: 205.251.195.251#53(205.251.195.251)
;; WHEN: Fri Oct 21 21:50:00 2016
;; MSG SIZE rcvd: 186

The following line shows that ns-1019.awsdns-63.net is an authoritative name server for www.amazon.com:

;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0

The aa flag shows that the name server ns-1019.awsdns-63.net gave an authoritative answer for the resource record www.amazon.com.
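
To repeat this check against every name server, you can loop over the NS records and look for the aa flag in each reply. The following is a minimal sketch. The function name check_authoritative is hypothetical, and the sketch assumes that your default resolver returns the NS records for the name that you pass.

```shell
# check_authoritative: for each name server that dig returns for $1,
# query record $2 (default $1) directly on that server without recursion
# and report whether the reply header carries the aa flag.
check_authoritative() {
  zone=$1
  record=${2:-$1}
  dig +short NS "$zone" | while read -r ns; do
    flags=$(dig +norecurse +noall +comments "$record" "@$ns" | grep '^;; flags:')
    case "$flags" in
      *" aa"*) echo "$ns authoritative" ;;
      *) echo "$ns NOT authoritative or no reply" ;;
    esac
  done
}
```

For example, check_authoritative www.amazon.com queries each name server from the preceding NS output. A name server that's reported as not authoritative, or that doesn't reply, is a likely source of intermittent failures.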

DNS failures on Windows

For a Windows-based OS, take the following actions to troubleshoot DNS failures.

To return the IP address that's associated with a host name, use the nslookup utility:

C:\>nslookup www.amazon.com
Server:     ip-10-20-0-2.ec2.internal
Address:    10.20.0.2

Non-authoritative answer:
Name:       www.amazon.com
Address:    54.239.25.192

To determine the authoritative name servers for a host name, use the -type=NS flag with the nslookup utility:

C:\>nslookup -type=NS www.amazon.com
Server:     ip-10-20-0-2.ec2.internal
Address:    10.20.0.2

Non-authoritative answer:
www.amazon.com    nameserver = ns-277.awsdns-34.com
www.amazon.com    nameserver = ns-1019.awsdns-63.net
www.amazon.com    nameserver = ns-1178.awsdns-19.org
...

To check whether ns-277.awsdns-34.com for www.amazon.com correctly responds to a request for www.amazon.com, use the following syntax:

C:\>nslookup www.amazon.com ns-277.awsdns-34.com
Server:     UnKnown
Address:    205.251.193.21

Name:       www.amazon.com
Address:    54.239.25.200

AWS OFFICIAL | Updated 4 months ago