How to figure out whether NAT Gateway processing charges are due to internet bound or AWS bound traffic?


At times, identifying the cause of NAT Gateway processing charges through VPC flow logs can be overwhelming, especially when you have no idea which type of traffic (internet bound or AWS bound) is dominant. In this article, you will use Cost Explorer, an easy-to-use interface that lets you visualize, understand, and manage your AWS costs, to narrow down the dominant traffic type.

A NAT gateway is a Network Address Translation (NAT) service. You can use a NAT gateway so that instances in a private subnet can connect to services outside your Virtual Private Cloud (VPC), while external services cannot initiate a connection with those instances. These external services can reside either outside the AWS cloud or within AWS. In this article, traffic to or from services or endpoints outside the AWS cloud is referred to as internet bound, and traffic to or from other VPCs, AWS public endpoints, or a remote AWS Region is referred to as AWS bound. The NAT gateway processing charge is based on the amount of data (in GB) that passes through the NAT gateway, whether inbound to or outbound from the VPC. In some cases, the NAT gateway processing charge seems unexpectedly high and requires further investigation. Tools such as VPC flow logs are available to analyze this traffic.

In this article, I will help you use Cost Explorer to get a high-level idea of whether the dominant traffic is AWS bound or internet bound. First, you narrow down the AWS account and the Region where most of the NAT gateway processing charges come from. Second, you identify how much of the NAT gateway processing charge is caused by internet bound or AWS bound traffic. Lastly, you dive deep into the dominant traffic type, referring to the relevant articles.

Section I: Identifying AWS account and the region

  1. Sign in to the AWS Management Console and open the AWS Cost Management console
  2. In the left pane, choose Cost Explorer
  3. In the right pane, under Report Parameters > Time, update Date Range = Past: 3 Months (preferably 3 months to date), and Granularity = Daily
  4. Set Dimension = Linked account under Group by
  5. Set Usage type = NatGateway-Bytes (GB) under Filters
  6. Note the AWS account that shows the highest usage in the Cost and usage graph or the Cost and usage breakdown
  7. Add the AWS account as a filter by setting Linked account = <from step 6> under Filters
  8. Set Dimension = Region under Group by
  9. Note the AWS Region that shows the highest usage in the Cost and usage graph or the Cost and usage breakdown
  10. Add the Region as a filter by setting Region = <from step 9> under Filters
  11. Note the Total Usage in GB as X from the Cost and usage breakdown
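The console steps above can also be approximated against the Cost Explorer API. The following is a minimal sketch, not the article's method: it assumes you group by a first dimension (`LINKED_ACCOUNT` or `REGION`) plus `USAGE_TYPE`, and then pick out NAT gateway usage client-side by matching usage types that end in `NatGateway-Bytes`. The helper names (`build_request`, `nat_usage_by_group`, `top_group`) and the suffix match are illustrative assumptions.

```python
from collections import defaultdict

# Assumption: NAT gateway processing usage types end with this suffix and
# are often region-prefixed, for example "USE2-NatGateway-Bytes".
NAT_SUFFIX = "NatGateway-Bytes"


def build_request(start, end, first_dimension, account=None):
    """Build keyword arguments for Cost Explorer's get_cost_and_usage.

    start/end are "YYYY-MM-DD" strings; the End date is exclusive.
    first_dimension is "LINKED_ACCOUNT" (steps 4-6) or "REGION" (steps 8-9).
    """
    request = {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "DAILY",
        "Metrics": ["UsageQuantity"],
        "GroupBy": [
            {"Type": "DIMENSION", "Key": first_dimension},
            {"Type": "DIMENSION", "Key": "USAGE_TYPE"},
        ],
    }
    if account is not None:
        # Step 7: restrict the query to the account found earlier.
        request["Filter"] = {
            "Dimensions": {"Key": "LINKED_ACCOUNT", "Values": [account]}
        }
    return request


def nat_usage_by_group(results_by_time):
    """Sum NAT gateway usage (GB) per account or Region across all periods."""
    totals = defaultdict(float)
    for period in results_by_time:
        for group in period["Groups"]:
            name, usage_type = group["Keys"]  # same order as GroupBy
            if usage_type.endswith(NAT_SUFFIX):
                amount = group["Metrics"]["UsageQuantity"]["Amount"]
                totals[name] += float(amount)
    return dict(totals)


def top_group(totals):
    """Return the account or Region with the highest NAT gateway usage."""
    return max(totals, key=totals.get)
```

With boto3 installed and credentials configured, the request could be sent with `boto3.client("ce").get_cost_and_usage(**build_request(...))` and the `ResultsByTime` list of the response passed to `nat_usage_by_group`. Long date ranges may return a `NextPageToken`, which this sketch does not handle. The value of X for the chosen account and Region is then the matching entry in the returned totals.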

Section II: Identifying dominant traffic - Internet or AWS bound

  1. Continue from step 11 of Section I, and choose Clear next to Usage type under Filters
  2. Type "datatransfer-in-bytes" as a value for Usage type, and select all the values. This should typically show -DataTransfer-In-Bytes (GB) values.
  3. Type "datatransfer-out-bytes" as a value for Usage type, and select all the values. This should typically show -DataTransfer-Out-Bytes (GB) values.
  4. Set Service = EC2-Instances (Elastic Compute Cloud - Compute) under Filters. In this case, the data transfer through the NAT gateway is billed under the EC2-Instances service.
  5. Set Dimension = Usage type under Group by
  6. Note the Total Usage in GB as Y from Cost and usage breakdown

Note: Repeat Sections I and II to understand the pattern by narrowing the date range down to the last 7 days or to suspected dates, and/or by switching to hourly granularity.
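As a rough illustration of how Y is put together in Section II, the sketch below sums the inbound and outbound data transfer usage from Cost Explorer results grouped by usage type. It assumes the results were already filtered to the account, Region, and service chosen above, and that the relevant usage types end in `DataTransfer-In-Bytes` or `DataTransfer-Out-Bytes`. The function name `transfer_totals` is hypothetical.

```python
def transfer_totals(results_by_time):
    """Sum inbound and outbound data transfer (GB) across all periods.

    Expects the ResultsByTime list of a response grouped by USAGE_TYPE only.
    Returns a dict with "in", "out", and their sum "y".
    """
    totals = {"in": 0.0, "out": 0.0}
    for period in results_by_time:
        for group in period["Groups"]:
            usage_type = group["Keys"][0]
            amount = float(group["Metrics"]["UsageQuantity"]["Amount"])
            # Suffix matching tolerates region prefixes such as "USE2-".
            if usage_type.endswith("DataTransfer-In-Bytes"):
                totals["in"] += amount
            elif usage_type.endswith("DataTransfer-Out-Bytes"):
                totals["out"] += amount
    totals["y"] = totals["in"] + totals["out"]
    return totals
```

Keeping the inbound and outbound totals separate, rather than returning only Y, makes it easy to see later which direction dominates.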

Section III: Diving deep and next steps

Based on the X and Y data aggregated monthly, daily, or hourly, you can take one of the following routes to dive deeper:

  1. If X (NatGateway-Bytes) is nearly equal to Y (the sum of DataTransfer-In-Bytes and DataTransfer-Out-Bytes), or the two show a matching pattern over time, the NAT Gateway processing charges are driven by internet bound traffic. Review the applications running in the VPC's private subnets to see why they need to pull data in from or push data out to the internet. If DataTransfer-In-Bytes is the dominant direction of the internet bound traffic, check whether this is by design or the result of an application fault that continuously downloads data from public sites. One example is a containerized application that keeps crashing, causing a Docker image to be pulled frequently from the public Docker Hub. If it is by design, check whether it is feasible to securely host such an application as a public endpoint. To get the exact source and destination of such traffic, I recommend analyzing it with VPC flow logs.

  2. If X (NatGateway-Bytes) is significantly higher than Y (the sum of DataTransfer-In-Bytes and DataTransfer-Out-Bytes), or the two show no matching pattern, the NAT Gateway processing charges are driven by AWS bound traffic. Typically, such traffic results from uploading data to or downloading data from an Amazon S3 bucket without using an S3 Gateway Endpoint, from data transfer to or from AWS public endpoints without VPC interface endpoints, or from data transfer to or from your own application in another VPC over the AWS public network. To get the exact source and destination of such traffic, I recommend analyzing it with VPC flow logs.
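The comparison of X and Y described in the two routes above can be sketched as a small helper. This is only an illustration: the article does not define what "nearly equal" or "significantly higher" means numerically, so the thresholds of 0.8 and 0.5 and the intermediate "mixed" verdict are hypothetical assumptions you should tune to your own data. Y may also include data transfer that never passed through the NAT gateway, so treat the result as an estimate.

```python
def classify_nat_traffic(x_gb, y_gb, internet_threshold=0.8, aws_threshold=0.5):
    """Estimate whether NAT gateway usage is mostly internet or AWS bound.

    x_gb: NatGateway-Bytes usage (X).
    y_gb: DataTransfer-In-Bytes plus DataTransfer-Out-Bytes usage (Y).
    The thresholds are hypothetical cut-offs for Y/X, not AWS guidance.
    """
    if x_gb <= 0:
        raise ValueError("x_gb must be positive")
    # Share of NAT gateway usage explained by internet data transfer,
    # capped at 1.0 because Y can exceed X.
    internet_share = min(y_gb / x_gb, 1.0)
    # The remainder is a rough estimate of AWS bound traffic.
    estimated_aws_gb = max(x_gb - y_gb, 0.0)
    if internet_share >= internet_threshold:
        verdict = "internet bound"
    elif internet_share <= aws_threshold:
        verdict = "AWS bound"
    else:
        verdict = "mixed"
    return {
        "verdict": verdict,
        "internet_share": internet_share,
        "estimated_aws_gb": estimated_aws_gb,
    }


def dominant_direction(in_gb, out_gb):
    """Return which direction of internet data transfer is larger."""
    return "in" if in_gb >= out_gb else "out"
```

For example, with X = 1000 GB and Y = 100 GB the helper returns the verdict "AWS bound" with an estimated 900 GB of AWS bound traffic, which points to route 2. A "mixed" verdict suggests repeating the comparison at daily or hourly granularity before drawing a conclusion.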