Monitoring and Optimizing EKS Inter-AZ Data Transfer Cost

5 minute read
Content level: Intermediate

Guidance on optimizing and reducing cross-AZ data transfer charges in EKS clusters

EKS Architecture


When customers set up EKS clusters on AWS using any of their preferred tools, the best practice is to deploy the cluster's distributed components and compute infrastructure across multiple Availability Zones to ensure optimal performance, resiliency, and consistent uptime (Amazon EKS Architecture). Technical Account Managers (TAMs) and Solutions Architects (SAs) emphasize resiliency, and customers have come to understand the importance of building resilient architectures. However, resilience has a cost, and it is equally important to help customers proactively and continually identify the parts of their infrastructure where resilience drives significant data transfer charges. Surfacing this metric early helps customers avoid billing surprises.

The EKS Cluster below is a sample cluster deployed to a VPC in a region with 3 Availability Zones.

A standard EKS cluster infrastructure with 4 nodes in a managed node group spread across 3 Availability Zones.
EKS Node-AZ Distribution

Sample EKS Workload


Using the simple EKS cluster below, pods are scheduled onto nodes based on the nodes' available resources and capacity. Since the nodes are spread across multiple AZs, communication between pods on different nodes can result in cross-AZ data transfer.

  • Server is deployed to 2 nodes (192.168.74.137 and 192.168.6.195) in 2 different AZs for resiliency.
  • Client is deployed to node (192.168.6.195) in a single AZ.
  • Client sends traffic to both Servers.
  • Data sent to Server on 192.168.6.195 does not generate inter-AZ traffic or cost.
  • Data sent from Client (on 192.168.6.195) to Server (on 192.168.74.137) or vice versa generates inter-AZ traffic and cost.
EKS Deployment
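The billing logic in the bullets above can be sketched as a small function: a flow is billable as inter-AZ only when the two endpoints sit in different Availability Zones. The node-to-AZ assignments below are assumptions for illustration; the node IPs come from the sample deployment.

```python
# Map each node (from the sample deployment) to its AZ.
# The AZ names are assumed for illustration.
NODE_AZ = {
    "192.168.74.137": "us-east-1a",  # Server node (assumed AZ)
    "192.168.6.195": "us-east-1b",   # Client + Server node (assumed AZ)
}

def is_cross_az(src_node: str, dst_node: str) -> bool:
    """A flow incurs inter-AZ charges only when the endpoints' AZs differ."""
    return NODE_AZ[src_node] != NODE_AZ[dst_node]

# Client -> Server on the same node (192.168.6.195): no inter-AZ charge.
print(is_cross_az("192.168.6.195", "192.168.6.195"))   # False
# Client (192.168.6.195) -> Server (192.168.74.137): billable both ways.
print(is_cross_az("192.168.6.195", "192.168.74.137"))  # True
```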

Inter-AZ Data Transfer and Cost


Data transfer between Availability Zones in the same region is charged at $0.01 per GB in each direction ($0.02 per GB total). For very chatty applications with pods spread across multiple Availability Zones, the volume of data transferred can be very high and become a major cost for customers. Identifying this early in the deployment and continually monitoring it can help customers avoid this cost.
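As a quick back-of-the-envelope sketch of how this adds up (the traffic volume below is a made-up example; the rate is the standard intra-region rate):

```python
RATE_PER_GB_PER_DIRECTION = 0.01  # USD per GB, charged in each direction

def inter_az_cost(gb_transferred: float) -> float:
    """Inter-AZ traffic is billed twice: out of one AZ and into the other."""
    return gb_transferred * RATE_PER_GB_PER_DIRECTION * 2

# A chatty application exchanging 5 TB (5120 GB) across AZs in a month:
print(f"${inter_az_cost(5120):.2f}")  # $102.40
```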

Monitoring Inter-AZ Data Transfer


One of the resources that customers can use to proactively and continuously monitor the inter-AZ data transfer cost is amazon-eks-inter-az-traffic-visibility. The code is available in the AWS Samples GitHub repo. With this solution deployed in an EKS cluster, customers can closely monitor data transfer metrics for applications and pods running in the cluster. The solution uses AWS Athena, S3, VPC Flow Logs, AWS Lambda and AWS Step Functions to automatically and periodically collect traffic data between pods. The data can be queried using AWS Athena and visualized using AWS QuickSight.
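To give a feel for the kind of analysis this enables, the sketch below builds an Athena-style query that joins pod metadata against flow-log records to rank cross-AZ traffic by pod pair. The table and column names here are hypothetical; adapt them to the schema the solution actually creates from your VPC Flow Logs.

```python
# Hypothetical Athena query: total cross-AZ bytes per pod pair.
# Table names (vpc_flow_logs, pods) and columns are assumptions for
# illustration, not the solution's actual schema.
QUERY = """
SELECT src.pod AS src_pod,
       dst.pod AS dst_pod,
       SUM(fl.bytes) AS total_bytes
FROM vpc_flow_logs fl
JOIN pods src ON fl.srcaddr = src.ip
JOIN pods dst ON fl.dstaddr = dst.ip
WHERE src.az <> dst.az          -- keep only cross-AZ flows
GROUP BY src.pod, dst.pod
ORDER BY total_bytes DESC
"""
print(QUERY.strip())
```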

  • Pods table – distribution of pods across nodes and AZs
  • VPC Flow Logs table – number of bytes, source IP, destination IP, flow direction, and AZs
  • Athena Results table – application/pod-specific traffic and bytes transferred

What About Cost Explorer?


Cost Explorer can provide a cost breakdown by usage type and by tags (when enabled in Cost Allocation Tags); however, it does not provide visibility into which pods are driving the cost. Examples of the data available through Cost Explorer are shown below.

Data Transfer - Usage Type: Cost Explorer showing data transfer and cost by Usage Type
Data Transfer - Cluster View: Cost Explorer showing data transfer and cost by cluster-name

Optimizing Inter-AZ Data Transfer


There is no one-size-fits-all solution for Kubernetes or EKS workloads, but with the above data, customers can build QuickSight dashboards like the ones below to visualize and continuously monitor data transfer trends. The dashboards reveal which chatty applications are driving high data transfer charges and inform a more optimized placement approach for EKS workloads. Optimization can be achieved through several methods.

  1. One method that can be implemented immediately is to use Kubernetes pod affinity (and node affinity) rules to co-locate chatty applications on the same nodes, thereby reducing inter-AZ traffic. Taints and tolerations can complement this by reserving specific nodes for those workloads.
  2. Another option is to deploy a central application as a DaemonSet, ensuring a pod is available on the same node as any application pods that need to access it. This does not eliminate cross-AZ data transfer, but it can help reduce it.
  3. A long-term approach is to redesign the applications to reduce chatter. This requires extensive architectural review and decision-making, but it helps the customer make better choices when building or adopting applications. The Cost Optimization Pillar of the AWS Well-Architected Framework provides guidance for these decisions.
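As a sketch of the co-location approach in option 1, a podAffinity rule can pin a chatty client next to its server. The deployment and label names below are hypothetical; adjust them to your own workloads.

```yaml
# Hypothetical client deployment; the app labels are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: client
  template:
    metadata:
      labels:
        app: client
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: server
              # Co-locate on the same node; use topology.kubernetes.io/zone
              # as the topologyKey to relax this to "same AZ" instead.
              topologyKey: kubernetes.io/hostname
```

Using `topology.kubernetes.io/zone` as the topologyKey trades strict node co-location for AZ co-location, which preserves more scheduling flexibility while still avoiding inter-AZ charges between these two workloads.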
Data Transfer by AZ

Bytes transferred across AZs

Client to Server data transfer

Notes


Not all inter-AZ data transfer can or should be eliminated, especially for customers that require high resiliency. Approach this discussion with the necessary caveats and a clear understanding of the customer's priorities.