Unusually high network traffic of aws-node pods


At a high level, we monitor network traffic in our EKS clusters through the container_network_transmit_bytes_total and container_network_receive_bytes_total metrics provided by cAdvisor. Recently, while investigating network usage to reduce our NatGateway-Bytes and DataTransfer-Regional-Bytes costs, I noticed unusually high network usage by the aws-node and kube-proxy pods. To quantify it, the following queries

sum (increase(container_network_transmit_bytes_total{pod=~"aws-node.*",cluster="prod"}[24h])) / 1024^3
sum (increase(container_network_transmit_bytes_total{pod=~"kube-proxy.*",cluster="prod"}[24h])) / 1024^3

both return the same usage (up to GB precision) of 17818 GB/day. Compared to the residual traffic of the cluster

sum (increase(container_network_transmit_bytes_total{pod!~"kube-proxy.*|aws-node.*",cluster="prod"}[24h])) / 1024^3

which returns 9817 GB/day, this seems unusually high. I could not find anything online that would explain these numbers. As far as I understand, kube-proxy only creates iptables rules on the nodes to forward packets to the correct pods/services, but these findings suggest that the packets are actually routed through kube-proxy. Is there a way to debug this further, or can somebody explain the high network usage of aws-node and kube-proxy? Also, why do these two pods report almost identical network usage?
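One possible way to narrow this down is to break the same metric down per pod and per interface, and to compare it with the node-level counters. The queries below are only a sketch. They assume that the cAdvisor series carry the `interface` and `id` labels (they do in a default kubelet/cAdvisor scrape, but relabeling rules may drop them) and that `id="/"` denotes the root cgroup, i.e. the node as a whole. If the per-pod values for aws-node and kube-proxy match the node-level values, the counters may reflect the host's network namespace rather than traffic generated by the pods themselves. That would be expected for pods running with hostNetwork: true, which I have not verified for this cluster.

```promql
# Transmit volume over 24h in GiB, split by pod and network interface,
# for the two DaemonSets in question.
sum by (pod, interface) (
  increase(container_network_transmit_bytes_total{pod=~"aws-node.*|kube-proxy.*",cluster="prod"}[24h])
) / 1024^3

# Node-level transmit volume over 24h in GiB, split by interface.
# Assumption: id="/" selects the root cgroup (whole node) in this setup.
sum by (interface) (
  increase(container_network_transmit_bytes_total{id="/",cluster="prod"}[24h])
) / 1024^3
```

If the first query shows each aws-node pod reporting the same per-interface numbers as the kube-proxy pod on the same node, that would also explain why the two sums are nearly identical.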

Best, Sam
