Unusually high network traffic of aws-node pods

At a high level, we monitor network traffic in our EKS clusters through the container_network_transmit_bytes_total and container_network_receive_bytes_total metrics provided by cAdvisor. Recently, while investigating network usage and trying to reduce the costs for NatGateway-Bytes and DataTransfer-Regional-Bytes, I stumbled upon unusually high network usage by the aws-node and kube-proxy pods. Looking at the utilization, the following queries

sum (increase(container_network_transmit_bytes_total{pod=~"aws-node.*",cluster="prod"}[24h])) / 1024^3
sum (increase(container_network_transmit_bytes_total{pod=~"kube-proxy.*",cluster="prod"}[24h])) / 1024^3

both return the same value, 17818 GB/day (to the nearest GB). Compared to the remaining traffic of the cluster,

sum (increase(container_network_transmit_bytes_total{pod!~"kube-proxy.*|aws-node.*",cluster="prod"}[24h])) / 1024^3

which returns 9817 GB/day, this seems unusually high. I could not find anything online that would justify these numbers. To my understanding, kube-proxy only creates iptables rules on the nodes to forward packets to the correct pods/services, but from these findings it looks as if the packets were actually routed through kube-proxy itself. Is there any way to debug this further, or can somebody please enlighten me about the high network usage of aws-node and kube-proxy? Also, what is the reason that these two pods report almost identical network usage?
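
One debugging step I thought of (just a sketch, assuming the metrics carry the usual instance and interface labels that cAdvisor attaches) is to break the traffic down per node and per interface and compare the two DaemonSets:

# transmit traffic of aws-node, grouped per node and per network interface
sum by (instance, interface) (increase(container_network_transmit_bytes_total{pod=~"aws-node.*",cluster="prod"}[24h])) / 1024^3

# transmit traffic of kube-proxy, grouped the same way for comparison
sum by (instance, interface) (increase(container_network_transmit_bytes_total{pod=~"kube-proxy.*",cluster="prod"}[24h])) / 1024^3

If the series match node by node and interface by interface, that would suggest both pods are simply reporting the node's host interfaces (as far as I can tell both DaemonSets run with hostNetwork: true), i.e. the numbers would reflect the whole node's traffic rather than traffic generated by the pods themselves, but I am not sure whether that is what is happening here.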

Best, Sam
