Preventing EKS Node Lease Leakage for Smoother Cluster Operation

2 minute read
Content level: Advanced

EKS Node Lease Leakage

In the ever-evolving realm of Kubernetes, managing resources can pose unforeseen challenges. Lately, a perplexing issue has drawn attention among Kubernetes users: an unexplained surge in node lease objects within the 'kube-node-lease' namespace.

If you've ever scratched your head over this, read on, because we've delved into the heart of the problem and found solutions that can benefit Kubernetes users worldwide.

Understanding the Landscape:

The following key components are involved:

  • Leases Component: In Kubernetes, leases are used for critical functions like node heartbeats and leader election. They play a crucial role in keeping the cluster in sync and functioning smoothly.

  • Garbage Collector: Kubernetes employs a Garbage Collector to clean up cluster resources, ensuring efficient resource usage and maintaining cluster health.

  • Owner References: Dependent objects have an “ownerReferences” field that establishes relationships with their parent objects. This field is pivotal for garbage collection and object adoption.
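To make the owner relationship concrete, the sketch below lists each node lease next to the kind and name of its owner. The function name `list_lease_owners` is hypothetical and chosen only for illustration; the command assumes `kubectl` is already pointed at the cluster you want to inspect.

```shell
# List every node lease together with the kind and name of its owner.
# Leases that have no ownerReferences show <none> in the owner columns.
# list_lease_owners is a hypothetical helper name, not part of kubectl.
list_lease_owners() {
  kubectl get lease -n kube-node-lease \
    -o custom-columns='LEASE:.metadata.name,OWNER_KIND:.metadata.ownerReferences[*].kind,OWNER_NAME:.metadata.ownerReferences[*].name'
}
```

Calling `list_lease_owners` on a healthy cluster should show a Node owner for each lease; rows without an owner are the ones this article is concerned with.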

Problem Statement:

Customers running EKS clusters notice a constant increase in node lease objects. What's peculiar is that these leaked leases lack an “OwnerReference”. This behavior is not by design but rather the result of a Kubernetes bug.

Reproduce the issue:

  1. Create an EKS cluster with worker nodes
  2. Delete the worker node object (while keeping the kubelet and the node running), and watch as the Garbage Collector removes the lease object.

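But here's the catch: the still-running kubelet keeps renewing its heartbeat, so Kubernetes creates a new lease object, this time without an OwnerRef. With no owner, the Garbage Collector never removes it. This cycle results in a spike in lease objects visible in monitoring tools (CloudWatch, Datadog, New Relic, and others).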
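One rough way to spot the symptom from the command line is to compare the number of node leases with the number of node objects. The sketch below does that; `lease_node_gap` is a hypothetical helper name. A positive gap is only a hint: it can also appear briefly while nodes join or leave, so treat it as a prompt to look closer rather than proof of leakage.

```shell
# Compare the number of node leases with the number of node objects.
# lease_node_gap is a hypothetical helper name used for illustration.
lease_node_gap() {
  leases=$(kubectl get lease -n kube-node-lease --no-headers 2>/dev/null | wc -l)
  nodes=$(kubectl get nodes --no-headers 2>/dev/null | wc -l)
  # Arithmetic expansion also strips any padding that wc adds on some systems.
  echo "leases=$((leases)) nodes=$((nodes)) gap=$((leases - nodes))"
}
```

Running `lease_node_gap` periodically and watching whether the gap keeps growing mirrors what the monitoring tools show as a spike in lease objects.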

Mitigating the Problem:

Now that we've unveiled the mystery, let's explore solutions:

  • Manual Cleanup: For those not using Karpenter, a manual cleanup approach involves removing leaked leases using the following command:
    kubectl get lease -n kube-node-lease -o json | jq -r '.items | map(select(.metadata.ownerReferences == null) | .metadata.name) | .[]' | xargs -n 10 kubectl delete lease -n kube-node-lease
  • Karpenter Integration: EKS with Karpenter now includes a controller to handle node lease garbage collection, providing a solution until the upstream bug is resolved.
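Before deleting anything, it can help to preview what the manual cleanup would remove. The sketch below is a more cautious variant, not the command from this article: it lists leases without owner references, skips any whose name matches an existing node (node leases are named after their node), and defaults to a client-side dry run. The function names are hypothetical, and the sketch assumes `kubectl` prints an empty value for a missing `ownerReferences` field in `jsonpath` output, which is its default behavior.

```shell
# Print the names of node leases that have no ownerReferences.
orphan_leases() {
  kubectl get lease -n kube-node-lease \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.ownerReferences[*].name}{"\n"}{end}' |
    awk -F '\t' '$1 != "" && $2 == "" { print $1 }'
}

# Keep only orphan leases whose name does not match an existing node.
orphans_without_node() {
  nodes=$(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')
  orphan_leases | while IFS= read -r lease; do
    printf '%s\n' "$nodes" | grep -Fxq -- "$lease" || printf '%s\n' "$lease"
  done
}

# Preview deletions by default; pass "apply" to actually delete.
cleanup_orphans() {
  mode=${1:-preview}
  orphans_without_node | while IFS= read -r lease; do
    if [ "$mode" = "apply" ]; then
      kubectl delete lease -n kube-node-lease "$lease" </dev/null
    else
      kubectl delete lease -n kube-node-lease "$lease" --dry-run=client </dev/null
    fi
  done
}
```

Run `cleanup_orphans` first to review the dry-run output, then `cleanup_orphans apply` once the list looks right.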

References:

  1. https://github.com/kubernetes/kubernetes/issues/109777
  2. https://github.com/aws/karpenter/issues/4363
  3. Karpenter controller to fix node leakage issue: https://github.com/aws/karpenter-core/pull/471

Co-Authors:

  1. Sidhartha Kotha
  2. Dharmendra Singh
Published 6 months ago, 1253 views