
How can I resolve disk pressure on my Amazon EKS worker nodes?


I want to resolve the excessive disk pressure on my Amazon Elastic Kubernetes Service (Amazon EKS) worker nodes.

Short description

Disk pressure on Kubernetes worker nodes occurs when the available disk space drops to low levels. To mitigate disk pressure, increase the EBS volume size, or add new worker nodes to the node group to provide more disk capacity. You can also adjust Kubelet image garbage collection thresholds, configure container runtime log rotation, and manually clean up unused images to effectively manage disk usage.

It's important to monitor and manage ephemeral storage to prevent disk pressure issues and to make sure that your Kubernetes clusters run correctly.

If you have excessive disk pressure, then you might receive an error message.

For the node, you might receive an error message similar to the following:

"Warning DiskPressure ... kubelet, worker-node DiskPressure on node"

For Kubelet, you might receive an error message similar to the following:

"Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold"

To confirm the issue, run the df -h command to check the disk usage on the worker node.

Example output:

/dev/nvme0n1p1   20G   18G  2.2G  90% /       
........

Note: In the preceding example, the root filesystem /dev/nvme0n1p1 is at 90% usage. The high usage causes disk pressure and image garbage collection failures.
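The preceding df check can be scripted as a simple threshold test so that you can spot nodes at risk before the kubelet reports DiskPressure. The following sketch uses a hypothetical check_pressure helper and an illustrative 85% warning threshold; neither is part of the kubelet's own eviction logic:

```shell
#!/bin/sh
# Hypothetical helper: warn when a filesystem usage percentage crosses a threshold.
# The 85% default is illustrative; the kubelet's eviction thresholds are configured separately.
check_pressure() {
  usage=$1
  threshold=${2:-85}
  if [ "$usage" -ge "$threshold" ]; then
    echo "disk pressure risk: ${usage}% used"
  else
    echo "ok: ${usage}% used"
  fi
}

check_pressure 90   # prints: disk pressure risk: 90% used (matches the 90% in the df output)
check_pressure 50   # prints: ok: 50% used
```

On a live node, you could feed the helper the root filesystem percentage from df --output=pcent / instead of a fixed value.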

Resolution

Increase the EBS volume size

To increase the available disk space, increase the size of the EBS volume that's attached to the worker node. For more information, see How do I increase or decrease the size of my EBS volume?

Note: Rather than resize your ephemeral volumes, it's a best practice to provision new instances with the desired disk size and use them to replace the old instances. The new instances then have the correct disk size and align with an immutable infrastructure mindset.

Add a new worker node to the node group

Add a new node to the node group or pool to provide more disk capacity and distribute the workload across multiple nodes. This alleviates the disk pressure on any single node.

Set up custom thresholds for the Kubelet garbage collector

Configure the Kubelet to initiate image garbage collection at different disk usage thresholds. To do this, set the --image-gc-high-threshold and --image-gc-low-threshold arguments to allow for more buffer. For example, set the high threshold to 70% and the low threshold to 60% to maintain a larger buffer of free disk space. This configuration allows the Kubelet to perform image garbage collection before the disk becomes critically full.
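As a sketch, the same thresholds can also be expressed in the kubelet configuration file. The imageGCHighThresholdPercent and imageGCLowThresholdPercent fields correspond to the command-line arguments above; the 70/60 values mirror the example in this section:

```yaml
# KubeletConfiguration fragment (illustrative values)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
imageGCHighThresholdPercent: 70  # start image garbage collection at 70% disk usage
imageGCLowThresholdPercent: 60   # free images until usage drops to 60%
```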

For more information on how to configure these arguments on your worker nodes, see How do I configure Amazon EKS worker nodes to clean up the image cache at a specified percent of disk usage?

Check and adjust container runtime log rotation

Containerized applications write logs to stdout and stderr, and the container runtime manages the resulting log files. It's a best practice to adjust the log rotation settings for the container runtime on your worker nodes.

To configure log rotation for containerd, use the containerd.runc.log options in the configuration file /etc/containerd/config.toml. Set the log_file_max and log_file_max_size options to control the maximum number of rotated log files and the maximum size of each log file.
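On nodes where the kubelet manages container log rotation (the default behavior with the CRI integration), the rotation limits can also be sketched in the kubelet configuration. The values below are illustrative, not recommendations:

```yaml
# KubeletConfiguration fragment (illustrative values)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: "10Mi"  # rotate a container log file once it reaches 10 MiB
containerLogMaxFiles: 5      # keep at most 5 rotated log files per container
```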

Clean up unused container images

If the Kubelet can't perform an image garbage collection automatically, then manually clean up unused container images on the worker node. To remove dangling or unused images and free up significant disk space, use the crictl rmi --prune command.

Manage your ephemeral storage

For long-term stability and smooth operation of your applications, it's a best practice to properly manage ephemeral storage. If you don't set limits on ephemeral storage, then a pod might consume the entire disk space on the node it runs on.

To mitigate this risk, set appropriate ephemeral storage requests and limits for your pods. Take the following actions to determine the limits:

  • To identify pod storage requirements, analyze your application's storage needs, temporary data, and any other files that are stored in the ephemeral storage. Then, you can estimate the storage requirements for each pod.
  • To set ephemeral storage requests and limits, define the minimum amount of ephemeral storage required for your pod to function correctly. Then, the Kubernetes scheduler can allocate nodes with sufficient storage resources for your pods.
    Example:
    apiVersion: v1
    kind: Pod
    metadata:
      name: my-app
    spec:
      containers:
      - name: my-app-container
        image: my-app:latest
        resources:
          requests:
            ephemeral-storage: "1Mi"
          limits:
            ephemeral-storage: "2Mi"
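To apply defaults across a namespace instead of setting them on every pod, a Kubernetes LimitRange can be sketched as follows. The object name and the storage values are illustrative:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: ephemeral-storage-defaults  # illustrative name
spec:
  limits:
  - type: Container
    defaultRequest:
      ephemeral-storage: "1Gi"  # default request for containers that don't set one
    default:
      ephemeral-storage: "2Gi"  # default limit for containers that don't set one
```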
AWS OFFICIAL. Updated 8 months ago.