How do I plan an upgrade for my Amazon EKS cluster?

I want to follow best practices when I upgrade my Amazon Elastic Kubernetes Service (Amazon EKS) cluster.

Resolution

Understand the effect of an upgrade

New versions of Kubernetes can introduce significant changes to your Amazon EKS cluster. After you upgrade your cluster, you can't downgrade your cluster.

To move to a later Kubernetes version, you don't have to perform an in-place cluster upgrade. Instead, you can migrate your workloads to a new cluster that runs the later version. If you migrate to a new cluster, then it's a best practice to use cluster backup and restore tools, such as Velero. For more information, see Velero on the GitHub website.

For current and earlier versions of Kubernetes, see the Amazon EKS Kubernetes release calendar.

Meet the upgrade requirements

To upgrade your cluster, you must meet the following requirements:

Review major updates for Amazon EKS and Kubernetes

Take the following actions:

  • Review all documented changes for the upgrade version, and then note the required upgrade steps.
  • Review requirements or procedures that are specific to Amazon EKS managed clusters.

For information about major updates to the platform versions of Amazon EKS clusters and Kubernetes versions, see the Amazon EKS documentation.

For information about Kubernetes upstream versions and major updates, see the Kubernetes release notes on the Kubernetes website.

Understand and check for discontinued APIs

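When Kubernetes introduces a newer version of an API, it deprecates the earlier version and removes it in a later release. Workloads that still use a removed API version fail after the upgrade.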

To manage discontinued APIs, take the following actions:

  • To understand how Kubernetes discontinues APIs in a later version of Kubernetes, see the Kubernetes deprecation policy on the Kubernetes website.
  • To check for discontinued API versions in your cluster, use the Kube No Trouble (kubent) tool. For more information, see kube-no-trouble on the GitHub website. If you use discontinued API versions, then upgrade your workloads before you upgrade your Kubernetes cluster.
  • To convert Kubernetes manifest files between different API versions, use the kubectl convert plugin. For more information, see Install kubectl convert plugin on the Kubernetes website.
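The check that kubent automates can be illustrated with a minimal Python sketch. It isn't a replacement for kubent: the REMOVED_IN table lists only a few well-known removals for illustration, and the function name find_removed_apis and the sample manifests are hypothetical. Manifests are passed as plain dictionaries so that the sketch needs no YAML library.

```python
# Illustrative, incomplete table: (apiVersion, kind) -> first Kubernetes
# version that no longer serves it. Use kubent for an authoritative check.
REMOVED_IN = {
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
    ("autoscaling/v2beta2", "HorizontalPodAutoscaler"): (1, 26),
}


def find_removed_apis(manifests, target):
    """Return one message per manifest whose API version is gone in target."""
    target_version = tuple(int(part) for part in target.split(".")[:2])
    findings = []
    for manifest in manifests:
        key = (manifest.get("apiVersion"), manifest.get("kind"))
        removed = REMOVED_IN.get(key)
        if removed is not None and target_version >= removed:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            findings.append(
                f"{key[1]}/{name} uses {key[0]}, removed in {removed[0]}.{removed[1]}"
            )
    return findings


# Hypothetical manifests, already parsed into dictionaries.
manifests = [
    {"apiVersion": "batch/v1beta1", "kind": "CronJob", "metadata": {"name": "nightly"}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"}},
]

for finding in find_removed_apis(manifests, "1.25"):
    print(finding)  # CronJob/nightly uses batch/v1beta1, removed in 1.25
```

Any manifest that the sketch reports must be moved to a supported API version before the cluster upgrade.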

Understand what to expect during a Kubernetes upgrade

If you upgrade your cluster, then Amazon EKS launches new API server nodes with the upgraded Kubernetes version to replace the existing nodes. Amazon EKS also runs a series of internal checks. If a check fails, then Amazon EKS rolls back the infrastructure deployment and your cluster remains on the previous Kubernetes version. The rollback doesn't affect running applications, and you can recover the clusters.

Note: During the upgrade process, you might experience minor service interruptions.

Upgrade the control plane and data plane

To upgrade a cluster, you must update the control plane and the data plane.

To update the planes, choose one of the following methods:

  • In-place cluster upgrade
  • Blue/green or canary upgrade
  • Managed node group upgrade

In-place cluster upgrade

Important: For in-place upgrades, you can upgrade only to the next higher minor version of Kubernetes. If there are multiple versions between your current cluster version and the destination version, then you must sequentially upgrade to each version.
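To plan a multi-version upgrade, it can help to list each intermediate version up front. The following Python sketch shows one way to compute that sequence. The function names parse_minor and upgrade_path are hypothetical, and the sketch assumes version strings in MAJOR.MINOR form.

```python
def parse_minor(version: str) -> tuple[int, int]:
    """Split a 'MAJOR.MINOR' string such as '1.24' into integers."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def upgrade_path(current: str, target: str) -> list[str]:
    """Return every minor version to step through, in order, ending at target."""
    cur_major, cur_minor = parse_minor(current)
    tgt_major, tgt_minor = parse_minor(target)
    if cur_major != tgt_major:
        raise ValueError("this sketch handles upgrades within one major version only")
    if tgt_minor < cur_minor:
        raise ValueError("a cluster can't be downgraded")
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


print(upgrade_path("1.24", "1.27"))  # ['1.25', '1.26', '1.27']
```

Each entry in the returned list corresponds to one full pass through the in-place upgrade steps.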

For each in-place Kubernetes cluster upgrade, complete the following steps:

  1. Update your Kubernetes manifests and the discontinued APIs.
  2. Upgrade the cluster control plane.
  3. Upgrade the nodes in your cluster.
  4. Update your Kubernetes add-ons and custom controllers.

For more information, see the Planning and executing Kubernetes version upgrades in Amazon EKS section in Planning Kubernetes upgrades with Amazon EKS. Also, see Best practices for cluster upgrades.

Blue/green or canary upgrade

To complete a blue/green or canary upgrade, see Blue/green or canary Amazon EKS clusters migration for stateless ArgoCD workloads.

Managed node group upgrade

Important: A node's kubelet can't be later than kube-apiserver. Also, the kubelet can't be more than two minor versions earlier than kube-apiserver. For example, if kube-apiserver is at version 1.24, then Amazon EKS supports a kubelet only at versions 1.24, 1.23, and 1.22.
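The version rule above can be expressed as a small check. The following Python sketch is an illustration only: the function name kubelet_supported is hypothetical, and the default lag of two minor versions is taken from the rule as stated in this article, so adjust it if the policy for your Kubernetes version differs.

```python
MAX_KUBELET_LAG = 2  # minor versions, as stated in the rule above


def kubelet_supported(apiserver: str, kubelet: str, max_lag: int = MAX_KUBELET_LAG) -> bool:
    """Return True if the kubelet version is allowed for this kube-apiserver."""
    api_major, api_minor = (int(part) for part in apiserver.split(".")[:2])
    kubelet_major, kubelet_minor = (int(part) for part in kubelet.split(".")[:2])
    if api_major != kubelet_major:
        return False
    lag = api_minor - kubelet_minor
    # A negative lag means the kubelet is later than kube-apiserver.
    return 0 <= lag <= max_lag


for version in ("1.25", "1.24", "1.23", "1.22", "1.21"):
    print(version, kubelet_supported("1.24", version))
```

With kube-apiserver at version 1.24, the check accepts kubelet versions 1.24, 1.23, and 1.22, and rejects 1.25 and 1.21.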

To upgrade a managed node group, update your nodes in the managed node group.

Migrate to an Amazon EKS managed node group

If you use self-managed node groups, then you can migrate your workload to Amazon EKS managed node groups with no downtime.

Identify and upgrade downstream dependencies

Clusters can contain third-party add-ons and tools, such as ingress controllers, continuous delivery systems, monitoring tools, and other workflows. If you update your cluster, then you must also update your add-ons and third-party tools. Make sure that you understand how add-ons work with your cluster and how add-ons are updated.

Note: It's a best practice to use managed add-ons instead of self-managed add-ons.

Review the upgrade documentation for each add-on that your cluster uses, such as the Amazon VPC CNI plugin, kube-proxy, and CoreDNS.

Upgrade Fargate nodes

To update an AWS Fargate node, complete the following steps:

  1. Delete the Pod that your node represents.
  2. Update your control plane.
  3. Redeploy the Pod.

New Pods that you launch on Fargate now have a kubelet version that matches your cluster version. Your upgrade doesn't affect existing Fargate Pods.

Note: Amazon EKS must periodically patch Fargate Pods to keep the Pods secure. When it updates the Pods, Amazon EKS tries to minimize the effects of the patch. If Amazon EKS can't evict the Pods, then Amazon EKS deletes the Pods. To minimize disruption, see Set actions for AWS Fargate OS patching events.

Upgrade unmanaged nodes that Karpenter creates

The value that you set for ttlSecondsUntilExpired activates node expiration. After nodes reach the specified age in seconds, Karpenter deletes the nodes. The deletion occurs even when workloads or applications are running on the nodes. Use ttlSecondsUntilExpired to replace nodes with newly provisioned instances so that you can upgrade the instances. Karpenter uses the latest Amazon EKS optimized Amazon Machine Images (AMIs) for the replacement nodes. For more information about how Karpenter disrupts nodes, see Disruption on the Karpenter website.

The following example Provisioner sets ttlSecondsUntilExpired so that Karpenter deprovisions each node after 30 days and replaces it with an upgraded instance:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.sh/capacity-type         # optional, set to on-demand by default, spot if both are listed
      operator: In
      values: ["spot"]
  limits:
    resources:
      cpu: 1000                               # optional, recommended to limit total provisioned CPUs
      memory: 1000Gi
  ttlSecondsAfterEmpty: 30                    # optional, but never scales down if not set
  ttlSecondsUntilExpired: 2592000             # optional, nodes are recycled after 30 days; they never expire if not set
  provider:
    subnetSelector:
      karpenter.sh/discovery/CLUSTER_NAME: '*'
    securityGroupSelector:
      kubernetes.io/cluster/CLUSTER_NAME: '*'

Note: Karpenter doesn't automatically add jitter to the ttlSecondsUntilExpired value. If you create multiple instances in a short amount of time, then the instances expire at the same time. To prevent excessive workload disruption, set a Pod disruption budget. For more information, see Specifying a disruption budget for your application on the Kubernetes website.
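The arithmetic behind the expiration value and the note about jitter can be checked with a short Python sketch. The function names days_to_ttl_seconds and expiry_time and the sample timestamps are hypothetical. The sketch models expiration as creation time plus the TTL, which is a simplification of how Karpenter schedules node disruption.

```python
from datetime import datetime, timedelta, timezone


def days_to_ttl_seconds(days: int) -> int:
    """Convert a node lifetime in days to a ttlSecondsUntilExpired value."""
    return int(timedelta(days=days).total_seconds())


def expiry_time(created_at: datetime, ttl_seconds: int) -> datetime:
    """Simplified model: a node expires ttl_seconds after it was created."""
    return created_at + timedelta(seconds=ttl_seconds)


ttl = days_to_ttl_seconds(30)

# Two hypothetical nodes that were created 10 seconds apart.
first_created = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
second_created = first_created + timedelta(seconds=10)

gap = expiry_time(second_created, ttl) - expiry_time(first_created, ttl)
print(ttl, gap.total_seconds())  # 2592000 10.0
```

Because no jitter is added, the gap between expirations equals the gap between creation times. Nodes that are launched together are therefore replaced together unless a Pod disruption budget limits how many Pods can be disrupted at the same time.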

AWS OFFICIAL Updated 9 months ago