How do I clean up EKS Anywhere resources without eksctl when cluster creation fails?


My Amazon Elastic Kubernetes Service (Amazon EKS) Anywhere cluster failed the creation process, and I want to manually clean up my resources.

Short description

When you try to create an Amazon EKS Anywhere cluster, the cluster creation might fail for various reasons. An unfinished cluster creation might keep some unwanted resources on your machine. If you can't use eksctl to remove these resources, then you can remove them manually.

When you create an EKS Anywhere cluster, the process also creates a bootstrap cluster on the administrative machine. This bootstrap cluster is a Kubernetes in Docker (KinD) cluster, and it facilitates the creation of the EKS Anywhere cluster. To clean up this KinD cluster, stop the KinD containers and remove the KinD container images. This workflow successfully cleans up resources that use Docker as the provider. For other providers, you must complete additional steps on the virtual machines (VMs) for the control plane and nodes.
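Before you run the full cleanup, you can check which bootstrap artifacts remain on the administrative machine. The following is a minimal sketch; it assumes the usual EKS Anywhere naming, where the cluster name appears in the KinD container names and the images come from public.ecr.aws/eks-anywhere:

```shell
# Inspect leftover bootstrap artifacts from a failed cluster creation.
# EKSA_CLUSTER_NAME is a placeholder; set it to your cluster name.
EKSA_CLUSTER_NAME="YOUR_CLUSTER_NAME"

if command -v docker >/dev/null 2>&1; then
  # Containers whose names include the cluster name (KinD bootstrap nodes)
  docker ps -a --filter "name=${EKSA_CLUSTER_NAME}" \
    --format '{{.ID}}  {{.Names}}  {{.Status}}'
  # EKS Anywhere images pulled during the attempt
  docker image ls | grep 'public.ecr.aws/eks-anywhere' || true
fi
```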

Resolution

Clean up resources on the administrative machine (for Docker)

To clean up resources that remain on the administrative machine, use the following script for all use cases. If you use Docker as your provider, then this step successfully removes all unwanted resources that remain from the failed cluster creation.

Because the kind delete cluster command requires a KinD installation, this script doesn't use that command. Instead, EKS Anywhere uses KinD binaries from temporary containers to set up clusters:

EKSA_CLUSTER_NAME="YOUR_CLUSTER_NAME"

# Clean up KinD cluster containers
kind_container_ids=$(docker ps -a | grep "${EKSA_CLUSTER_NAME}" | awk '{print $1}')
for container_id in $kind_container_ids; do echo "deleting container with id ${container_id}"; docker rm -f "${container_id}"; done

# Clean up EKS Anywhere tools containers
tools_container_ids=$(docker ps -a | grep "public.ecr.aws/eks-anywhere/cli-tools" | awk '{print $1}')
for container_id in $tools_container_ids; do echo "deleting container with id ${container_id}"; docker rm -f "${container_id}"; done

# Delete all EKS Anywhere images
image_ids=$(docker image ls | grep "public.ecr.aws/eks-anywhere/" | awk '{print $3}')
for image_id in $image_ids; do echo "deleting image with id ${image_id}"; docker image rm "${image_id}"; done

# Delete the auto-generated cluster folder
rm -rf "${EKSA_CLUSTER_NAME}"

Note: Replace YOUR_CLUSTER_NAME with your EKS Anywhere cluster name.
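After the script completes, you can verify that nothing remains. This is a sketch under the same assumptions as the script above; any output other than "cleanup complete" means some resources survived:

```shell
EKSA_CLUSTER_NAME="YOUR_CLUSTER_NAME"

leftover_containers=""
leftover_images=""
if command -v docker >/dev/null 2>&1; then
  # Any surviving KinD or tools containers for this cluster
  leftover_containers=$(docker ps -a | grep "${EKSA_CLUSTER_NAME}" || true)
  # Any surviving EKS Anywhere images
  leftover_images=$(docker image ls | grep "public.ecr.aws/eks-anywhere/" || true)
fi

if [ -z "${leftover_containers}" ] && [ -z "${leftover_images}" ]; then
  echo "cleanup complete"
else
  printf '%s\n%s\n' "${leftover_containers}" "${leftover_images}"
fi
```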

Steps for Bare Metal, Nutanix, CloudStack, and vSphere

EKS Anywhere uses Kubernetes Cluster API to provision the components of the Kubernetes cluster. If the creation process creates VMs and then fails, then you must clean up these VMs manually.

You can create and manage EKS Anywhere clusters with or without a separate management cluster. Bare metal clusters don't support a separate management cluster. If you use a bare metal cluster, then see the Clusters without a management cluster section.

To clean up your VMs, follow the relevant steps depending on your cluster setup:

Clusters without a management cluster

For clusters without a separate management cluster, power off and delete all worker node VMs, etcd VMs, and control plane (API server) VMs.

Note: VMs that are associated with Nutanix, CloudStack, and vSphere clusters commonly have their names prefixed with the cluster name.
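For vSphere, you can script the power-off and deletion with the govc CLI. The following is a minimal sketch, not a definitive procedure: it assumes that govc is installed and configured (GOVC_URL, GOVC_USERNAME, GOVC_PASSWORD), and that the VM names are prefixed with the cluster name as described in the note above:

```shell
EKSA_CLUSTER_NAME="YOUR_CLUSTER_NAME"

if command -v govc >/dev/null 2>&1; then
  # Find VMs whose names start with the failed cluster's name
  for vm in $(govc find / -type m -name "${EKSA_CLUSTER_NAME}-*"); do
    echo "powering off and destroying ${vm}"
    # Force power-off first; a VM that's already off makes this a no-op
    govc vm.power -off -force "${vm}" || true
    govc vm.destroy "${vm}"
  done
fi
```

Review the list that govc find returns before you destroy anything, because a prefix match can catch unrelated VMs with similar names.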

For bare metal clusters, power off and delete the target machines.

Clusters with a management cluster

When you use a management cluster, a separate cluster monitors your workload cluster's state. If a machine in the workload cluster is powered off or terminated, then the management cluster detects a health issue and spins up a new VM to bring the workload cluster back to its desired state.

Therefore, to clean up clusters with separate management clusters, delete the custom resources that represent the workload cluster from the management cluster. This deletes all the VMs for that workload cluster.

Note: In the following commands, replace WORKLOAD_CLUSTER_NAME with the workload cluster that you want to delete.
Replace MANAGEMENT_CLUSTER_FOLDER with the folder that EKS Anywhere created for the management cluster.
Replace MANAGEMENT_CLUSTER_KUBECONFIG_FILE with the kubeconfig file that the service generated for the management cluster. The kubeconfig file is in the MANAGEMENT_CLUSTER_FOLDER.

Delete the Cluster API resource for the workload cluster:

kubectl delete clusters.cluster.x-k8s.io -n eksa-system WORKLOAD_CLUSTER_NAME --kubeconfig MANAGEMENT_CLUSTER_FOLDER/MANAGEMENT_CLUSTER_KUBECONFIG_FILE

Delete the clusters.anywhere.eks.amazonaws.com resource for the cluster, if it exists:

kubectl delete clusters.anywhere.eks.amazonaws.com WORKLOAD_CLUSTER_NAME --kubeconfig MANAGEMENT_CLUSTER_FOLDER/MANAGEMENT_CLUSTER_KUBECONFIG_FILE

Note: If the cluster creation failed before the clusters.anywhere.eks.amazonaws.com resource was provisioned, then you get the following error:

"Error from server (NotFound): clusters.anywhere.eks.amazonaws.com "WORKLOAD_CLUSTER_NAME" not found"
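After the delete commands return, you can confirm that the management cluster no longer tracks the workload cluster. This sketch uses the same placeholder kubeconfig path as the commands above; no output means that no Cluster API objects remain for the workload cluster:

```shell
KUBECONFIG_FILE="MANAGEMENT_CLUSTER_FOLDER/MANAGEMENT_CLUSTER_KUBECONFIG_FILE"

if command -v kubectl >/dev/null 2>&1 && [ -f "${KUBECONFIG_FILE}" ]; then
  # Cluster API objects left in the namespace for the workload cluster
  kubectl get clusters.cluster.x-k8s.io,machines.cluster.x-k8s.io \
    -n eksa-system --kubeconfig "${KUBECONFIG_FILE}" \
    | grep "WORKLOAD_CLUSTER_NAME" || true
fi
```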

AWS OFFICIAL
Updated a year ago
1 Comment

This documentation is incomplete. I followed the directions, but the new cluster failed to even start provisioning VMs when I tried to create it after following these cleanup steps. The problem is that the cleanup steps assume that your control plane creation succeeded. If it did not succeed, then you will be unable to delete any of the custom resources because of their finalizers. In that case, you need to manually delete every resource created under the following CRDs:

clusterresourcesets.addons.cluster.x-k8s.io
etcdadmclusters.etcdcluster.cluster.x-k8s.io
etcdadmconfigs.bootstrap.cluster.x-k8s.io
kubeadmconfigs.bootstrap.cluster.x-k8s.io
kubeadmcontrolplanes.controlplane.cluster.x-k8s.io
machines.cluster.x-k8s.io
vsphereclusters.infrastructure.cluster.x-k8s.io
vspheremachines.infrastructure.cluster.x-k8s.io
vspheremachinetemplates.infrastructure.cluster.x-k8s.io
vspherevms.infrastructure.cluster.x-k8s.io

So, for example:

kubectl delete -n eksa-system vspherevms.infrastructure.cluster.x-k8s.io my-cluster-name-etcd-5mnp5
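If a delete command hangs because of finalizers, one last-resort workaround (my own sketch, with a hypothetical resource name) is to clear the finalizers on the stuck resource so that the API server can remove it. Clearing finalizers skips Cluster API's normal cleanup, so only do this when the underlying VMs are already gone:

```shell
# Merge-patch that empties the finalizers list on a stuck resource
PATCH='{"metadata":{"finalizers":null}}'

# Guard: only attempt this if kubectl can actually reach a cluster
if command -v kubectl >/dev/null 2>&1 && kubectl cluster-info >/dev/null 2>&1; then
  kubectl patch -n eksa-system vspherevms.infrastructure.cluster.x-k8s.io \
    my-cluster-name-etcd-5mnp5 --type=merge -p "${PATCH}"
fi
```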

To get a quick list of all the resources that need to be deleted, you can use:

for CRD in $(kubectl get crd | awk '{print $1}'); do echo "CRD: $CRD"; kubectl get "$CRD" -n eksa-system 2>/dev/null | grep 'my-cluster-name\|CRD'; done
MJ
replied a month ago