AWS EKS node group stuck in 'Deleting' status

I am using Terraform to deploy AWS infrastructure. My EKS node group is stuck in the Deleting status. Its health status reports that the IAM role was not found. I recreated that role and tried to update the node group from the CLI, which returned:

"An error occurred (ResourceInUseException) when calling the UpdateNodegroupVersion operation: Nodegroup cannot be updated as it is currently not in Active State"

I am unable to delete the cluster or the node group, either from the CLI or from Terraform. In my Terraform code the node group uses SPOT capacity with instance type t3.medium, but when I inspected the node group from the CLI it showed on-demand capacity with t3.large. There is no such configuration in my Terraform code.

main.tf:
module "vpc" {
  source       = "./modules/vpc"
  cluster_name = var.cluster_name
}

module "eks" {
  source             = "./modules/eks"
  cluster_name       = var.cluster_name
  private_subnet_ids = module.vpc.private_subnet_ids
  subnet_ids         = module.vpc.subnet_ids
}

module "karpenter" {
  source                   = "./modules/karpenter"
  eks-nodeName             = module.eks.eks-data
  eks-connect-provider-url = module.eks.aws_iam_openid_connect_provider
  eks-connect-provider-arn = module.eks.aws_iam_openid_connect_provider-arn
  cluster_name             = module.eks.cluster_name
  private-nodes            = module.eks.aws_eks_node_group-private-nodes
}

nodes.tf:
resource "aws_eks_node_group" "private-nodes" {
  # count = var.delete_nodegroup ? 1 : 0
  cluster_name    = aws_eks_cluster.cluster.name
  node_group_name = "private-nodes"
  node_role_arn   = aws_iam_role.nodes.arn

  # subnet_ids = [
  #   aws_subnet.private-eu-west-1a.id,
  #   aws_subnet.private-eu-west-1b.id
  # ]

  subnet_ids = var.private_subnet_ids

  capacity_type  = "SPOT"
  instance_types = ["t3.medium"]

  scaling_config {
    desired_size = 1
    max_size     = 6
    min_size     = 1
  }

  update_config {
    max_unavailable = 1
  }

  labels = {
    role = "general"
  }

  depends_on = [
    aws_iam_role_policy_attachment.nodes-AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.nodes-AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.nodes-AmazonEC2ContainerRegistryReadOnly,
  ]

  # Allow external changes without Terraform plan difference
  lifecycle {
    ignore_changes = [scaling_config[0].desired_size]
  }
}

I used this command to delete the node group:

aws eks delete-nodegroup --cluster-name eks-gd360 --nodegroup-name private-nodes --region eu-west-1

I followed these docs:

Deleting a managed node group: https://docs.aws.amazon.com/eks/latest/userguide/delete-managed-node-group.html

Deleting an EKS cluster: https://docs.aws.amazon.com/eks/latest/userguide/delete-cluster.html#w237aac13c27b9b3

2 Answers
Accepted Answer

I contacted the AWS Support billing team about this issue. Initially, they recommended purchasing AWS Developer Support to get developer assistance with the problem. I told them I believed the issue was internal to AWS, and they agreed to investigate further.

After 4 to 5 days, the support team deleted the node group, which allowed me to delete the EKS cluster.

I asked them for the technical reason, and this is what they replied:

This is an internal issue at the AWS end and this can occur due to various reasons. However, they've suggested you to wait till the current request is completed before you can submit next action. For example: When you create a Cluster wait for it to be successfully complete before initiating the creation of Node groups. If we avoid overlapping the requests while one is in progress we can avoid this error from repeating again.

I destroyed the resources with terraform destroy, so the issue may have been triggered by Terraform.
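
The advice to let one request finish before sending the next could be sketched with a small polling helper around aws eks describe-nodegroup. This is only a sketch under assumptions: the function name wait_for_nodegroup_status and its retry defaults are made up for illustration, and it assumes the AWS CLI is installed and configured. It would not have fixed a node group that was already stuck, but it may help avoid overlapping requests.

```shell
# Hypothetical helper: poll a managed node group until it reports the wanted
# status, or give up after a fixed number of attempts.
# Usage: wait_for_nodegroup_status CLUSTER NODEGROUP REGION WANTED [ATTEMPTS] [DELAY_SECONDS]
wait_for_nodegroup_status() {
  local cluster="$1"
  local nodegroup="$2"
  local region="$3"
  local wanted="$4"
  local attempts="${5:-40}"   # assumed default, not an AWS recommendation
  local delay="${6:-30}"      # seconds between polls, also an assumption
  local status
  local i=0

  while [ "$i" -lt "$attempts" ]; do
    # describe-nodegroup reports the current state under nodegroup.status
    status="$(aws eks describe-nodegroup \
      --cluster-name "$cluster" \
      --nodegroup-name "$nodegroup" \
      --region "$region" \
      --query 'nodegroup.status' \
      --output text)" || return 2   # the API call itself failed

    echo "node group $nodegroup: $status"
    if [ "$status" = "$wanted" ]; then
      return 0
    fi

    sleep "$delay"
    i=$((i + 1))
  done

  return 1   # still not in the wanted status after all attempts
}
```

For example, wait_for_nodegroup_status eks-gd360 private-nodes eu-west-1 ACTIVE could run before any update or delete request. The AWS CLI also ships built-in waiters such as aws eks wait cluster-active, aws eks wait nodegroup-active, and aws eks wait nodegroup-deleted, which cover the common cases without a custom loop.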

Bilal
answered a month ago
EXPERT
reviewed a month ago

I would check whether manual changes were made to the deployment that left the Terraform plan out of sync with the real resources, which could cause the delete to get stuck.
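
One way to check for such out-of-band changes is to compare what the Terraform code declares with what EKS reports for the live node group. The sketch below assumes the AWS CLI is installed and configured. check_nodegroup_capacity is a hypothetical helper name, and the expected value SPOT is taken from the nodes.tf in the question.

```shell
# Hypothetical helper: compare the capacity type declared in Terraform with the
# one EKS reports for the live node group.
# Usage: check_nodegroup_capacity CLUSTER NODEGROUP REGION EXPECTED_CAPACITY_TYPE
check_nodegroup_capacity() {
  local cluster="$1"
  local nodegroup="$2"
  local region="$3"
  local expected="$4"
  local actual

  # describe-nodegroup reports SPOT or ON_DEMAND under nodegroup.capacityType
  actual="$(aws eks describe-nodegroup \
    --cluster-name "$cluster" \
    --nodegroup-name "$nodegroup" \
    --region "$region" \
    --query 'nodegroup.capacityType' \
    --output text)" || return 2   # the API call itself failed

  if [ "$actual" = "$expected" ]; then
    echo "capacity type matches: $actual"
    return 0
  fi

  echo "drift: Terraform declares $expected, EKS reports $actual"
  return 1
}
```

For the node group in the question, the call would be check_nodegroup_capacity eks-gd360 private-nodes eu-west-1 SPOT. For a broader comparison, terraform plan -refresh-only shows how the real resources differ from the recorded state without proposing changes to them.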

EXPERT
answered 2 months ago
