
Loop dependency for deleting EKS node group

0

I cannot delete my EKS cluster because I cannot delete its node group. Deleting the node group depends on deleting a security group, which in turn depends on deleting ENIs that are owned by EKS.

What's the solution to this problem?

It's a dependency loop:

[EKS cluster] -> [Node Group] -> [Security Group] -> [ENI (managed by EKS)]

Below are the details of my situation:

1. Delete cluster

aws eks delete-cluster --name llm-k8s-cluster2 

An error occurred (ResourceInUseException) when calling the DeleteCluster operation: Cluster has nodegroups attached

2. Delete node group

aws eks delete-nodegroup --cluster-name llm-k8s-cluster --nodegroup-name gpu-l4-group-al2023 --output=json

 "health": {
            "issues": [
                {
                    "code": "AccessDenied",
                    "message": "Your worker nodes do not have access to the cluster. Verify if the node instance role is present and correctly configured in the aws-auth ConfigMap.",
                    "resourceIds": [
                        "AmazonEKSNodeGroupRole"
                    ]
                },
                {
                    "code": "Ec2SecurityGroupDeletionFailure",
                    "message": "DependencyViolation - resource has a dependent object",
                    "resourceIds": [
                        "sg-0e77ae6cda1b4b0f8"
                    ]
                }
            ]
        },

3. Security group

aws ec2 delete-security-group --group-id sg-0e77ae6cda1b4b0f8 --output=json

An error occurred (DependencyViolation) when calling the DeleteSecurityGroup operation: resource sg-0e77ae6cda1b4b0f8 has a dependent object

Associated ENIs:

aws ec2 describe-network-interfaces \
  --filters "Name=group-id,Values=sg-0e77ae6cda1b4b0f8" \
  --query "NetworkInterfaces[*].{ID:NetworkInterfaceId,PrivateIP:PrivateIpAddress,Status:Status,Attachment:Attachment.InstanceId}" \
  --output json

[
    {
        "ID": "eni-0fd2c1d13185e8d54",
        "PrivateIP": "172.31.21.146",
        "Status": "in-use",
        "Attachment": null
    },
    {
        "ID": "eni-0dbed56f2261a4bf1",
        "PrivateIP": "172.31.35.74",
        "Status": "in-use",
        "Attachment": null
    }
]

4. ENI details

aws ec2 delete-network-interface --network-interface-id eni-0fd2c1d13185e8d54

An error occurred (InvalidParameterValue) when calling the DeleteNetworkInterface operation: Network interface 'eni-0fd2c1d13185e8d54' is currently in use.

When I describe the ENI permissions, the owner ID shown is 392771493575, which must be an internal account owned by EKS, not my root account:

james@hwdell:~$ aws ec2 describe-network-interface-permissions --output=json
{
    "NetworkInterfacePermissions": [
        {
            "NetworkInterfacePermissionId": "eni-perm-051ff65effd8d55ed0",
            "NetworkInterfaceId": "eni-0fd2c1d13185e8d54",
            "AwsAccountId": "392771493575",
            "Permission": "INSTANCE-ATTACH",
            "PermissionState": {
                "State": "GRANTED"
            }
        },
        {
            "NetworkInterfacePermissionId": "eni-perm-0e33b7221a76ac3f9",
            "NetworkInterfaceId": "eni-0dbed56f2261a4bf1",
            "AwsAccountId": "392771493575",
            "Permission": "INSTANCE-ATTACH",
            "PermissionState": {
                "State": "GRANTED"
            }
        }
    ]
}
2 Answers
0
Accepted Answer

My solution:

Update the EKS cluster's networking configuration to remove the *RemoteAccess security groups, which are what block the deletion of the node groups.

EKS then also removes the *RemoteAccess security groups from its network interfaces, which breaks the dependency loop.
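A minimal sketch of that update with the AWS CLI, assuming the blocking security group is listed among the cluster's additional security groups. The function name build_sg_update_command is hypothetical, and the sketch only prints the command instead of running it. Whether your cluster also needs subnetIds in the same call, or accepts an empty security group list, is something to check against the EKS documentation.

```shell
#!/bin/sh
# Sketch: print the update-cluster-config call that keeps every cluster
# security group except the one blocking deletion. Nothing is modified;
# review the printed command before running it yourself.
# build_sg_update_command is a hypothetical helper name, not an AWS tool.
build_sg_update_command() {
  cluster="$1"   # cluster name, e.g. llm-k8s-cluster
  blocking="$2"  # security group to drop, e.g. sg-0e77ae6cda1b4b0f8

  # Security groups currently set on the cluster (whitespace-separated).
  current=$(aws eks describe-cluster --name "$cluster" \
    --query 'cluster.resourcesVpcConfig.securityGroupIds' \
    --output text) || return 1

  # Rebuild the list without the blocking group.
  keep=""
  for sg in $current; do
    if [ "$sg" != "$blocking" ]; then
      keep="${keep:+$keep,}$sg"
    fi
  done

  # Assumption: refuse rather than guess how EKS treats an empty list.
  if [ -z "$keep" ]; then
    echo "no security groups would remain; not printing a command" >&2
    return 1
  fi

  echo "aws eks update-cluster-config --name $cluster --resources-vpc-config securityGroupIds=$keep"
}
```

Call it as build_sg_update_command llm-k8s-cluster sg-0e77ae6cda1b4b0f8, run the printed command once it looks right, wait for the cluster update to finish, and then retry delete-nodegroup.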

answered 2 months ago
-1

When you encounter a dependency loop while trying to delete an EKS node group, the issue is typically related to security groups being attached to network interfaces that are managed by EKS. The solution to this problem is to remove the security group from the EFS network interface that's blocking the deletion.

Based on your situation, you're experiencing a dependency violation where the security group (sg-0e77ae6cda1b4b0f8) is attached to ENIs that are managed by an AWS-owned account (392771493575), which is indeed the EKS service account.

Here's how to resolve this issue:

  1. First, identify which EFS mount targets are using the security group. The ENIs you've identified are likely associated with an EFS file system.

  2. Remove the security group from the EFS network interface. You'll need to modify the security groups associated with your EFS mount targets rather than trying to delete the ENIs directly (which you can't do since they're managed by AWS).

  3. To do this, you can use the EFS console or AWS CLI to modify the security groups associated with your EFS mount targets, removing the problematic security group.

  4. Once the security group is no longer attached to any EFS mount targets, you should be able to delete the node group successfully using:

aws eks delete-nodegroup --cluster-name llm-k8s-cluster --nodegroup-name gpu-l4-group-al2023

  5. After the node group is deleted, you can proceed to delete the cluster:

aws eks delete-cluster --name llm-k8s-cluster2

This approach addresses the dependency loop by breaking it at the security group attachment point, allowing the deletion process to proceed normally.
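Steps 1 to 3 can be checked from the CLI. The sketch below only applies if EFS mount targets actually exist in the account and region; it changes nothing and just lists which mount targets, if any, carry a given security group. The function name find_mount_targets_using_sg is hypothetical.

```shell
#!/bin/sh
# Sketch: list "<file-system-id> <mount-target-id>" pairs whose mount target
# has the given security group attached. Read-only; prints nothing when no
# EFS mount target uses the group.
# find_mount_targets_using_sg is a hypothetical helper name.
find_mount_targets_using_sg() {
  sg_id="$1"

  for fs in $(aws efs describe-file-systems \
      --query 'FileSystems[*].FileSystemId' --output text); do
    for mt in $(aws efs describe-mount-targets --file-system-id "$fs" \
        --query 'MountTargets[*].MountTargetId' --output text); do
      groups=$(aws efs describe-mount-target-security-groups \
        --mount-target-id "$mt" --query 'SecurityGroups' --output text)
      for g in $groups; do
        if [ "$g" = "$sg_id" ]; then
          echo "$fs $mt"
        fi
      done
    done
  done
}
```

For any mount target it reports, aws efs modify-mount-target-security-groups --mount-target-id <id> --security-groups <remaining groups> replaces the attached set. Empty output means EFS is not what holds the security group.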
Sources
Delete EKS Node Group failed due to Security Group and ENI Dependency | AWS re:Post
Use DeleteNodegroup with a CLI - AWS SDK Code Examples
Amazon EKS examples using AWS CLI - AWS SDK Code Examples

answered 2 months ago
  • I didn't find any EFS-related resources using either the AWS console or the CLI.
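One way to check which service the ENIs belong to is to read their Description field. This is a rough sketch: the prefixes matched below ("Amazon EKS", "EFS mount target", "ELB ") are commonly seen naming conventions and are an assumption here, not a documented contract, and both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: guess the owning service of each ENI that uses a security group,
# based on the ENI description. The prefixes are assumed conventions.
classify_eni_description() {
  case "$1" in
    "Amazon EKS"*)       echo "eks" ;;
    "EFS mount target"*) echo "efs" ;;
    "ELB "*)             echo "elb" ;;
    *)                   echo "unknown" ;;
  esac
}

# Prints "<eni-id> <guessed-service>" for every ENI using the security group.
report_sg_eni_owners() {
  sg_id="$1"
  tab=$(printf '\t')

  aws ec2 describe-network-interfaces \
    --filters "Name=group-id,Values=$sg_id" \
    --query 'NetworkInterfaces[*].[NetworkInterfaceId,Description]' \
    --output text |
  while IFS="$tab" read -r eni desc; do
    echo "$eni $(classify_eni_description "$desc")"
  done
}
```

Running report_sg_eni_owners sg-0e77ae6cda1b4b0f8 should show whether the two in-use ENIs look like EKS control plane interfaces or something else, which tells you which service's configuration has to release the security group.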
