How do I resolve Amazon VPC CNI plugin issues for Amazon EKS?


I want to resolve issues with the Amazon Virtual Private Cloud (Amazon VPC) Container Network Interface (CNI) plugin for Amazon Elastic Kubernetes Service (Amazon EKS).

Resolution

Check your permissions and configurations

For the Amazon VPC CNI plugin to work, it must be able to assign an IP address from your VPC to each Pod.

Check that you have the following permissions and configurations:

  • AWS Identity and Access Management (IAM) permissions, including the AmazonEKS_CNI_Policy that's attached to your worker node's IAM role. Or, IAM permissions that you provide through IAM roles for service accounts (IRSA).
  • An Amazon EKS API server endpoint that's reachable from the worker node.
  • Network access to API endpoints for Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Registry (Amazon ECR), and Amazon Simple Storage Service (Amazon S3).
  • Sufficient available IP addresses in your subnets.
  • A kube-proxy Pod that runs successfully, so that the aws-node Pod can progress to Ready status.
  • kube-proxy and VPC CNI versions that align with your Amazon EKS cluster version.

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, confirm that you're using the most recent AWS CLI version.

Check the aws-node Pod status and logs

The VPC CNI plugin runs as a DaemonSet Pod that's called aws-node in the kube-system namespace. One aws-node Pod runs on each worker node in your cluster.

To verify that the aws-node Pod is running on each worker node, run the following kubectl get command:

kubectl get pods -n kube-system -l k8s-app=aws-node -o wide

Note: Replace k8s-app=aws-node with your label selector. For information about the get command, see kubectl get on the Kubernetes website.

If the command output shows that the RESTARTS count is greater than 0, then verify the status of containers and error messages. Run the following kubectl describe command:

kubectl describe pod aws-node-pod-name -n kube-system

Note: Replace aws-node-pod-name with the name of your AWS node Pod. For information about the describe command, see kubectl describe on the Kubernetes website.
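As a sketch, the RESTARTS check can be scripted. The sample text below stands in for real kubectl get pods output, and the Pod names are placeholders:

```shell
# Filter for aws-node Pods whose RESTARTS column (field 4) is nonzero.
# The sample stands in for real "kubectl get pods" output; the Pod
# names are hypothetical.
sample='aws-node-7xk2p   2/2   Running   0   3d
aws-node-9qm4z   1/2   Running   4   3d'
echo "$sample" | awk '$4 > 0 {print $1}'
```

Any Pod name that the filter prints is a candidate for the describe command above.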

If the aws-node Pod is stuck in the ContainerCreating state, then the describe command's output might show the following error message:

"Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused"

To resolve this issue, see Why is my Amazon EKS Pod stuck in the ContainerCreating state with the "Failed to create pod sandbox" error?

To view the logs of the aws-node Pod and check for errors, run one of the following kubectl logs commands:

kubectl logs daemonset/aws-node -n kube-system

-or-

kubectl logs aws-node-pod-name -n kube-system

Note: Replace aws-node-pod-name with your aws-node Pod name. For information about the logs command, see kubectl logs on the Kubernetes website.
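As a sketch, error-level lines can be filtered out of the log output. The JSON lines below are hypothetical samples, not real aws-node output:

```shell
# Keep only log lines at error level. The sample stands in for real
# "kubectl logs daemonset/aws-node -n kube-system" output; the
# messages are placeholders.
sample='{"level":"info","msg":"Starting IPAM daemon"}
{"level":"error","msg":"Failed to assign an IP address to container"}'
echo "$sample" | grep '"level":"error"'
```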

To check for errors in the VPC CNI plugin logs on the worker node, sign in to the worker node, and then go to the /var/log/aws-routed-eni/ directory. Locate the files named plugin.log and ipamd.log.

To verify that the worker nodes can reach the API server endpoint of your Amazon EKS cluster, use SSH or Session Manager, a capability of AWS Systems Manager, to log in to your worker node.

Then, run the following command:

curl -ivk https://eks-api-server-endpoint-url

Note: Replace eks-api-server-endpoint-url with your Amazon EKS API server endpoint URL.

Check the VPC CNI version

Check that your VPC CNI plugin version is up to date and compatible with your Amazon EKS cluster version.

To check the current VPC CNI version, run the following command:

kubectl describe daemonset aws-node -n kube-system | grep Image | cut -d "/" -f 2

Compare your current version with the latest available version in the Amazon VPC CNI plugin releases on the GitHub website.

If your version is outdated, then update the VPC CNI plugin to the latest version.
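As a sketch, the version tag can be isolated from the image string for comparison against the latest release. The image value below is a sample, not output from your cluster:

```shell
# Strip everything up to the last ":" to get the image tag, which is
# the CNI version. The image string here is a sample; take the real
# one from the describe output above.
image="602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.18.1"
echo "${image##*:}"
```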

Verify network configuration and requirements

Check that you correctly configured your Amazon EKS cluster's security group and subnet configurations.

Verify security group rules

Security groups must allow connectivity between the control plane and data plane.

If you use a custom security group for the worker nodes, then check the ports. The minimum node group rules allow port 10250 inbound from the control plane security group and 443 outbound to the control plane security group. For more information, see Network security.

If the Pod security group feature is active, then check whether you reached the security group quotas. If you reach the quota for security groups per elastic network interface, then your Pod networking configuration might fail.

For more information about required security group rules, see View Amazon EKS security group requirements for clusters.

Verify subnet configuration

To list the available IP addresses in each subnet in the VPC, run the following describe-subnets AWS CLI command:

aws ec2 describe-subnets --filters "Name=vpc-id,Values=VPCID" | jq '.Subnets[] | .SubnetId + "=" + "\(.AvailableIpAddressCount)"'

Note: Replace VPCID with your VPC ID.

Verify that your worker node's network access control list (network ACL) rules for your subnets allow communication with the Amazon EKS API server. For more information about how to configure network ACLs, see Control subnet traffic with network access control lists.

Verify that your control plane subnets have sufficient IP addresses available. Each control plane subnet must have at least six IP addresses that Amazon EKS can use. It's a best practice to configure at least 16 IP addresses for each subnet. The AvailableIpAddressCount must be greater than 0 for the subnet where you launch the Pods. For more information, see Subnet requirements and considerations.
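As a sketch, the describe-subnets output can be filtered for subnets that fall below the recommended 16 available IP addresses. The JSON below is sample output with placeholder subnet IDs; pipe the real AWS CLI output through the same jq filter:

```shell
# Flag subnets with fewer than 16 available IP addresses. The JSON is
# sample describe-subnets output with placeholder subnet IDs.
cat <<'EOF' | jq -r '.Subnets[] | select(.AvailableIpAddressCount < 16) | .SubnetId'
{"Subnets":[
  {"SubnetId":"subnet-aaa","AvailableIpAddressCount":120},
  {"SubnetId":"subnet-bbb","AvailableIpAddressCount":5}
]}
EOF
```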

Confirm that the kube-proxy Pod is in the Running state

The kube-proxy Pod must be running for network connectivity.

To verify that kube-proxy is running, run the following command:

kubectl get pods -n kube-system -l k8s-app=kube-proxy

If kube-proxy isn't running, then check the Pod logs for errors:

kubectl logs -n kube-system POD-NAME

Note: Replace POD-NAME with your kube-proxy Pod name.

Verify the value for WARM_PREFIX_TARGET

If you turned on prefix delegation, then check the log file for the following error message:

"Error: Setting WARM_PREFIX_TARGET = 0 is not supported while WARM_IP_TARGET/MINIMUM_IP_TARGET is not set. Please configure either one of the WARM_{PREFIX/IP}_TARGET or MINIMUM_IP_TARGET env variable"

To resolve this issue, set WARM_PREFIX_TARGET to a value that's greater than or equal to 1. If you turned on prefix delegation and the WARM_PREFIX_TARGET value is 0, then run the following command to update the value to at least 1:

kubectl set env daemonset aws-node -n kube-system WARM_PREFIX_TARGET=1
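The warm pool math behind this setting can be sketched as follows. Each delegated prefix is a /28, so one warm prefix keeps up to 16 spare IP addresses available per network interface:

```shell
# Each delegated prefix is a /28 (16 IP addresses), so the warm pool
# holds up to WARM_PREFIX_TARGET * 16 spare addresses per interface.
WARM_PREFIX_TARGET=1
echo $(( WARM_PREFIX_TARGET * 16 ))
```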

Check the reserved space in the subnet

After you turn on prefix delegation, verify that your subnets have sufficient /28 CIDR blocks available. Each /28 block contains 16 IP addresses, and all 16 must be contiguous.
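As a quick sanity check, the maximum number of /28 prefixes that a subnet can hold is 2^(28 - subnet prefix length). The /24 below is an example value:

```shell
# Maximum number of /28 prefixes that fit in a subnet, ignoring
# reserved and already-assigned addresses. A /24 is used as an example.
subnet_prefix_length=24
echo $(( 1 << (28 - subnet_prefix_length) ))
```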

Also, check the log file for the following error message:

"InsufficientCidrBlocks"

To resolve this issue, create a new subnet, and then launch the Pods from the new subnet. Use an Amazon EC2 subnet CIDR reservation to reserve space within a subnet with an assigned prefix.

Verify the custom network configuration

To determine if custom networking is active for your Amazon EKS cluster, run the following command:

kubectl describe pod -n kube-system $(kubectl get pods -n kube-system -l k8s-app=aws-node -o jsonpath='{.items[0].metadata.name}') | grep AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG

If the AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG environment variable is set to true, then custom networking is active.

If custom networking is active, then you must configure the ENIConfig CRDs to match the cluster's networking requirements.

Run the following command to retrieve a list of all ENIConfig CRDs:

kubectl get ENIConfig -A -o yaml

To describe a specific ENIConfig, run the following command:

kubectl describe ENIConfig eni-config-name

Note: Replace eni-config-name with your ENIConfig name.

Verify that each ENIConfig has the correct subnet and security group configuration for each Availability Zone.

Confirm that the subnet that's specified in the ENIConfig matches the Availability Zone of your worker nodes.

For more information about custom networking, see Deploy Pods in alternate subnets with custom networking.
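As a hedged example, an ENIConfig for one Availability Zone might look like the following. The subnet ID, security group ID, and Availability Zone name are placeholders:

```yaml
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-west-2a                     # by default, the name must match the Availability Zone
spec:
  subnet: subnet-0123456789abcdef0     # placeholder: Pod subnet in that Availability Zone
  securityGroups:
    - sg-0123456789abcdef0             # placeholder: security group for Pod network interfaces
```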

Configure conflict resolution to prevent rollbacks

When you use AWS Cloud Development Kit (AWS CDK), AWS CloudFormation, or eksctl with managed add-ons, define a conflict resolution method to prevent rollbacks.

Choose one of the following methods:

  • If you don't define a method, then the default is NONE. When the system detects conflicts, the update to the CloudFormation stack rolls back, and no changes are made.
  • To set the default configuration for the add-ons, use the OVERWRITE method. You must use OVERWRITE when you move from self-managed add-ons to Amazon EKS managed add-ons.
  • Use the PRESERVE method when you use custom-defined configurations, such as WARM_IP_TARGET or custom networking.

To configure conflict resolution methods, see Update add-on (AWS Console).

AWS OFFICIAL · Updated 10 days ago