How do I resolve kubelet or CNI plugin issues for Amazon EKS?


I want to resolve issues with my kubelet or CNI plugin for Amazon Elastic Kubernetes Service (Amazon EKS).

Short description

To assign an IP address to a pod and run the pod on your worker node with your CNI plugin (on the Kubernetes website), you must have the following configurations:

  • AWS Identity and Access Management (IAM) permissions, provided either through a CNI policy that's attached to your worker node's IAM role or through service account IAM roles.
  • An Amazon EKS API server endpoint that's reachable from the worker node.
  • Network access to API endpoints for Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Registry (Amazon ECR), and Amazon Simple Storage Service (Amazon S3).
  • Sufficient available IP addresses in your subnet.
  • A kube-proxy pod that runs successfully, so that the aws-node pod can progress into Ready status.
  • A kube-proxy version and a VPC CNI version that support your Amazon EKS version.

Resolution

Verify that the aws-node pod is in Running status on each worker node

To verify that the aws-node pod is in Running status on a worker node, run the following command:

kubectl get pods -n kube-system -l k8s-app=aws-node -o wide

If the command output shows a STATUS of Running and a RESTARTS count of 0, then the aws-node pod is running as expected. Try the steps in the Verify that your subnet has sufficient free IP addresses available section.
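As a rough sketch, the RESTARTS check can be scripted. The helper name flag_unhealthy_pods is hypothetical (it is not part of kubectl), and the sketch assumes the default column order of kubectl get pods --no-headers, where the third column is STATUS and the fourth is RESTARTS.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints NAME, STATUS, and RESTARTS for every pod that is not Running
# or that has restarted at least once.
flag_unhealthy_pods() {
  awk '$3 != "Running" || $4 + 0 > 0 { print $1, $3, $4 }'
}

# Example (requires cluster access):
#   kubectl get pods -n kube-system -l k8s-app=aws-node --no-headers | flag_unhealthy_pods
```

Empty output means that every listed pod is Running with no restarts. The same helper can be pointed at the kube-proxy pods by changing the label selector.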

If the command output shows that the RESTARTS count is greater than 0, then verify that the worker node can reach the API server endpoint of your Amazon EKS cluster. Run the following command:

curl -vk https://eks-api-server-endpoint-url
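To get a pass or fail answer instead of reading verbose curl output, one option is to inspect only the HTTP status code. The sketch below is illustrative: classify_http_code is a hypothetical helper, and it assumes that any HTTP response, including 401 or 403, is enough to show that the network path to the endpoint works.

```shell
# Hypothetical helper: interprets the status code that
# `curl -w '%{http_code}'` prints. curl reports 000 when it received no HTTP
# response at all; any other code means the endpoint answered over the network.
classify_http_code() {
  case "$1" in
    000|"") echo "unreachable" ;;
    *)      echo "reachable" ;;
  esac
}

# Example (run on the worker node; the URL is the same placeholder as above):
#   code=$(curl -sk -o /dev/null -m 10 -w '%{http_code}' https://eks-api-server-endpoint-url)
#   classify_http_code "$code"
```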

Verify connectivity to your Amazon EKS cluster

1.    Verify that your worker node's security group settings for Amazon EKS are configured correctly. For more information, see Amazon EKS security group requirements and considerations.

2.    Verify that your worker node's network access control list (network ACL) rules for your subnet allow communication with the Amazon EKS API server endpoint.

Important: Allow inbound and outbound traffic on port 443.

3.    Verify that the kube-proxy pod is in Running status on each worker node:

kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide

4.    Verify that your worker node can access API endpoints for Amazon EC2, Amazon ECR, and Amazon S3.

Note: Configure these services through public endpoints or AWS PrivateLink.
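The endpoint check in step 4 can be sketched as follows. The helper regional_endpoints is hypothetical, and the hostnames assume the standard aws partition; other partitions, such as the China Regions, use different domains.

```shell
# Hypothetical helper: prints the public regional API endpoints for
# Amazon EC2, Amazon ECR, and Amazon S3 in the standard `aws` partition.
regional_endpoints() {
  region="$1"
  echo "https://ec2.${region}.amazonaws.com"
  echo "https://api.ecr.${region}.amazonaws.com"
  echo "https://s3.${region}.amazonaws.com"
}

# Example (run on the worker node; the Region is a placeholder):
#   for url in $(regional_endpoints us-east-1); do
#     curl -s -o /dev/null -m 10 -w "%{http_code} ${url}\n" "$url"
#   done
```

A status code of 000 in the example output indicates that the worker node received no HTTP response from that endpoint.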

Verify that your subnet has sufficient free IP addresses available

To list the available IP addresses in each subnet of your Amazon Virtual Private Cloud (Amazon VPC), run the following command. Replace VPCID with your VPC ID:

aws ec2 describe-subnets --filters "Name=vpc-id,Values=VPCID" | jq '.Subnets[] | .SubnetId + "=" + "\(.AvailableIpAddressCount)"'

Note: The AvailableIpAddressCount must be greater than 0 for the subnet where the pods are launched.
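If jq isn't installed, a similar listing is possible with the AWS CLI --query option alone. The sketch below also flags subnets that are at or below a free-address threshold. The helper name flag_low_ip_subnets and the threshold of 10 in the example are assumptions for illustration.

```shell
# Hypothetical helper: reads "SubnetId<whitespace>AvailableIpAddressCount"
# lines on stdin and prints subnets whose free-address count is at or
# below the threshold given as $1 (default 0).
flag_low_ip_subnets() {
  awk -v min="${1:-0}" '$2 + 0 <= min + 0 { print $1, $2 }'
}

# Example (VPCID is a placeholder):
#   aws ec2 describe-subnets --filters "Name=vpc-id,Values=VPCID" \
#     --query 'Subnets[].[SubnetId,AvailableIpAddressCount]' --output text | flag_low_ip_subnets 10
```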

Check whether your security group limits have been reached

If you reach the quota for security groups per elastic network interface, then your pod networking configuration can fail.

For more information, see Amazon VPC quotas.
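One way to see how close each network interface is to the quota is to count its attached security groups. This is only a sketch: flag_enis_at_sg_quota is a hypothetical helper, and the default of 5 is an assumption, so confirm the value that applies to your account in Amazon VPC quotas.

```shell
# Hypothetical helper: reads "NetworkInterfaceId<whitespace>GroupCount" lines
# on stdin and prints interfaces whose security group count has reached the
# quota passed as $1 (assumed default: 5).
flag_enis_at_sg_quota() {
  awk -v quota="${1:-5}" '$2 + 0 >= quota + 0 { print $1, $2 }'
}

# Example:
#   aws ec2 describe-network-interfaces \
#     --query 'NetworkInterfaces[].[NetworkInterfaceId,length(Groups)]' --output text | flag_enis_at_sg_quota
```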

Verify that you're running the latest stable version of the CNI plugin

To confirm that you have the latest version of the CNI plugin, see Updating the Amazon VPC CNI plugin for Kubernetes self-managed add-on.

For additional troubleshooting, see the AWS GitHub issues page and release notes for the CNI plugin.
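To see which CNI version is currently deployed, you can read the image tag from the aws-node DaemonSet. The sketch below assumes that the image reference ends in a version tag (not a digest); image_tag is a hypothetical helper.

```shell
# Hypothetical helper: prints the tag portion of a container image reference
# (everything after the last colon).
image_tag() {
  echo "${1##*:}"
}

# Example (requires cluster access):
#   image=$(kubectl get daemonset aws-node -n kube-system \
#     -o jsonpath='{.spec.template.spec.containers[?(@.name=="aws-node")].image}')
#   image_tag "$image"
```

Compare the printed tag with the versions listed in the release notes for the CNI plugin.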

Check the logs of the VPC CNI plugin on the worker node

If you create a pod, and an IP address doesn't get assigned to the container, then you receive the following error:

failed to assign an IP address to container

To check the logs, go to the /var/log/aws-routed-eni/ directory on the worker node, and then locate the files named plugin.log and ipamd.log.
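For a quick first pass over both files, you can count the lines that mention an error before reading the logs in detail. The helper summarize_cni_errors is hypothetical, and matching on the word "error" is an assumption about what is worth looking at, not a documented log format.

```shell
# Hypothetical helper: for plugin.log and ipamd.log in the given directory
# (default /var/log/aws-routed-eni), prints how many lines contain "error",
# ignoring case.
summarize_cni_errors() {
  dir="${1:-/var/log/aws-routed-eni}"
  for f in "$dir/plugin.log" "$dir/ipamd.log"; do
    if [ ! -r "$f" ]; then
      echo "$f: not readable"
      continue
    fi
    echo "$f: $(grep -ci 'error' "$f") matching lines"
  done
}

# Example (run on the worker node; reading the logs may require sudo):
#   summarize_cni_errors
```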

Verify that your kubelet pulls the Docker container images

If your kubelet doesn't pull the Docker container images for the kube-proxy and amazon-k8s-cni containers, then you receive the following error:

network plugin is not ready: cni config uninitialized

Make sure that you can reach the Amazon EKS API server endpoint from the worker node.

Verify that the WARM_PREFIX_TARGET value is set correctly

Note: This applies only if prefix delegation is turned on.

If prefix delegation is turned on, then set WARM_PREFIX_TARGET to a value greater than or equal to 1. If it's set to 0 while WARM_IP_TARGET and MINIMUM_IP_TARGET are not set, then the following error message is logged:

Error: Setting WARM_PREFIX_TARGET = 0 is not supported while WARM_IP_TARGET/MINIMUM_IP_TARGET is not set. 
Please configure either one of the WARM_{PREFIX/IP}_TARGET or MINIMUM_IP_TARGET env variable

See CNI configuration variables on the GitHub website for more information.
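The condition described by the error message can be checked before you roll out a configuration change. The sketch below is not the plugin's own validation: check_warm_targets is a hypothetical helper, it treats any non-empty WARM_IP_TARGET or MINIMUM_IP_TARGET value as "set", and the example assumes that the aws-node container is listed first in the DaemonSet.

```shell
# Hypothetical helper: reads NAME=VALUE lines on stdin. Prints "invalid" and
# returns 1 when WARM_PREFIX_TARGET is 0 and neither WARM_IP_TARGET nor
# MINIMUM_IP_TARGET has a value; otherwise prints "ok".
check_warm_targets() {
  awk -F= '
    $1 == "WARM_PREFIX_TARGET" { prefix = $2 }
    ($1 == "WARM_IP_TARGET" || $1 == "MINIMUM_IP_TARGET") && $2 != "" { ip_set = 1 }
    END {
      if (prefix == "0" && !ip_set) { print "invalid"; exit 1 }
      print "ok"
    }'
}

# Example (requires cluster access):
#   kubectl get daemonset aws-node -n kube-system \
#     -o jsonpath='{range .spec.template.spec.containers[0].env[*]}{.name}={.value}{"\n"}{end}' | check_warm_targets
```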

Check the reserved space in the subnet

Note: This applies only if prefix delegation is turned on. If prefix delegation is turned on, then check for the following logged error message:

InsufficientCidrBlocks

Make sure that the subnet has sufficient available /28 CIDR blocks (16 IP addresses each). All 16 IP addresses must be contiguous. If the subnet doesn't have a contiguous /28 range of IP addresses, then you receive the InsufficientCidrBlocks error.

To resolve the error, create a new subnet, and launch the pods from there. Also, use an Amazon EC2 subnet CIDR reservation to reserve space within a subnet with an assigned prefix. For more information, see Use subnet CIDR reservations.
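The arithmetic behind the /28 requirement, and a reservation command, can be sketched as follows. The helper cidr_size is hypothetical, and the subnet ID and CIDR in the commented example are placeholders.

```shell
# Hypothetical helper: number of IPv4 addresses in a CIDR block with the
# given prefix length, computed as 2^(32 - prefix length).
cidr_size() {
  echo $(( 1 << (32 - $1) ))
}

# Example: cidr_size 28 gives the address count of one /28 prefix.
#
# Example reservation (subnet ID and CIDR are placeholders):
#   aws ec2 create-subnet-cidr-reservation --subnet-id subnet-EXAMPLE \
#     --cidr 10.0.1.0/28 --reservation-type prefix
```

Because a /28 block holds 16 addresses, a subnet can report more than 16 available addresses and still return InsufficientCidrBlocks when those addresses are fragmented.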

Updates that are made with Infrastructure as Code (IaC) roll back with conflicts

If you use Amazon EKS managed add-ons, then updates that are made through IaC services, such as AWS CloudFormation, roll back when conflicts are detected and the conflict resolution method is undefined.

Valid methods are NONE, OVERWRITE, or PRESERVE.

  • If no method is defined, then the default is NONE. When the system detects conflicts, the update to the CloudFormation stack rolls back, and no changes are made.
  • To set the default configuration for the add-ons, use the OVERWRITE method. You must use OVERWRITE when you move from self-managed to Amazon EKS managed add-ons.
  • Use the PRESERVE method when you use custom defined configurations, such as WARM_IP_TARGET or custom networking.
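Outside of IaC templates, the same choice appears in the AWS CLI as the --resolve-conflicts option of aws eks update-addon. The sketch below guards against a mistyped method before calling the CLI; is_valid_conflict_method is a hypothetical helper, and the cluster name is a placeholder.

```shell
# Hypothetical helper: succeeds only for the three conflict resolution
# methods named above (case-sensitive).
is_valid_conflict_method() {
  case "$1" in
    NONE|OVERWRITE|PRESERVE) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (cluster name is a placeholder):
#   method=PRESERVE
#   is_valid_conflict_method "$method" && \
#     aws eks update-addon --cluster-name my-cluster --addon-name vpc-cni \
#       --resolve-conflicts "$method"
```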

Nodes are in NotReady status

When you have aws-node pods that aren't in Running status, it's common for the nodes that host them to be in NotReady status. For more information, see How can I change the status of my nodes from NotReady or Unknown status to Ready status?
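To list only the affected nodes, you can filter the output of kubectl get nodes. The helper not_ready_nodes is hypothetical and assumes the default column order, where the second column is STATUS.

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints NAME and STATUS for nodes whose status does not start with
# "Ready" (for example NotReady or Unknown).
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1, $2 }'
}

# Example (requires cluster access):
#   kubectl get nodes --no-headers | not_ready_nodes
```

Nodes that are cordoned but otherwise healthy (Ready,SchedulingDisabled) are not reported by this filter.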

Custom networking configuration challenges

When custom networking is active for the VPC CNI, ENIConfig custom resource definitions (CRDs) must define the correct subnet and security groups.

To verify if custom networking is active, describe the aws-node pod in the kube-system namespace. Then, see if the following environment variable is set to true:

AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG

If custom networking is active, then check that the CRDs are configured properly.

kubectl get ENIConfig -A -o yaml

Describe each entry that matches the Availability Zone name. Verify that the subnet IDs match the VPC and the worker node placement. Also verify that the security groups are accessible or shared with the cluster security group. For more information on best practices, see the Amazon EKS Best Practices Guides on the GitHub website.
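As a first pass before you read the full YAML, the sketch below lists ENIConfig entries that are missing a subnet or security groups. The helper incomplete_eniconfigs is hypothetical; the example relies on kubectl printing <none> for a custom column whose field is absent. It doesn't verify that the subnet and security group IDs are the correct ones.

```shell
# Hypothetical helper: reads "NAME SUBNET SECURITYGROUPS" lines on stdin and
# prints the names of entries where the subnet or security groups are missing.
incomplete_eniconfigs() {
  awk 'NF < 3 || $2 == "<none>" || $3 == "<none>" { print $1 }'
}

# Example (requires cluster access):
#   kubectl get eniconfig --no-headers \
#     -o custom-columns='NAME:.metadata.name,SUBNET:.spec.subnet,SGS:.spec.securityGroups' | incomplete_eniconfigs
```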


AWS OFFICIAL
Updated 2 years ago