How do I troubleshoot issues with my Amazon EFS volume mounts in Amazon EKS?

I receive an error when I try to mount my Amazon Elastic File System (Amazon EFS) volumes in my Amazon Elastic Kubernetes Service (Amazon EKS) cluster.

Resolution

Prerequisites:

  • Make sure that your Amazon EFS file system has a mount target in each Availability Zone where your worker nodes run.
  • Confirm that you use efs.csi.aws.com for the Amazon EFS storage class definition.
  • Verify that you use PersistentVolumeClaim and PersistentVolume.
    Note: If you use dynamic provisioning, then you don't need to create the PersistentVolume yourself. The driver creates it from the PersistentVolumeClaim and the storage class.
  • Make sure that you installed the Amazon EFS CSI driver add-on in the EKS cluster.

Verify that you correctly configured the network from your EKS worker nodes to the Amazon EFS API

Make sure that you have access to the Amazon EFS API from the EKS worker nodes and the EFS CSI controller pods.

If you don't configure the network to reach the Amazon EFS API, then you might receive one of the following error messages:

  • "failed to provision volume with StorageClass "xxxx": rpc error: code = DeadlineExceeded desc = context deadline exceeded"
  • "Could not start amazon-efs-mount-watchdog, unrecognized init system "bash" Mount attempt x/3 failed due to timeout after 15 sec"
  • "Unable to attach or mount volumes: timed out waiting for the condition"

If you use a private cluster with no outbound internet access, then you must include the com.amazonaws.region.elasticfilesystem virtual private cloud (VPC) endpoint in your VPC, where region is your AWS Region. Create an inbound rule for the VPC endpoint's security group that allows HTTPS traffic on port 443 from your worker node and pod subnets. Confirm that the policy that's attached to the VPC endpoint has the required permissions.

Verify that you correctly configured the Amazon EFS mount targets

Make sure that you created the Amazon EFS mount targets in each Availability Zone where the EKS nodes run. For example, if you distributed your worker nodes across us-east-1a and us-east-1b, then create mount targets in both Availability Zones for the EFS file system that you want to mount.
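To compare the Availability Zones of your worker nodes with those of the mount targets, a small shell helper such as the following might help. This is a minimal sketch and not part of the Amazon EFS tooling: missing_mount_target_azs is a hypothetical function name, and it assumes that you already collected both lists of zone names.

```shell
# Hypothetical helper: print each Availability Zone that hosts worker nodes
# but has no EFS mount target.
#   $1: whitespace-separated AZ names of the worker nodes
#   $2: whitespace-separated AZ names of the file system's mount targets
missing_mount_target_azs() {
  # Normalize tabs and newlines so that every name is delimited by single spaces.
  targets=" $(printf '%s ' $2)"
  for az in $1; do
    case "$targets" in
      *" $az "*) ;;          # a mount target exists in this zone
      *) echo "$az" ;;       # nodes run here, but there is nothing to mount from
    esac
  done | sort -u
}
```

One way to gather the inputs, assuming that your nodes carry the standard topology.kubernetes.io/zone label: kubectl get nodes -L topology.kubernetes.io/zone shows each node's zone, and aws efs describe-mount-targets --file-system-id fs-xxxxxx --query 'MountTargets[].AvailabilityZoneName' --output text shows the mount target zones. If the helper prints nothing, then every node zone has a mount target.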

If you don't correctly configure the mount targets, then you might receive the following error message:

"Output: Failed to resolve "fs-xxxxxx.efs.us-east-1.amazonaws.com" - The file system mount target ip address cannot be found"

Verify that the security group associated with your Amazon EFS file system and worker nodes allows NFS traffic

If the security group doesn't allow traffic, then you might receive one of the following error messages:

  • "Could not start amazon-efs-mount-watchdog, unrecognized init system "bash" Mount attempt x/3 failed due to timeout after 15 sec"
  • "failed to provision volume with StorageClass "xxxx": rpc error: code = DeadlineExceeded desc = context deadline exceeded"
  • "Unable to attach or mount volumes: timed out waiting for the condition"

The security group of the Amazon EFS file system must have an inbound rule that allows network file system (NFS) traffic on port 2049 from the Classless Inter-Domain Routing (CIDR) range of your cluster's VPC.

The security group that's associated with the worker nodes where the pods fail to mount the EFS volume must have an outbound rule. The outbound rule must allow NFS traffic to the EFS file system on port 2049.
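To illustrate what "allows NFS traffic" means for a single security group rule, the following sketch checks whether a rule's protocol and port range cover TCP port 2049. The function name rule_allows_nfs is hypothetical. The arguments follow the IpProtocol, FromPort, and ToPort fields that Amazon EC2 reports for security group rules, where a protocol of -1 means all traffic. The sketch doesn't inspect the rule's source or destination, so you must still confirm those separately.

```shell
# Hypothetical helper: succeed when one security group rule covers NFS (TCP 2049).
#   $1: IpProtocol ("tcp", "udp", "icmp", or "-1" for all traffic)
#   $2: FromPort, $3: ToPort (ignored when the protocol is "-1")
rule_allows_nfs() {
  case "$1" in
    -1)  return 0 ;;                                # all protocols, all ports
    tcp) [ "$2" -le 2049 ] && [ "$3" -ge 2049 ] ;;  # port range must include 2049
    *)   return 1 ;;                                # NFS to Amazon EFS uses TCP
  esac
}
```

For example, rule_allows_nfs tcp 2049 2049 succeeds, and rule_allows_nfs tcp 443 443 fails.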

Verify that you created the subdirectory path in your Amazon EFS file system

If you add subdirectory paths in persistent volumes, then the Amazon EFS CSI driver doesn't create the subdirectory path in the file system. You must already have the subdirectory in the file system for the mount operation to succeed. If the subdirectory isn't in the file system, then you might receive the following error message:

"Output: mount.nfs4: mounting fs-18xxxxxx.efs.us-east-1.amazonaws.com:/path-in-dir:/ failed, reason given by server: No such file or directory"

To verify whether the subdirectory exists in the EFS file system, mount the EFS file system on an Amazon Elastic Compute Cloud (Amazon EC2) instance, and then list its contents. If the subdirectory doesn't exist, then use the mkdir command to create it.
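The check-and-create step can be sketched as follows. This example assumes that you already mounted the file system on the EC2 instance. The function name ensure_efs_subdir and the mount point /mnt/efs in the usage comment are hypothetical.

```shell
# Hypothetical helper: report whether a subdirectory exists under the mount
# point, and create it when it's missing.
#   $1: local directory where the EFS file system is mounted
#   $2: subdirectory path that the PersistentVolume references
ensure_efs_subdir() {
  mount_point=$1
  subdir=${2#/}    # strip one leading slash so the path stays under the mount point
  if [ -d "$mount_point/$subdir" ]; then
    echo "exists: $mount_point/$subdir"
  else
    mkdir -p "$mount_point/$subdir" && echo "created: $mount_point/$subdir"
  fi
}

# Usage (assumed paths): ensure_efs_subdir /mnt/efs /path-in-dir
```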

Confirm that the cluster's VPC uses the Amazon DNS server

When you mount the EFS volume with the Amazon EFS CSI driver, you must use the Amazon DNS server for the VPC.

Note: Only the Amazon-provided DNS server can resolve the DNS names of Amazon EFS file systems.

To verify the DNS server, log in to the worker node, and then run the following command:

nslookup fs-4fxxxxxx.efs.region.amazonaws.com AMAZON_PROVIDED_DNS_IP

Note: Replace fs-4fxxxxxx with your file system ID and region with your AWS Region. Replace AMAZON_PROVIDED_DNS_IP with the IP address of the Amazon-provided DNS server.

If the cluster VPC uses a custom DNS server, then configure this DNS server to forward all *.amazonaws.com requests to the Amazon DNS server.

If the custom DNS server doesn't forward the requests, then you might receive the following error message:

"Output: Failed to resolve "fs-xxxxxx.efs.us-west-2.amazonaws.com" - The file system mount target ip address cannot be found"
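To fill in the placeholders for the nslookup command in this section, the following sketch builds the file system DNS name and derives one address of the Amazon-provided DNS server. Both function names are hypothetical. The sketch assumes that the server is reachable at the base of the VPC's primary IPv4 CIDR block plus two, which is how Amazon VPC documents the Amazon DNS server. It also assumes that you pass the CIDR in its network-base form, such as 10.0.0.0/16.

```shell
# Hypothetical helper: build the DNS name of an EFS file system.
#   $1: file system ID, $2: AWS Region
efs_dns_name() {
  echo "$1.efs.$2.amazonaws.com"
}

# Hypothetical helper: derive the Amazon DNS server address (VPC base plus two).
#   $1: primary IPv4 CIDR block of the VPC, for example 10.0.0.0/16
amazon_dns_ip() {
  base=${1%/*}       # drop the /prefix length
  last=${base##*.}   # final octet of the network base address
  echo "${base%.*}.$((last + 2))"
}

# Usage (placeholder values):
#   nslookup "$(efs_dns_name fs-4fxxxxxx us-west-2)" "$(amazon_dns_ip 10.0.0.0/16)"
```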

Verify that you have iam mount options in the PersistentVolume definition when you use a restrictive file system policy

If you don't add the iam mount option with a restrictive file system policy, then the pods fail with the following error message:

"mount.nfs4: access denied by server while mounting 127.0.0.1:/"

If you configured the Amazon EFS file system to restrict mount permissions to specific AWS Identity and Access Management (IAM) roles, then use the -o iam mount option. Include the spec.mountOptions property to allow the CSI driver to add the IAM mount option.

Example:

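apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv1
spec:
  capacity:
    storage: 5Gi                # required by Kubernetes; placeholder value
  accessModes:
    - ReadWriteMany
  mountOptions:
    - iam                       # adds the IAM mount option
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-xxxxxx     # placeholder; replace with your file system ID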

Verify that you annotated the Amazon EFS CSI driver controller service account with the correct IAM role that has the required permissions

To verify that the service account that the efs-csi-controller pods use has the correct annotation, run the following command:

kubectl describe sa efs-csi-controller-sa -n kube-system

Example output:

eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/AmazonEKS_EFS_CSI_DriverRole

To confirm that the account has the correct AWS Identity and Access Management (IAM) roles and permissions, verify the IAM OIDC provider for the cluster. Check that the IAM role that's associated with efs-csi-controller-sa service account has the required permissions to perform EFS API calls. Then, verify that the IAM role's trust policy trusts the service account efs-csi-controller-sa.

Example IAM role trust policy:

{  
  "Version": "2012-10-17",  
  "Statement": [  
    {  
      "Effect": "Allow",  
      "Principal": {  
        "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"  
      },  
      "Action": "sts:AssumeRoleWithWebIdentity",  
      "Condition": {  
        "StringLike": {  
          "oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:kube-system:efs-csi-*",  
          "oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:aud": "sts.amazonaws.com"  
        }  
      }  
    }  
  ]  
}
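The StringLike condition in the example trust policy uses a wildcard, so it matches more than one service account. The following sketch imitates that match with shell pattern matching to show which sub claims the policy accepts. It's only an approximation: claim_matches is a hypothetical helper, shell patterns aren't identical to the IAM StringLike operator, and efs-csi-node-sa is assumed to be the name of the node service account in your installation.

```shell
# Hypothetical helper: succeed when a token's "sub" claim matches a wildcard pattern.
#   $1: sub claim, $2: pattern that uses * as the wildcard
claim_matches() {
  case "$1" in
    $2) return 0 ;;   # unquoted so that * acts as a wildcard
    *)  return 1 ;;
  esac
}

# Pattern from the example trust policy.
pattern='system:serviceaccount:kube-system:efs-csi-*'

# Usage:
#   claim_matches "system:serviceaccount:kube-system:efs-csi-controller-sa" "$pattern"
```

Under these assumptions, the controller and node service accounts in kube-system both match the pattern, but a service account with the same name in another namespace doesn't.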

Verify that the EFS CSI driver pods are running

Run the following command to verify that these pods are active in your cluster:

kubectl get all -l app.kubernetes.io/name=aws-efs-csi-driver -n kube-system

Check the EFS mount operation from the EC2 worker node where the pod fails to mount the file system

Log in to the Amazon EKS worker node of the pod. Then, use the EFS mount helper to manually mount the EFS file system to the worker node. To test the mount operation, run the following command:

sudo mount -t efs -o tls file-system-dns-name efs-mount-point/

If the worker node can mount the file system, then review the efs-plugin logs from the CSI controller and CSI node pods.

Check the EFS CSI driver pod logs to determine the cause of the mount failures

If the volume fails to mount, then review the efs-plugin logs. To retrieve the efs-plugin container logs, run the following commands:

kubectl logs deployment/efs-csi-controller -n kube-system -c efs-plugin  
kubectl logs daemonset/efs-csi-node -n kube-system -c efs-plugin
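When you read the logs, it can help to map each error message back to the checks in this article. The following sketch does that for the messages quoted above. The function name classify_efs_error is hypothetical, the matching is a plain substring test, and a match is only a starting point for investigation, not a diagnosis.

```shell
# Hypothetical helper: map a log line to the check in this article that it
# most likely relates to.
#   $1: one error message from the efs-plugin logs or pod events
classify_efs_error() {
  case "$1" in
    *"No such file or directory"*)
      echo "Check the subdirectory path in the file system" ;;
    *"access denied by server"*)
      echo "Check the iam mount option and the file system policy" ;;
    *"Failed to resolve"*)
      echo "Check the mount targets and the VPC DNS server" ;;
    *"DeadlineExceeded"*|*"failed due to timeout"*|*"timed out waiting for the condition"*)
      echo "Check the network path to the Amazon EFS API and the NFS security group rules" ;;
    *)
      echo "No known pattern; review the full efs-plugin logs" ;;
  esac
}

# Usage:
#   classify_efs_error 'mount.nfs4: access denied by server while mounting 127.0.0.1:/'
```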
AWS OFFICIAL
Updated 10 months ago
6 Comments

EFS Mount can also fail with the below error message when EFS security group does not have ingress rule for EKS nodes.

Mounting arguments: -t efs -o tls <EFS ID>:/ /var/lib/kubelet/pods/255dedb6-8295-4bae-a744-870c3587980f/volumes/kubernetes.io~csi/efs-pv/mount Output: Could not start amazon-efs-mount-watchdog, unrecognized init system "aws-efs-csi-dri" Mount attempt 1/3 failed due to timeout after 15 sec, wait 0 sec before next attempt. Mount attempt 2/3 failed due to timeout after 15 sec, wait 0 sec before next attempt. b'mount.nfs4: mount point /var/lib/kubelet/pods/255dedb6-8295-4bae-a744-870c3587980f/volumes/kubernetes.io~csi/efs-pv/mount does not exist'

Can we please update the article to cover this error also ?

AWS
replied 3 years ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
EXPERT
replied 3 years ago

Why is it never mentioned that the EFS > Network > Security Group must be the SAME security group as the Worker Nodes?

It does not suffice to just have All Traffic Inbound/Outbound rules for the Security Group EFS is in and the Security Group your worker nodes are in.

I was able to bind a PVC to the PV without this, but it was then unable to mount it into a pod until I changed my EFS to be in the SAME security group.

The only documentation I have seen mentioning this anywhere was this great how to: https://www.youtube.com/watch?v=Sj0MVk0jM_4&ab_channel=AntonPutra

Please add this to the docs somewhere as I'm not the only one this has happened to. Thank you.

replied 2 years ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
EXPERT
replied 2 years ago

Andrew - Thank you, the security groups not matching by ids but rather by rules has turned out to be exactly my issue.

replied 2 years ago

To follow up on Andrew's very useful comment re "EFS > Network > Security Group", use the SG that contains the string "ClusterSharedNodeSecurityGroup" if you created your kubernetes cluster using the eksctl utility.

replied 2 years ago