How do I troubleshoot issues when I integrate Secrets Manager with Amazon EKS?

When I try to integrate AWS Secrets Manager with Amazon Elastic Kubernetes Service (Amazon EKS), I get an error.

Short description

When you integrate Secrets Manager with Amazon EKS, you get an error if your pods don't enter the Running state. To resolve this issue, check the logs from the Secrets Store Container Storage Interface (CSI) Driver pods to identify the pods that aren't working correctly.

Resolution

To display the Secrets Store CSI Driver pods, run the following command:

kubectl --namespace=kube-system get pods -l "app=secrets-store-csi-driver"

To display the logs from the Secrets Store CSI Driver pods, run the following command:

kubectl --namespace=kube-system logs -f -l "app=secrets-store-csi-driver"

The following logs show that each pod is performing well:

I1120 20:21:19.135834       1 secrets-store.go:74] Driver: secrets-store.csi.k8s.io
I1120 20:21:19.135857       1 secrets-store.go:75] Version: v0.2.0, BuildTime: 2021-08-12-18:55  
I1120 20:21:19.135868       1 secrets-store.go:76] Provider Volume Path: /etc/kubernetes/secrets-store-csi-providers  
I1120 20:21:19.135874       1 secrets-store.go:77] GRPC supported providers will be dynamically created  
I1120 20:21:19.135895       1 driver.go:80] "Enabling controller service capability" capability="CREATE_DELETE_VOLUME"  
I1120 20:21:19.135912       1 driver.go:90] "Enabling volume access mode" mode="SINGLE_NODE_READER_ONLY"  
I1120 20:21:19.135922       1 driver.go:90] "Enabling volume access mode" mode="MULTI_NODE_READER_ONLY"  
I1120 20:21:19.135938       1 main.go:172] starting manager  
I1120 20:21:19.136210       1 server.go:111] Listening for connections on address: //csi/csi.sock  
I1120 20:21:18.956092       1 exporter.go:33] metrics backend: prometheus

Note: Pods that perform the same actions appear as duplicate entries.
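
When the log stream is long, it can help to hide the informational lines. The following is a minimal sketch that assumes the logs use the klog format shown above, where each line starts with a severity letter (I, W, E, or F) followed by the date. The function name klog_problems is hypothetical.

```shell
# Print only warning (W), error (E), and fatal (F) lines from
# klog-formatted logs that are read from standard input.
klog_problems() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -E '^[WEF][0-9]{4} ' || true
}

# Possible usage against the driver pods:
#   kubectl --namespace=kube-system logs -l "app=secrets-store-csi-driver" | klog_problems
```

If the function prints nothing, then the driver pods didn't log warnings or errors in the lines that were read.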

If the SecretProviderClass that the pod's volume references doesn't exist in the same namespace as the pod, then you receive the following error:

"Warning FailedMount 3s (x4 over 6s) kubelet, kind-control-plane MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to get secretproviderclass default/aws, error: secretproviderclasses.secrets-store.csi.x-k8s.io "aws" not found"

The SecretProviderClass must exist in the same namespace as the pod.
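
To confirm which SecretProviderClass the driver looked for, you can read the namespace/name pair from the event message and query for it. This is a sketch under the assumption that the message has the same wording as the preceding error. The function names spc_from_event and spc_exists are hypothetical.

```shell
# Read a FailedMount event message on standard input and print the
# "<namespace>/<name>" pair that the driver failed to find.
spc_from_event() {
  sed -n 's/.*failed to get secretproviderclass \([^,]*\),.*/\1/p'
}

# Return success if the SecretProviderClass "<namespace>/<name>" exists.
spc_exists() {
  ref="$1"
  namespace="${ref%%/*}"   # text before the first slash
  name="${ref#*/}"         # text after the first slash
  kubectl get secretproviderclasses.secrets-store.csi.x-k8s.io "$name" \
    --namespace="$namespace" >/dev/null 2>&1
}

# Possible usage:
#   spc_exists default/aws || echo "SecretProviderClass aws is missing in namespace default"
```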

The Secrets Store CSI Driver is deployed as a daemonset. If the CSI Driver pods aren't running on the node, then you receive the following error:

"Warning FailedMount 1s (x4 over 4s) kubelet, kind-control-plane MountVolume.SetUp failed for volume "secrets-store-inline" : kubernetes.io/csi: mounter.SetUpAt failed to get CSI client: driver name secrets-store.csi.k8s.io not found in the list of registered CSI drivers"

If the node is tainted, then add a toleration for the taint in the Secrets Store CSI Driver daemonset.

Check if there are any node selectors that don't allow the Secrets Store CSI Driver pods to run on the node:

kubectl --namespace=kube-system describe pods -l "app=secrets-store-csi-driver" | grep "Node-Selectors"

Get the labels that are associated with the worker nodes in your cluster:

kubectl get node --selector=kubernetes.io/os=linux --show-labels

Compare the outputs from the preceding commands. Make sure that the labels match the node selector values.
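
The comparison can also be scripted. The following sketch assumes that you copied a node's labels as one comma-separated string (the form that kubectl prints with the --show-labels option) and the Node-Selectors value as a comma-separated list of key=value pairs. The function name selector_matches is hypothetical.

```shell
# selector_matches LABELS SELECTOR
# Return success if every key=value pair in SELECTOR appears in LABELS.
# Both arguments are comma-separated lists of key=value pairs.
selector_matches() {
  labels=",$1,"
  selector="$2"
  # An empty selector or "<none>" places no restriction on the node.
  if [ -z "$selector" ] || [ "$selector" = "<none>" ]; then
    return 0
  fi
  old_ifs=$IFS
  IFS=,
  for want in $selector; do
    case "$labels" in
      *",$want,"*) ;;   # this pair is present on the node
      *) IFS=$old_ifs; echo "missing: $want"; return 1 ;;
    esac
  done
  IFS=$old_ifs
  return 0
}

# Possible usage:
#   selector_matches "kubernetes.io/arch=amd64,kubernetes.io/os=linux" "kubernetes.io/os=linux"
```

The function prints the first selector pair that the node lacks, which is the label to add to the node or to remove from the daemonset's node selector.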

Check if the CSI Driver was deployed to the cluster and if all pods are in the Running state:

kubectl get pods -l app=secrets-store-csi-driver -n kube-system

-or-

kubectl get daemonset csi-secrets-store-secrets-store-csi-driver -n kube-system

To check that the driver is registered in the cluster, run the kubectl get csidriver command. Example output:

kubectl get csidriver  
NAME                       ATTACHREQUIRED   PODINFOONMOUNT   MODES       AGE  
secrets-store.csi.k8s.io   false            true             Ephemeral   110m

The preceding output shows that the driver was deployed to the cluster. If you don't find secrets-store.csi.k8s.io, then reinstall the driver.
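
If you want to run this check from a script, then a sketch such as the following might be enough. It relies only on the exit status of kubectl get csidriver for the driver name shown in the preceding output. The function name driver_registered is hypothetical.

```shell
# Return success if the secrets-store.csi.k8s.io CSIDriver object exists.
driver_registered() {
  kubectl get csidriver secrets-store.csi.k8s.io >/dev/null 2>&1
}

# Possible usage:
#   driver_registered || echo "secrets-store.csi.k8s.io not found; reinstall the driver"
```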

The secrets-store-csi-driver-provider-aws pod is deployed as a daemonset. If the pod isn't running in the worker node where your application pod is trying to launch, then you receive the following error:

"MountVolume.SetUp failed for volume "volumename" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod namespace/pod, err: error connecting to provider "aws": provider not found: provider "aws""

To check the number of secrets-store-csi-driver-provider-aws pods that are running in the cluster and the number of nodes, run the following two commands:

kubectl get ds csi-secrets-store-provider-aws -n kube-system
kubectl get nodes
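
The two numbers can also be compared in one step. The following sketch assumes that every node in the cluster is expected to run a provider pod, which might not hold if some nodes are intentionally excluded. The function name provider_coverage is hypothetical.

```shell
# Print the daemonset's desiredNumberScheduled value and the node count.
# Return success only if the daemonset targets at least as many nodes
# as the cluster has.
provider_coverage() {
  daemonset="${1:-csi-secrets-store-provider-aws}"
  desired=$(kubectl get ds "$daemonset" -n kube-system \
    -o jsonpath='{.status.desiredNumberScheduled}')
  # One line per node; tr removes the padding that some wc versions add.
  nodes=$(kubectl get nodes --no-headers | wc -l | tr -d ' ')
  echo "desired=$desired nodes=$nodes"
  [ "$desired" -ge "$nodes" ]
}

# Possible usage:
#   provider_coverage || echo "some nodes don't have a provider pod scheduled"
```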

If the daemonset's desiredNumberScheduled parameter is less than the number of nodes in the cluster, then check the daemonset's Tolerations parameter:

kubectl describe ds csi-secrets-store-provider-aws -n kube-system

In the command's output, look for Tolerations: op=Exists. If you don't find Tolerations: op=Exists, then this might be why the secrets-store-csi-driver-provider-aws daemonset pod didn't run on that node. To resolve this issue, add the toleration. For more information, see Taints and tolerations on the Kubernetes website.
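
One possible way to add the toleration is a merge patch on the daemonset. This is a sketch, not the only approach. It assumes that you want the pods to tolerate every taint (operator Exists with no key), and a merge patch replaces any tolerations list that the daemonset already has. If a Helm chart or manifest manages the daemonset, then a later upgrade might overwrite the change, so setting the toleration there can be the better choice. The function names are hypothetical.

```shell
# Print a merge patch that sets a single toleration matching all taints.
# Caution: this replaces the daemonset's existing tolerations list.
tolerate_all_patch() {
  printf '%s' '{"spec":{"template":{"spec":{"tolerations":[{"operator":"Exists"}]}}}}'
}

# add_tolerate_all DAEMONSET
# Apply the patch to a daemonset in the kube-system namespace.
add_tolerate_all() {
  kubectl patch daemonset "$1" --namespace=kube-system \
    --type=merge --patch "$(tolerate_all_patch)"
}

# Possible usage:
#   add_tolerate_all csi-secrets-store-provider-aws
```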

If the files that the SecretProviderClass pulled in are larger than 4 mebibytes (MiB), then you might get FailedMount warnings. The error message includes grpc: received message larger than max. To accept responses larger than 4 MiB, specify --max-call-recv-msg-size=size in bytes for the secrets-store container in the csi-secrets-store daemonset.
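
The option takes a plain byte count, so a limit that you think of in MiB must be multiplied by 1024 twice. The following sketch does that arithmetic. The 8 MiB value in the usage comment is only an illustration, and the function name recv_size_flag is hypothetical.

```shell
# recv_size_flag MIB
# Print the driver argument for a receive limit given in MiB.
recv_size_flag() {
  mib="$1"
  echo "--max-call-recv-msg-size=$((mib * 1024 * 1024))"
}

# Possible usage:
#   recv_size_flag 8    # prints --max-call-recv-msg-size=8388608
```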

Note: Replace size in bytes with the size that you want the driver to accept. Because larger responses can increase the memory resource consumption of the secrets-store container, you might need to increase your memory limit.

If you still have issues, then review the log events in chronological order to see if there were any other failures:

kubectl get events -n kube-system --sort-by='.metadata.creationTimestamp'
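
When you read through the events, the error messages quoted in this article each point to a different cause. The following sketch maps a message to the matching cause. It only matches the message fragments that appear in this article, so other wording falls through to the default case. The function name diagnose_mount_error is hypothetical.

```shell
# diagnose_mount_error MESSAGE
# Print the likely cause for a FailedMount message, based on the
# fragments quoted in this article.
diagnose_mount_error() {
  case "$1" in
    *"failed to get secretproviderclass"*)
      echo "SecretProviderClass doesn't exist in the pod's namespace" ;;
    *"not found in the list of registered CSI drivers"*)
      echo "Secrets Store CSI Driver pod isn't running on the node" ;;
    *"provider not found"*)
      echo "secrets-store-csi-driver-provider-aws pod isn't running on the node" ;;
    *"received message larger than max"*)
      echo "response is larger than the driver's receive limit" ;;
    *)
      echo "no match; review the driver logs" ;;
  esac
}

# Possible usage:
#   diagnose_mount_error 'error connecting to provider "aws": provider not found: provider "aws"'
```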

AWS OFFICIAL
Updated 24 days ago
2 Comments

MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod pwsa0000809/nginx-deployment-986cdc8bd-vc7qd, err: error connecting to provider "aws": provider not found: provider "aws"

what about this error

AWS
replied 2 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 2 months ago