Questions tagged with Amazon Elastic Kubernetes Service

We are trying to launch a pod in EKS from MWAA. Our EKS cluster is authenticated using aws-iam-authenticator in kube_config.yaml, but MWAA shows the error below in the MWAA log:

`kubernetes.config.config_exception.ConfigException: Invalid kube-config file. No configuration found.`

MWAA Environment ARN or Name: arn:axxxxxx:environment/airflow-demo
Region: us-east-1

It looks like the DAG is unable to read the config file stored in S3. I am not sure whether this is related to reading kube_config.yaml from S3 or to using aws-iam-authenticator. We followed the write-up below, except for the kubeconfig authentication part: https://blog.beachgeek.co.uk/working-with-amazon-eks-and-amazon-managed-workflows-for-apache-airflow-v2x/

Can someone help? Thanks --Venky
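For context, a minimal sketch of how a kubeconfig synced into the MWAA dags folder is usually referenced from a KubernetesPodOperator task; the namespace, image, and task parameters below are illustrative, not taken from the question:

```python
# Hypothetical minimal DAG, assuming kube_config.yaml is shipped in the S3 dags
# folder so that MWAA copies it to /usr/local/airflow/dags/ on the workers.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

with DAG(
    dag_id="eks_pod_example",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    KubernetesPodOperator(
        task_id="run-pod-on-eks",
        namespace="mwaa",                      # namespace is illustrative
        image="amazon/aws-cli:latest",         # image is illustrative
        cmds=["sh", "-c", "echo hello from EKS"],
        # The kubeconfig must be referenced by its local path on the worker,
        # not by an s3:// URI; a ConfigException like the one above is what
        # typically appears when the path does not resolve to a readable file.
        config_file="/usr/local/airflow/dags/kube_config.yaml",
        in_cluster=False,
        get_logs=True,
        is_delete_operator_pod=True,
    )
```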
1
answers
0
votes
8
views
asked 2 days ago
Hi, We're looking for a solution to remediate the excessive IP address consumption by EKS clusters. Since enterprise CIDR ranges are limited and tend to be consumed quickly by EKS, we are facing IP shortages and overlaps. We are considering peering two VPCs: one that is routable, and a second, non-routable VPC (the default AWS VPC). We would then publish only the IPs we want on the routable one... Has anyone tried that approach? Is there an alternative solution? Thanks in advance,
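For comparison, one commonly cited alternative is VPC CNI custom networking, where pods draw their IPs from a secondary, non-routable CIDR (for example 100.64.0.0/10) while the nodes keep routable addresses. A minimal sketch of the per-AZ ENIConfig object that approach relies on; the subnet and security group IDs are placeholders:

```yaml
# Hypothetical ENIConfig for VPC CNI custom networking; one object per
# Availability Zone, named after the AZ, pointing at a subnet carved out of a
# secondary non-routable CIDR attached to the VPC.
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: eu-central-1a                  # must match the AZ name
spec:
  subnet: subnet-0123456789abcdef0     # placeholder: subnet in the secondary CIDR
  securityGroups:
    - sg-0123456789abcdef0             # placeholder: pod security group
```

Custom networking is switched on by setting `AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true` on the aws-node DaemonSet.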
1
answers
0
votes
9
views
asked 2 days ago
A few days ago attaching EBS volumes suddenly stopped working. My EKS cluster uses the ebs.csi.aws.com add-on with dynamic provisioning. Here is my StorageClass config:

```
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```

and the volumeClaimTemplate in my StatefulSet config:

```
volumeClaimTemplates:
  - metadata:
      name: log
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
```

After deploying the StatefulSet, a PVC, PV and VolumeAttachment are created, but the pod is stuck in the ContainerCreating state with the error:

`AttachVolume.Attach failed for volume "pvc-xxx" : rpc error: code = NotFound desc = Instance "i-xxx" not found`

I triple-checked: the volume is not attached to any other instance, and the instance exists. One odd thing though - when I describe the created PV I see this:

```
Source:
  Type:              CSI (a Container Storage Interface (CSI) volume source)
  Driver:            ebs.csi.aws.com
  FSType:            ext4
  VolumeHandle:      vol-xxx
  ReadOnly:          false
  VolumeAttributes:  storage.kubernetes.io/csiProvisionerIdentity=xxx-8081-ebs.csi.aws.com
```

and the (unmasked) VolumeHandle does not even exist. Where might the problem be? As I said earlier, this issue appeared from one day to the next without any config change.

K8s version: 1.24
EBS CSI Driver add-on version: v1.11.5-eksbuild.2 (neither upgrading nor downgrading helped)

Thanks
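As a sanity check that could narrow this down, something like the following confirms from the AWS side whether the reported volume handle and instance actually exist in the region the CSI controller operates in; the IDs and region are placeholders:

```sh
# Placeholders: replace vol-xxx, i-xxx and the region with the real values.
aws ec2 describe-volumes   --volume-ids   vol-xxx --region eu-central-1
aws ec2 describe-instances --instance-ids i-xxx   --region eu-central-1

# How Kubernetes sees the attachment and the claim:
kubectl get volumeattachment
kubectl describe pvc <pvc-name>
```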
1
answers
0
votes
63
views
kovacs
asked 2 days ago
Suppose an EKS cluster was created and no load balancers exist. Is there any way to associate SSL policies without a load balancer?
1
answers
1
votes
9
views
asked 3 days ago
AWS documents a default quota of 30 managed node groups per EKS cluster. Since this quota is adjustable, what is the maximum hard limit for managed node groups per cluster that can be used without performance issues?
1
answers
0
votes
24
views
asked 3 days ago
Hi, I am using the **aws-ebs-csi-driver** add-on. Yesterday I was able to input custom JSON configuration, but today, when I tried to upgrade the add-on to the latest version (*v1.15.0-eksbuild.1*), I got the error below:

`ConfigurationValue is not in valid JSON or YAML format.`

Here is my JSON:

```
{
  "controller": {
    "nodeSelector": {
      "kubernetes.io/os": "linux",
      "aaaaa": "xxx-yyy-zzz",
      "some_other_key": "abcd"
    }
  }
}
```

which seems valid according to the schema I get from:

```
aws eks describe-addon-configuration --addon-name aws-ebs-csi-driver --addon-version v1.15.0-eksbuild.1
```

It's very strange that I was able to input that JSON yesterday but cannot now. Has the updated version broken something in the schema validator? Is this a bug, or is something wrong with the data I'm trying to input?
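For reference, a minimal sketch of passing the same configuration from the CLI, which can help distinguish a console issue from a schema-validation issue; the cluster name and file path are placeholders:

```sh
# Placeholders: my-cluster and config.json are illustrative.
aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name aws-ebs-csi-driver \
  --addon-version v1.15.0-eksbuild.1 \
  --configuration-values file://config.json
```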
2
answers
0
votes
63
views
babis
asked 4 days ago
We have deployed a Django application in EKS and use RDS PostgreSQL with RDS Proxy as the database backend. Over the last month, we have started noticing occasional 500 "Internal Server Error" responses from our web app, with the following error coming from Django: `django.db.utils.OperationalError: connection to server at "<proxy DNS name>" (<proxy IP address>), port 5432 failed: server closed the connection unexpectedly` This suggests that RDS Proxy closed the client connection. In the Django settings, `CONN_MAX_AGE` is left at its default of 0, which means Django opens a new database connection for every request - so the observed failures cannot be related to RDS Proxy's idle client connection timeout, which we have set to 30 minutes. To deal with this issue, we have implemented retries at the service mesh level (Istio). However, we would like to understand the root cause of the failures and why their frequency increased over the last month - this almost never happened previously. Looking at the proxy and database metrics in CloudWatch, there was no increased traffic during the failures. Nevertheless, could the proxy close a client connection during a scaling operation? How can we get more insight into RDS Proxy's internal operations? Turning on Enhanced Logging keeps it enabled for only 24 hours, and there is no guarantee that the error will occur in that window - we are also a bit nervous about enabling it in production since it can degrade performance.
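For context, a minimal sketch of the Django database settings discussed above; the host, database name and credentials are placeholders, and `CONN_MAX_AGE` at 0 (per-request connections) is the behaviour the question describes:

```python
# settings.py (sketch; host, name and credentials are placeholders)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "HOST": "my-proxy.proxy-xxxxxxxx.us-east-1.rds.amazonaws.com",  # RDS Proxy endpoint
        "PORT": 5432,
        "NAME": "appdb",
        "USER": "app_user",
        "PASSWORD": "placeholder",
        # 0 (the default) closes the connection at the end of each request;
        # a positive value keeps connections open for that many seconds.
        "CONN_MAX_AGE": 0,
    }
}
```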
1
answers
0
votes
19
views
nikos64
asked 4 days ago
Team, I am building a CloudFormation stack that creates an AWS EKS cluster and, after cluster creation, deploys Fluent Bit into the cluster using the "Custom::Helm" resource type.

```
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  pEKSClusterName:
    Description: Name of the EKS Cluster
    Type: String
    Default: EKSCluster
  VPCID:
    Description: VPC ID
    Type: AWS::EC2::VPC::Id
    AllowedPattern: ".+"
Resources:
  fluentbitagent:
    Type: "AWSQS::Kubernetes::Helm"
    Properties:
      TimeOut: 10
      ClusterID: !Ref pEKSClusterName
      Name: fluent-bit
      Namespace: aws-cloudwatch
      Repository: https://aws.github.io/eks-charts
      Chart: eks/aws-for-fluent-bit
      Values:
        image.repository: !FindInMap [RegionMap, !Ref "AWS::Region", cwrepo]
      ValueYaml: !Sub
        - |
          clusterName: ${ClusterName}
          serviceAccount:
            create: false
            name: aws-logs
          region: ${AWS::Region}
          vpcId: ${VPCID}
        - ClusterName: !Ref pEKSClusterName
          VPCID: !Ref VPCID
Mappings:
  RegionMap:
    us-east-1:
      cwrepo: public.ecr.aws/aws-observability/aws-for-fluent-bit
```

I want to pass custom values to the Helm values for Fluent Bit - for example, FluentBitHttpPort='2020'. TIA :-)
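One way to sketch this, assuming the chart exposes a matching key in its values.yaml: additional values can be appended to the ValueYaml string (or added to the Values map, which the resource injects into the chart much like helm --set). The `service.extraService` key and its content below are illustrative and should be checked against the aws-for-fluent-bit chart's values before use.

```yaml
# Hypothetical extension of the ValueYaml block above; key names must match the
# chart's values.yaml and are shown only to illustrate the mechanism.
ValueYaml: !Sub
  - |
    clusterName: ${ClusterName}
    serviceAccount:
      create: false
      name: aws-logs
    region: ${AWS::Region}
    vpcId: ${VPCID}
    service:
      extraService: |
        HTTP_Server  On
        HTTP_Listen  0.0.0.0
        HTTP_PORT    2020
  - ClusterName: !Ref pEKSClusterName
    VPCID: !Ref VPCID
```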
2
answers
0
votes
14
views
AWS
asked 5 days ago
I have an Angular and a Spring Boot application in an EKS cluster. My Spring Boot app is connected to RDS in a private subnet in the same VPC as my cluster. I have created one ALB ingress for my two deployment services. My frontend is at http://albdns/health and my backend is at http://albdns/user/app. How do I enable communication between the backend and frontend?
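For illustration, a minimal sketch of a single ALB Ingress that routes two paths to the two services; the service names, ports and paths are placeholders, not taken from the question:

```yaml
# Hypothetical Ingress handled by the AWS Load Balancer Controller; one ALB,
# two path rules, one per backend service (names and ports are placeholders).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /user
            pathType: Prefix
            backend:
              service:
                name: backend-svc      # placeholder Spring Boot service
                port:
                  number: 8080
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-svc     # placeholder Angular service
                port:
                  number: 80
```

With a layout like this, the Angular app can call the backend through the same ALB hostname using relative paths (e.g. `/user/app`), which also avoids cross-origin issues between the two.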
1
answers
0
votes
22
views
Joash
asked 7 days ago
I followed the instructions at https://aws.amazon.com/blogs/containers/introducing-amazon-cloudwatch-container-insights-for-amazon-eks-fargate-using-aws-distro-for-opentelemetry/ to deploy Container Insights for EKS Fargate, but nothing appears in the CloudWatch -> Container Insights dashboard. Is it supported on EKS Fargate? I also tried to deploy the CloudWatch agent for Prometheus on EKS Fargate by following https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights-Prometheus-Setup.html, but I still could not see anything in the CloudWatch -> Container Insights dashboard. It says "You have not enabled insights on your containers".
1
answers
0
votes
23
views
Julie
asked 9 days ago
Hello!

# Short summary of context and issue

I am using EFS to mount a PV (ReadWriteMany [access-mode](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes)) via a PVC into EKS pods. The issue I'm having is that write updates propagate with big delays across pods: one pod may successfully write a file to the shared directory, but other pods see it some 10-60 seconds later (this delay varies across experiments seemingly at random).

## Experiment & Concrete results

I run two simple pods. [Pod1](https://github.com/RonaldGalea/my-eks-issue/blob/main/issue_preparation/debugging_pods/pod1.yaml) runs first and continuously checks if `/workdir/share_point/example.txt` exists via the `stat` command. [Pod2](https://github.com/RonaldGalea/my-eks-issue/blob/main/issue_preparation/debugging_pods/pod2.yaml) runs second and writes the file, then does the same checks. As can be seen from the logs below, the file created at `16:52:36.544` is visible in Pod1 only at ~`16:52:57.694`.

Logs of Pod1: [pod1.log](https://github.com/RonaldGalea/my-eks-issue/blob/main/issue_preparation/logs/pod1.log)
Logs of Pod2: [pod2.log](https://github.com/RonaldGalea/my-eks-issue/blob/main/issue_preparation/logs/pod2.log)

## Expected results

I expected Pod1 to see the file as soon as it is successfully written, as is the case for Pod2. As far as I understand, this would fit the [consistency](https://docs.aws.amazon.com/efs/latest/ug/how-it-works.html#consistency) model described in the docs.

## Worth Mentioning

If I manually `kubectl exec` into the pods and attempt something similar, the problem does not seem to be there, see [manual_test.log](https://github.com/RonaldGalea/my-eks-issue/blob/main/issue_preparation/logs/manual_test.log)

1. Pod2: `echo "Manual test" > /workdir/share_point/manual_test.txt`
2. Pod1: `date +\"%T.%3N\" && stat /workdir/share_point/manual_test.txt`

## Steps to Reproduce

In what follows, I provide the simplest setup that reproduces the issue. Following the AWS docs, I set up a VPC, an EKS cluster and EFS as a storage provider for the cluster. Each section below refers to the documentation I followed and provides the commands used.

### VPC

Follows [creating-a-vpc](https://docs.aws.amazon.com/eks/latest/userguide/creating-a-vpc.html). Creates a VPC from a template; it will have 2 private and 2 public subnets with suitable configuration to host an EKS cluster.

```sh
aws cloudformation create-stack --stack-name public-private-subnets \
  --template-url https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2020-10-29/amazon-eks-vpc-private-subnets.yaml
```

### EKS cluster

Follows [create-cluster](https://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html). I specify the cluster name, region, and for simplicity manually copy the subnet IDs of the above VPC.

```sh
eksctl create cluster --name my-demo-cluster --region eu-central-1 \
  --with-oidc --version 1.24 --node-ami-family Ubuntu2004 \
  --vpc-private-subnets private_subnet1_id,private_subnet2_id \
  --vpc-public-subnets public_subnet1_id,public_subnet2_id \
  --node-private-networking --managed
```

### EFS setup

Follows the [efs-csi-page](https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html)

#### Create a Policy

`curl -O https://raw.githubusercontent.com/kubernetes-sigs/aws-efs-csi-driver/master/docs/iam-policy-example.json`

```sh
aws iam create-policy \
  --policy-name AmazonEKS_EFS_CSI_Driver_Policy \
  --policy-document file://iam-policy-example.json
```

#### Create a ServiceAccount

Replace account-id accordingly in the command below.

```sh
eksctl create iamserviceaccount \
  --cluster my-demo-cluster \
  --namespace kube-system \
  --name efs-csi-controller-sa \
  --attach-policy-arn arn:aws:iam::account-id:policy/AmazonEKS_EFS_CSI_Driver_Policy \
  --approve \
  --region eu-central-1
```

#### Install the EFS CSI Driver

```sh
helm repo add aws-efs-csi-driver https://kubernetes-sigs.github.io/aws-efs-csi-driver/
helm repo update
```

```sh
helm upgrade -i aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
  --namespace kube-system \
  --set image.repository=602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/aws-efs-csi-driver \
  --set controller.serviceAccount.create=false \
  --set controller.serviceAccount.name=efs-csi-controller-sa
```

#### Creating the EFS, SG and mount points

For simplicity, manually copy the subnet IDs of the VPC.

`./complete_efs_setup.sh private_subnet1_id private_subnet2_id`

### Kubernetes StorageClass and PVC

Replace the Filesystem ID in [kubernetes_storage/efs-storageclass.yaml](https://github.com/RonaldGalea/my-eks-issue/blob/main/issue_preparation/kubernetes_storage/efs-storageclass.yaml):

`kubectl apply -f kubernetes_storage/efs-storageclass.yaml`

`kubectl apply -f kubernetes_storage/efs-pvc.yaml`

### Deploy pods

`kubectl apply -f debugging_pods/pod1.yaml`

After the first one is running:

`kubectl apply -f debugging_pods/pod2.yaml`

### Exec in pods

`kubectl exec --stdin --tty pod1 -- /bin/bash`

`kubectl exec --stdin --tty pod2 -- /bin/bash`

### Relevant system information

Output of `aws --version`

```
aws-cli/2.8.7 Python/3.9.11 Linux/5.15.0-58-generic exe/x86_64.ubuntu.22 prompt/off
```

Output of `eksctl version`

```
0.125.0
```

Output of `helm version`

```
version.BuildInfo{Version:"v3.10.3", GitCommit:"835b7334cfe2e5e27870ab3ed4135f136eecc704", GitTreeState:"clean", GoVersion:"go1.18.9"}
```

### Thank you

I'd be very thankful for any hint/pointer as to where the issue may lie. Thank you in advance.
0
answers
1
votes
20
views
Ronald
asked 10 days ago
I installed the AWS Load Balancer Controller through Helm. The Ingress is created but the ALB is not, and I am getting an error. I followed the guide below. -> https://docs.aws.amazon.com/ko_kr/eks/latest/userguide/aws-load-balancer-controller.html

* Deployment / Service log error: `{"level":"error","ts":1674024616.2905765,"logger":"controller.ingress","msg":"Reconciler error","name":...,"namespace":...,"error":"UnauthorizedOperation: You are not authorized to perform this operation.\n\tstatus code: 403}`
* Ingress event: `Warning FailedBuildModel 19s ingress Failed build model due to UnauthorizedOperation: You are not authorized to perform this operation. status code: 403`
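A 403 UnauthorizedOperation from the controller usually points at the IAM permissions attached to its service account. For orientation, a sketch of the IRSA setup step from the referenced guide; the cluster name, account ID, and policy version tag are placeholders:

```sh
# Placeholders: my-cluster, 111122223333 and the v2.4.7 tag are illustrative.
curl -O https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.4.7/docs/install/iam_policy.json

aws iam create-policy \
  --policy-name AWSLoadBalancerControllerIAMPolicy \
  --policy-document file://iam_policy.json

eksctl create iamserviceaccount \
  --cluster my-cluster \
  --namespace kube-system \
  --name aws-load-balancer-controller \
  --attach-policy-arn arn:aws:iam::111122223333:policy/AWSLoadBalancerControllerIAMPolicy \
  --approve
```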
4
answers
0
votes
55
views
ari
asked 10 days ago