EKS pods sometimes connect to an RDS cluster successfully but sometimes fail. How can I fix it?

Hi, I opened an issue at https://github.com/aws/amazon-vpc-cni-k8s/issues/2046, so I am writing this support case.

(The issue text: What happened:

The EKS cluster and the RDS (MySQL) cluster are in the same VPC. I added my EKS security group (eks-cluster-sg-MYCLUSTERNAME-*) to the RDS security group's inbound rules (port 3306). The RDS connection sometimes succeeds and sometimes fails with a timeout error. I set the timeout to 300 s, so I don't think the timeout value is the cause. There is a strange pattern: when a connection succeeds, it takes either less than 1 second or more than 2 minutes (very fast or very slow). When I test locally there is no problem (the connection takes less than 1 second), but the same code in a pod shows the behavior above.

I don't know why the connection sometimes succeeds and sometimes fails. How can I fix it? Any ideas? Thank you.

Environment:

Kubernetes version (use kubectl version): 1.22.0 CNI Version )

My problem happens when my EKS pods try to connect to the RDS cluster. I think it is an EKS network problem, because it happens only in EKS pods. The local connection test (my PC to the RDS DB) always succeeds, and our service that uses RDS has not had any issues.

Can I solve this? Thank you.

(I tried creating VPC flow logs, but the CloudWatch log group stores nothing :( )

1 Answer

How frequently does it fail? Does it fail from every pod or from one particular pod? If it is a particular pod, I would take a deeper look at the code running in that pod, its configuration, and the Kubernetes network policies that apply to it.
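To answer the "how often, and from which pod" questions with data, a small probe can time a plain TCP connect to the database port from inside each pod. The following is a minimal sketch, not a definitive diagnostic: `probe_tcp` is a hypothetical helper name, and the endpoint in the usage comment is a placeholder. It only checks the network path (security groups, routing), not MySQL authentication.

```python
import socket
import time


def probe_tcp(host, port=3306, timeout=5.0):
    """Try one TCP connect and return (succeeded, seconds_elapsed)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        ok = False
    return ok, time.monotonic() - start


# Usage from inside a pod (the hostname is a placeholder, not a real endpoint):
#   for _ in range(20):
#       print(probe_tcp("my-cluster.example.rds.amazonaws.com", 3306))
#       time.sleep(1)
```

Running this in several pods and comparing the elapsed times would show whether the slow or failed connects are tied to specific pods or nodes.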

One thing I strongly recommend is implementing retries for your connection. If the average connection time is 1-2 seconds, set a 5-second timeout and implement retries in your code.
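As a rough illustration of the short-timeout-plus-retry idea, here is a minimal sketch. `connect_with_retry` is a hypothetical helper, and the exponential backoff between attempts is an added assumption rather than part of the recommendation above. The `connect` argument is any callable that accepts a timeout in seconds; database drivers usually raise their own exception types, so those would need to be passed in via `retry_on`.

```python
import time


def connect_with_retry(connect, attempts=5, timeout=5.0, base_delay=1.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call connect(timeout) until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect(timeout)
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                # Back off before the next try: 1x, 2x, 4x ... base_delay.
                sleep(base_delay * 2 ** (attempt - 1))
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error


# Usage with a plain TCP connect (the hostname is a placeholder):
#   import socket
#   conn = connect_with_retry(
#       lambda t: socket.create_connection(("my-db.example.com", 3306), timeout=t)
#   )
```

A short per-attempt timeout means a stalled attempt is abandoned quickly instead of holding the job for minutes, which matters when successful connects normally finish in about a second.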

If this is happening regularly, then it could be a more serious issue that needs to be looked at by AWS Support.

AWS Expert, answered 2 years ago
  • I think every pod has the same problem. I couldn't resolve it, so I added a retry policy so that the pod (job) eventually succeeds (30 s timeout, 5 retries).

    This happens every day, because the pods are created by CronJobs (0 0 * * *) daily. Luckily, on some days there are no connection failures.

  • This does not sound right to me. I can understand occasional failures under load, but regular failures should not happen. Please file a support ticket so that AWS Support can investigate what is causing the failures.
