CloudFront Returning 504 Gateway Timeout Without Reaching Origin (OriginCommError)

We are experiencing intermittent 504 Gateway Timeout errors from CloudFront when calling our API. The requests do not appear in our API Gateway (Kong) logs, suggesting they never reached the origin. Due to the client’s retry mechanism, the request succeeds after a 5-second retry.

Client → CloudFront → NLB → EKS (Kong API Gateway → Backend service)

  • The origin is an NLB with Kong (Kong and backend service are deployed in EKS) running behind it.
  • Caching is disabled for all CloudFront cache behaviors.
  • Request failed at 06:12:36 with 504, and the same request succeeded at 06:12:41 on retry.
1 Answer

When CloudFront returns a 504 Gateway Timeout and the request never shows up in your origin logs, the failure is almost always in the connection between CloudFront and the origin rather than in the application. CloudFront documents the OriginCommError result type as a timeout while connecting to the origin or while reading data from it. In your setup, the "origin" that CloudFront talks to is the NLB, so a request can fail at the NLB or on its way to a Kong pod without Kong ever logging it.

Here are several potential causes and solutions for this issue:

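  1. Network Configuration Issues:
  • Confirm that the NLB is internet-facing and that every subnet it is attached to is a public subnet, that is, one whose route table has a default route to an Internet Gateway. A subnet that routes only through a NAT Gateway supports outbound connections, so CloudFront cannot open a connection to an NLB node placed there.
  • If the NLB spans several subnets or Availability Zones, check each one. A single misconfigured subnet would fail only the requests that CloudFront sends to that zone's address, which would look intermittent from the client side.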
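  2. Security Group and Firewall Settings:
  • Verify that the security groups on the NLB (if any are attached) and on its targets allow inbound traffic from CloudFront's origin-facing IP ranges on the listener ports. AWS publishes these ranges as a managed prefix list (com.amazonaws.global.cloudfront.origin-facing).
  • Check whether network ACLs or other firewall rules cover only part of those ranges. A partial allow list fails only the requests that arrive from the uncovered CloudFront edge locations.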
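  3. Origin Timeouts:
  • CloudFront's origin response timeout defaults to 30 seconds. If the origin does not respond within that time, CloudFront drops the connection and returns a 504. Separately, the origin connection timeout (default 10 seconds) and the number of connection attempts (default 3) control how long CloudFront keeps trying to open a TCP connection. Because your requests never reach Kong, the connection settings are the more likely ones to be involved.
  • These values are configurable per origin. Raising the response timeout above the standard maximum requires a quota increase request through AWS Support.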
  4. Resource Constraints:
  • High CPU or memory usage on your EKS nodes can cause intermittent failures. Check whether the Kong pods or the nodes they run on were throttled, restarted, or rescheduled around the failure times.
  • Compare EKS cluster metrics and NLB target health at the failure timestamps (for example 06:12:36) to identify bottlenecks.
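  5. Connection Handling:
  • Ensure Kong is configured to handle the volume of incoming connections (worker connections, listen backlog, and the number of replicas behind the NLB).
  • Compare keep-alive and idle timeouts along the chain: CloudFront's origin keep-alive timeout, the NLB's TCP idle timeout, and Kong's keepalive settings. If one hop silently drops an idle connection that the previous hop still tries to reuse, the result can be sporadic connection failures.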
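
One way to test the per-subnet and firewall hypotheses above is to open a TCP connection to every address the NLB hostname resolves to, since an NLB publishes one address per enabled Availability Zone. The following is a minimal sketch using only the Python standard library; the function name `probe_all_addresses` and the example hostname are illustrative assumptions, not anything from the original setup. It runs from your own machine rather than from CloudFront's network, so a clean result rules out a broken zone but does not prove that CloudFront's address ranges are allowed.

```python
import socket
import time
from typing import NamedTuple


class ProbeResult(NamedTuple):
    address: str    # IPv4 address that was probed
    ok: bool        # True if the TCP handshake completed
    seconds: float  # time spent on the attempt
    error: str      # error text when ok is False, else ""


def probe_all_addresses(host: str, port: int, timeout: float = 3.0) -> list[ProbeResult]:
    """Attempt a TCP connect to every IPv4 address that `host` resolves to."""
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # De-duplicate the resolved addresses; each one is probed separately.
    addresses = sorted({info[4][0] for info in infos})
    results = []
    for address in addresses:
        start = time.monotonic()
        try:
            with socket.create_connection((address, port), timeout=timeout):
                results.append(ProbeResult(address, True, time.monotonic() - start, ""))
        except OSError as exc:  # covers timeouts, refusals, and unreachable networks
            results.append(ProbeResult(address, False, time.monotonic() - start, str(exc)))
    return results
```

For example, `probe_all_addresses("my-nlb.example.com", 443)` (hypothetical hostname) returns one result per address. If one address consistently times out while the others connect, look at the subnet, route table, and network ACL of that Availability Zone first.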

Since the retry succeeds after 5 seconds, this suggests a temporary issue rather than a permanent configuration problem. This could be related to resource constraints, connection limits, or transient network issues.

To troubleshoot further:

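  • Enable CloudFront standard logs (or real-time logs) and inspect x-edge-result-type, x-edge-detailed-result-type, and time-taken for the failed requests. The time taken before the 504 helps distinguish a connection timeout from a response timeout.
  • Monitor NLB CloudWatch metrics at the failure times, such as UnHealthyHostCount, TCP_Target_Reset_Count, and TCP_ELB_Reset_Count.
  • Check whether the failures correlate with specific CloudFront edge locations (the x-edge-location log field).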
  • Consider implementing a more robust retry mechanism in your client applications as a temporary workaround
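
To check whether the 504s cluster on particular edge locations or result types, the standard logs can be summarized with a short script. This is a sketch under the assumption that you have downloaded standard log files from the logging bucket; it reads column names from the `#Fields:` header line instead of hardcoding positions. The function names `parse_standard_log` and `summarize_504s` are made up for this example.

```python
from collections import Counter
from typing import Iterable


def parse_standard_log(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse CloudFront standard log lines into dicts keyed by the #Fields header."""
    fields: list[str] = []
    rows: list[dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#Fields:"):
            # Header names are space-separated; data rows are tab-separated.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other header lines such as #Version
        rows.append(dict(zip(fields, line.split("\t"))))
    return rows


def summarize_504s(rows: Iterable[dict[str, str]]) -> Counter:
    """Count 504 responses per (edge location, detailed result type)."""
    counts: Counter = Counter()
    for row in rows:
        if row.get("sc-status") == "504":
            key = (
                row.get("x-edge-location", "?"),
                row.get("x-edge-detailed-result-type", "?"),
            )
            counts[key] += 1
    return counts
```

Standard log files are gzip-compressed, so a file can be read with `gzip.open(path, "rt")` and passed straight to `parse_standard_log`. If `summarize_504s(rows).most_common()` shows the errors concentrated on one or two edge locations, that points toward an IP-range or routing problem. An even spread points more toward the origin side.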

The fact that retries succeed suggests that your architecture is fundamentally sound, but there may be intermittent issues affecting connection reliability.
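
As a temporary workaround on the client side, the retry suggested above can use exponential backoff with jitter instead of a fixed 5-second wait. The following is a minimal standard-library sketch, not a drop-in for any particular client. The names `call_with_backoff` and `is_retryable`, the attempt count, and the delay values are all assumptions chosen for illustration. Retrying is only safe for idempotent requests, because a timeout does not always mean the origin never received the request.

```python
import random
import time
import urllib.error


def is_retryable(exc: Exception) -> bool:
    """Treat gateway-style HTTP errors and network-level failures as transient."""
    if isinstance(exc, urllib.error.HTTPError):  # check first: subclass of URLError
        return exc.code in (502, 503, 504)
    return isinstance(exc, (urllib.error.URLError, TimeoutError))


def call_with_backoff(operation, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call `operation()` and retry transient failures with full-jitter backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == max_attempts or not is_retryable(exc):
                raise
            # Cap doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay;
            # the actual delay is drawn uniformly from [0, cap].
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

A caller would wrap the request in a function, for example `call_with_backoff(lambda: urllib.request.urlopen(url, timeout=10).read())` with `urllib.request` imported and `url` pointing at the CloudFront distribution. The jitter spreads retries out so that many clients failing at once do not all retry at the same instant.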
Sources
cloudfront cache behavior causes 504? | AWS re:Post
HTTP 504 status code (Gateway Timeout) - Amazon CloudFront
Cloudfront 504 gateway Timeout | AWS re:Post
Cloudfront timeout to Application Load Balancer | AWS re:Post

answered 2 months ago
