Tracing API Gateway 5xx errors

We have an API Gateway REST API (very standard, no auth, nothing interesting) that calls a number of Lambdas.

Since yesterday at ~12:00 GMT, we've been receiving a large number of 5xx errors, triggering one of our Slack notifications. None of our Lambdas have any errors (checked via a metric filter and the CloudWatch metrics for the Lambdas), and the API GW execution logs are also clean. As far as we can see, all of our services are functioning normally, our monitoring services are all green, and we've had no client reports of outages or degraded service. There have been no deployments since last week, and there's no change in any other metric across the system. API GW only tells us that these are 5xx errors, not anything more specific (such as a timeout).

How can we debug this? Ideally we'd like to see the exact status and message for these errors and where they are coming from. It could easily be that someone is hitting our API with large payloads or bad data, but we can't tell because the requests aren't even reaching our Lambdas.

asked 7 months ago · 300 views
1 Answer
Accepted Answer

To trace and debug 5xx errors in your API Gateway when Lambdas show no errors and logs are clean, follow these steps:

  • Enable access logging in API Gateway for detailed request/response logs.
  • Inspect CloudWatch access logs for patterns or specific requests causing 5xx errors.
  • Turn on AWS X-Ray for API Gateway and Lambda for end-to-end request tracing.
  • Analyze CloudWatch metrics for API Gateway, focusing on 5XXError alongside Latency and IntegrationLatency to see whether failures occur before or after the backend is invoked.
  • Confirm no changes in API configuration that might affect integration responses or timeouts.
  • Check for requests with large payloads that could exceed the size limits for API Gateway (10 MB) and synchronous Lambda invocations (6 MB).
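
As a rough illustration of the first step, here is a minimal Python sketch that builds a single-line JSON access log format for the stage. The key names on the left (ip, status, errorMessage, and so on) are arbitrary labels chosen for this example, and the helper build_access_log_format is hypothetical; the $context variables on the right are access-logging variables for REST APIs as I understand them, so verify them against the current documentation before relying on them.

```python
import json

# Keys are arbitrary labels picked for this sketch (an assumption, not a
# required schema); values are API Gateway access-log $context variables.
ACCESS_LOG_FIELDS = {
    "requestId": "$context.requestId",
    "ip": "$context.identity.sourceIp",
    "requestTime": "$context.requestTime",
    "httpMethod": "$context.httpMethod",
    "resourcePath": "$context.resourcePath",
    "status": "$context.status",
    "responseLength": "$context.responseLength",
    "errorMessage": "$context.error.message",
    "errorType": "$context.error.responseType",
    "integrationStatus": "$context.integration.status",
}


def build_access_log_format(fields=ACCESS_LOG_FIELDS):
    """Return a compact, single-line JSON template for the access log format."""
    return json.dumps(fields, separators=(",", ":"))


if __name__ == "__main__":
    # Paste the printed line into the stage's access log "Log format" setting.
    print(build_access_log_format())
```

The error and integration fields are the ones most likely to show why a request failed before it ever reached a Lambda, which is the gap described in the question.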
answered 7 months ago by an EXPERT
reviewed 7 months ago by an EXPERT
  • Thanks, this helped me find the source. Specifically, enabling the access logs and analyzing the source IP addresses pinpointed it: we were under a distributed attack for exactly 24 hours.
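
For anyone wanting to repeat the kind of per-IP analysis mentioned in the comment above, here is a minimal Python sketch. It assumes the access logs have been exported as JSON lines and that each record has "ip" and "status" fields; those field names, the helper count_5xx_by_ip, and the sample records are all made up for illustration and depend on whatever log format you configured.

```python
import json
from collections import Counter


def count_5xx_by_ip(log_lines):
    """Count 5xx responses per source IP from JSON-formatted access log lines.

    Assumes each line is a JSON object with "ip" and "status" fields
    (hypothetical names; match them to your own access log format).
    """
    counts = Counter()
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or non-JSON lines
        if not isinstance(record, dict):
            continue  # skip JSON values that are not objects
        status = str(record.get("status", ""))
        if len(status) == 3 and status.startswith("5"):
            counts[record.get("ip", "unknown")] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample records using documentation-reserved IP ranges.
    sample = [
        '{"ip": "203.0.113.7", "status": "502"}',
        '{"ip": "203.0.113.7", "status": "500"}',
        '{"ip": "198.51.100.4", "status": "200"}',
    ]
    for ip, n in count_5xx_by_ip(sample).most_common(10):
        print(ip, n)
```

A small number of addresses dominating the counts suggests a single abusive client, while a long, flat list of addresses is more consistent with a distributed source.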
