Intermittent 503 on API Gateway


Two days ago, I started getting 503 "Service Unavailable" responses from my API Gateway/Lambda calls. This is the first time I've experienced this. It happens about 30-40% of the time right now. It is NOT just one endpoint. It happens on all of them randomly. Even my simple "hello" endpoint, which uses next to no resources, gets the intermittent 503 "Service Unavailable". So it is NOT that I am using too many resources. I am the only one using the API right now and I'm testing it once every 3 seconds. So there is NO WAY I am overloading it beyond its capacity.

I did NOT change any code before this started happening. It just started happening two days ago and has continued until now. The AWS Health Dashboard reports no problems. But I AM having a problem and it is critically affecting my ability to do my job.

I checked the logs, but they don't tell me anything I don't already know.

We are paying for "Developer Support" and submitted a ticket 24 hours ago, but I've heard nothing back. The SLA is supposed to be 12 hours.

WHAT IS GOING ON?!?

2 Answers
Accepted answer

I finally figured out what was happening!!!

I inadvertently created an infinite loop which was taking up so many resources that it interfered with the API Gateway's ability to process all requests.

How did I create an infinite loop? I had a Lambda that was triggered by an ObjectCreated event in an S3 bucket. The big problem was that the Lambda wrote a new file to that same bucket, which triggered the Lambda again and started the process over. Infinitely. (I fixed it with one line of code. :))
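For anyone wondering what a guard against this kind of loop can look like, here is a minimal sketch in Python. It is not the one-line fix mentioned above (that line was not posted); it only illustrates one common approach, which is to write output under a dedicated prefix and ignore events for keys under that prefix. The prefix `processed/` and the helper name `keys_to_process` are made up for this example.

```python
from urllib.parse import unquote_plus

# Hypothetical prefix for files the Lambda itself writes; not from the original post.
OUTPUT_PREFIX = "processed/"


def keys_to_process(event, output_prefix=OUTPUT_PREFIX):
    """Return object keys from an S3 event, skipping the function's own output."""
    keys = []
    for record in event.get("Records", []):
        # S3 URL-encodes object keys in event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(output_prefix):
            # This object was written by the function itself; skipping it breaks the loop.
            continue
        keys.append(key)
    return keys


def handler(event, context):
    keys = keys_to_process(event)
    for key in keys:
        # Real processing would go here, writing results under OUTPUT_PREFIX.
        print(f"would process {key} -> {OUTPUT_PREFIX}{key}")
    return {"processed": len(keys)}
```

A check inside the handler still costs one invocation per output file. Restricting the S3 event notification itself with a prefix filter, or writing the output to a different bucket, avoids even that.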

So if anyone else is having a problem with intermittent 503s on API Gateway, check whether any of your backend processes might be creating an infinite loop and thus taking processing power away from the API Gateway. I'm a little embarrassed, of course, but if it helps someone else in the future, it will be worth it.

answered 7 months ago

Are there logs in CloudTrail? Did you enable execution and access logs for the API Gateway stage?
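If stage logging is not enabled yet, the sketch below shows one way to turn it on with boto3's `update_stage` call. It assumes a REST API (not an HTTP API), an existing CloudWatch Logs log group for access logs, and an account-level CloudWatch role already configured for API Gateway. The function names, the access log format, and the IDs in the usage note are placeholders, not values from this thread.

```python
def logging_patch_operations(access_log_group_arn, level="INFO"):
    """Build patch operations that enable execution and access logging on a stage."""
    return [
        # "/*/*" applies the execution log level to every resource and method.
        {"op": "replace", "path": "/*/*/logging/loglevel", "value": level},
        {
            "op": "replace",
            "path": "/accessLogSettings/destinationArn",
            "value": access_log_group_arn,
        },
        {
            "op": "replace",
            "path": "/accessLogSettings/format",
            # Minimal example format; extend with other $context variables as needed.
            "value": '{"requestId":"$context.requestId","status":"$context.status"}',
        },
    ]


def enable_stage_logging(client, rest_api_id, stage_name, access_log_group_arn):
    """Apply the logging settings using an API Gateway client passed in by the caller."""
    return client.update_stage(
        restApiId=rest_api_id,
        stageName=stage_name,
        patchOperations=logging_patch_operations(access_log_group_arn),
    )
```

Usage would look roughly like `enable_stage_logging(boto3.client("apigateway"), "abc123", "prod", log_group_arn)`, where `abc123`, `prod`, and `log_group_arn` stand in for your own API ID, stage name, and log group ARN.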

answered 7 months ago
  • Thank you for responding.

    I ended up rolling my backend code back three versions and then redeploying the versions one at a time, looking for any sign of the problem. The code is now back at the current version and I no longer see the problem. I will keep monitoring it, but it looks fixed...
