Possibly related: https://repost.aws/questions/QUcNiaV2eCSm2_eWZgajO9Ig/timeouts-on-reverse-proxy-after-enabling-dns-hostnames
We have a typical VPC with a public subnet running an nginx reverse proxy on 10.0.0.5 (which also has a public IP) and a private subnet hosting our application, which serves a web frontend on 10.0.1.123 connected to a backend that is also on the private network. We are having difficulty accessing the application from the internet: users experience random timeouts and gateway 503 timeout errors. Restarting the backend usually resolves the issue, but it reappears after a few hours. We found nothing indicating errors in the application logs, and all processes seem to be responding normally. SSH access to the instances sometimes shows similar delays, but retrying usually works.
While trying to debug this, I enabled VPC Flow Logs to see where the connectivity is failing. I see normal traffic between the reverse proxy and the app frontend on port 3050, with the app responding normally to the ephemeral port that opened the connection. But sometimes I see REJECT records in the flow logs, always in the direction from the frontend to the proxy, and always a single 40-byte packet. I can't find any other significant REJECT records on other network interfaces in the VPC.
2 68xxxxxxxx00 eni-04xxxxxxxxxxxxxc2 10.0.1.123 10.0.0.5 3050 57408 6 7 811 1707817290 1707817348 ACCEPT OK
2 68xxxxxxxx00 eni-04xxxxxxxxxxxxxc2 10.0.0.5 10.0.1.123 57384 3050 6 9 1515 1707817290 1707817348 ACCEPT OK # normal communication
2 68xxxxxxxx00 eni-04xxxxxxxxxxxxxc2 10.0.1.123 10.0.0.5 3050 57384 6 1 40 1707817290 1707817348 REJECT OK # why is this blocked? there is a response on this port 2 lines down
2 68xxxxxxxx00 eni-04xxxxxxxxxxxxxc2 10.0.0.5 10.0.1.123 57408 3050 6 9 1519 1707817290 1707817348 ACCEPT OK
2 68xxxxxxxx00 eni-04xxxxxxxxxxxxxc2 10.0.1.123 10.0.0.5 3050 57384 6 7 811 1707817290 1707817348 ACCEPT OK # normal communication
2 68xxxxxxxx00 eni-04xxxxxxxxxxxxxc2 10.0.0.5 10.0.1.123 57416 3050 6 10 1507 1707817290 1707817348 ACCEPT OK # normal communication
2 68xxxxxxxx00 eni-04xxxxxxxxxxxxxc2 10.0.1.123 10.0.0.5 3050 57416 6 9 892 1707817290 1707817348 ACCEPT OK # normal communication
2 68xxxxxxxx00 eni-04xxxxxxxxxxxxxc2 10.0.1.123 10.0.0.5 3050 57446 6 7 724 1707817290 1707817348 ACCEPT OK
2 68xxxxxxxx00 eni-04xxxxxxxxxxxxxc2 10.0.1.123 10.0.0.5 3050 57446 6 1 40 1707817290 1707817348 REJECT OK # why is this blocked? the previous 7 packets went through fine
2 68xxxxxxxx00 eni-04xxxxxxxxxxxxxc2 10.0.1.123 10.0.0.5 3050 57408 6 1 40 1707817290 1707817348 REJECT OK
2 68xxxxxxxx00 eni-04xxxxxxxxxxxxxc2 10.0.1.123 10.0.0.5 3050 55158 6 1 40 1707817350 1707817408 REJECT OK
2 68xxxxxxxx00 eni-04xxxxxxxxxxxxxc2 10.0.1.123 10.0.0.5 3050 55158 6 7 810 1707817350 1707817408 ACCEPT OK # why is this accepted? the previous packet was blocked
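In case it helps anyone reproduce the filtering, here is a minimal Python sketch of how the single-packet REJECT records can be pulled out and matched against ACCEPT records for the same source/destination address and port. It assumes the default version 2 flow log format (14 space-separated fields in the order shown above); the names `FlowRecord`, `parse_record` and `single_packet_rejects` are my own and not part of any AWS tooling.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowRecord(NamedTuple):
    srcaddr: str
    dstaddr: str
    srcport: int
    dstport: int
    protocol: int
    packets: int
    nbytes: int
    start: int
    end: int
    action: str


def parse_record(line: str) -> FlowRecord:
    """Parse one record, assuming the default version 2 field order."""
    # Drop any trailing "# ..." annotation, then split on whitespace.
    fields = line.split("#", 1)[0].split()
    if len(fields) != 14:
        raise ValueError(f"expected 14 fields, got {len(fields)}: {line!r}")
    sport, dport, proto, packets, nbytes, start, end = map(int, fields[5:12])
    return FlowRecord(fields[3], fields[4], sport, dport, proto,
                      packets, nbytes, start, end, fields[12])


def single_packet_rejects(lines):
    """Return (reject, accepted_siblings) pairs for 1-packet, 40-byte TCP REJECTs."""
    records = [parse_record(line) for line in lines if line.strip()]
    accepted = defaultdict(list)
    for rec in records:
        if rec.action == "ACCEPT":
            accepted[rec[:4]].append(rec)  # key: src, dst, srcport, dstport
    return [
        (rec, accepted.get(rec[:4], []))
        for rec in records
        if rec.action == "REJECT" and rec.protocol == 6
        and rec.packets == 1 and rec.nbytes == 40
    ]


# Three of the records pasted above, used as sample input.
SAMPLE = """\
2 68xxxxxxxx00 eni-04xxxxxxxxxxxxxc2 10.0.1.123 10.0.0.5 3050 57446 6 7 724 1707817290 1707817348 ACCEPT OK
2 68xxxxxxxx00 eni-04xxxxxxxxxxxxxc2 10.0.1.123 10.0.0.5 3050 57446 6 1 40 1707817290 1707817348 REJECT OK # why is this blocked?
2 68xxxxxxxx00 eni-04xxxxxxxxxxxxxc2 10.0.1.123 10.0.0.5 3050 55158 6 1 40 1707817350 1707817408 REJECT OK
"""

for reject, siblings in single_packet_rejects(SAMPLE.splitlines()):
    print(reject.dstport, reject.start, "accepted records on same ports:", len(siblings))
```

Note that a 40-byte TCP packet has no room for a payload (20-byte IPv4 header plus 20-byte TCP header without options), so each rejected record is a bare control segment rather than application data. The flow log does not show which TCP flags it carried.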
What could these 40-byte packets be? Is it some sort of TCP/HTTP keep-alive traffic, or something related to DNS? It's not ICMP, because the protocol field is 6 (TCP). Why are they being rejected? I also noticed that the difference between the start and end times is almost always 58 or 60 seconds. What could this indicate? nginx keepalive_timeout is configured to 65 seconds, and the operating system TCP keepalive settings are as follows:
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
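For reference, this is my understanding of how those three values combine, written as a small Python sketch (the function name is hypothetical). It assumes the usual Linux behaviour: on a socket with SO_KEEPALIVE enabled, the first probe is sent after tcp_keepalive_time seconds of idleness, and the connection is dropped after tcp_keepalive_probes unanswered probes sent tcp_keepalive_intvl seconds apart.

```python
def dead_peer_detection_seconds(keepalive_time: int,
                                keepalive_intvl: int,
                                keepalive_probes: int) -> int:
    """Idle seconds before the kernel gives up on an unresponsive peer."""
    return keepalive_time + keepalive_intvl * keepalive_probes


# Values from the sysctl output above.
total = dead_peer_detection_seconds(7200, 75, 9)
print(f"first probe after 7200 s idle, connection dropped after {total} s")
```

So with these settings the OS-level keepalive would only start probing after two hours of idle time, far later than the 65-second nginx keepalive_timeout, which as far as I understand is a separate HTTP-level idle timeout.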
Thanks for any help!