Payment gateway notify requests fail on AWS EC2 (TLS handshake issue)

Hello,

I’m facing an issue with payment gateway (PGW) notify (callback) requests sent to an AWS EC2 instance and cannot pinpoint where the problem lies.

Environment

The same backend application runs in three scenarios:

Locally via ngrok

  • Everything works.
  • The notify request reaches the application, appears in the logs, and is processed correctly.

Hetzner VPS

  • The gateway connects, the TLS handshake completes, but no HTTP request follows.
  • Nginx does not forward anything to the backend, as if the connection stops immediately after TLS.
  • The access log shows no POST /payments/notify.

AWS EC2

  • The gateway cannot complete the TLS handshake.
  • In tcpdump I see the TCP connection open (SYN, SYN/ACK, ACK) followed by the ClientHello, but the connection ends there: no ServerHello or further handshake messages.

What I have verified

  • Security groups & firewall: open to 0.0.0.0/0 on port 443 (no filtering).

  • Nginx: listening on 0.0.0.0:443.

  • Manual tests (all succeed):

    • curl -vk https://<domain>/payments/notify -X POST -d 'ping=1'
    • openssl s_client -connect <domain>:443 -servername <domain>
  • Browsers and standard API clients connect and send requests without issues.

  • The issue occurs only when the notify request is sent by the payment gateway.


Results per environment

Hetzner VPS

  • TLS handshake completes successfully (ClientHello → ServerHello → keys exchanged).
  • No HTTP request arrives afterwards (no POST /payments/notify in access logs).
  • Connection closes without any HTTP payload following TLS.

AWS EC2

  • TLS handshake fails right after ClientHello (no ServerHello observed).
  • No entries in Nginx access/error logs (application layer is never reached).

Ngrok (local)

  • TLS handshake completes.
  • HTTP POST /payments/notify arrives with headers and body.
  • The application processes the notify successfully.

The confusing part

  • On Hetzner, TLS is negotiated, but the HTTP request never arrives.
  • On AWS EC2, the TLS handshake fails immediately after ClientHello.
  • Via ngrok, everything works end-to-end.

Question

Has anyone encountered a situation where a payment gateway client fails to complete the TLS handshake against an AWS EC2 endpoint using Let’s Encrypt certificates, even though:

  • curl, openssl s_client, and browsers all work fine,
  • the same application and certificate setup works on Hetzner VPS, and
  • the application processes requests correctly when they arrive (as seen via ngrok)?

Why might the gateway:

  • complete TLS on Hetzner but never send the HTTP request, and
  • fail the TLS handshake immediately after ClientHello on AWS EC2?

Any insights would be greatly appreciated.

2 Answers

This issue appears to be related to SSL/TLS negotiation failures specific to the payment gateway's client when connecting to your AWS EC2 instance. Based on your detailed observations, here are some potential causes and solutions:

  1. TLS Version or Cipher Suite Mismatch: The payment gateway might be using specific TLS versions or cipher suites that are supported by ngrok but not by your EC2 or Hetzner configurations. The fact that the handshake fails immediately after ClientHello on EC2 suggests a fundamental incompatibility in the TLS parameters being negotiated.

  2. Certificate Chain Issues: Let's Encrypt certificates require the full certificate chain to be served (with Certbot, ssl_certificate should point to fullchain.pem rather than cert.pem). If the intermediate certificate is missing or the chain is incorrectly ordered in your EC2 setup, some clients (particularly older or stricter ones) might fail the handshake, while browsers might still work because they can often fill in a missing intermediate themselves. Also note that curl -k skips certificate verification, so a successful curl -vk test does not rule out chain problems.

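  3. SNI (Server Name Indication) Requirements: If the payment gateway's client sends no SNI, or a hostname that no server block matches, Nginx handles the connection with the default server for port 443, which may present a different certificate or TLS configuration than the one you tested with openssl s_client -servername.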

  4. Network Path Differences: There could be intermediary network devices or configurations specific to AWS that are interfering with the TLS handshake for the payment gateway's client.
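To make cause 2 concrete, here is a minimal Python sketch that checks the order of the chain printed by openssl s_client -showcerts. It assumes the s:/i: (subject/issuer) lines that s_client prints under "Certificate chain"; parse_chain and chain_problems are illustrative names, not part of any tool, and the Let's Encrypt names in the sample are only examples. Each certificate should be issued by the one that follows it, and a lone leaf whose issuer differs from its subject means the intermediate was not sent.

```python
import re
from typing import List, Tuple

# "Certificate chain" lines printed by openssl s_client, e.g.
#  0 s:CN = example.com
#    i:C = US, O = Let's Encrypt, CN = R3
SUBJECT_RE = re.compile(r"^\s*(\d+) s:(.*)$")
ISSUER_RE = re.compile(r"^\s*i:(.*)$")


def parse_chain(s_client_output: str) -> List[Tuple[str, str]]:
    """Return (subject, issuer) pairs in the order the server sent them."""
    chain: List[Tuple[str, str]] = []
    subject = None
    for line in s_client_output.splitlines():
        m = SUBJECT_RE.match(line)
        if m:
            subject = m.group(2).strip()
            continue
        m = ISSUER_RE.match(line)
        if m and subject is not None:
            chain.append((subject, m.group(1).strip()))
            subject = None
    return chain


def chain_problems(chain: List[Tuple[str, str]]) -> List[str]:
    """List ordering gaps: each certificate should be issued by the next one."""
    if not chain:
        return ["no certificates found"]
    problems = []
    for i in range(len(chain) - 1):
        issuer = chain[i][1]
        next_subject = chain[i + 1][0]
        if issuer != next_subject:
            problems.append(f"cert {i} issued by {issuer!r}, but cert {i + 1} is {next_subject!r}")
    leaf_subject, leaf_issuer = chain[0]
    if len(chain) == 1 and leaf_subject != leaf_issuer:
        problems.append(f"only the leaf was sent; intermediate {leaf_issuer!r} is missing")
    return problems
```

Usage, assuming each endpoint's output was saved with something like openssl s_client -connect <domain>:443 -servername <domain> -showcerts </dev/null > ec2.txt: print(chain_problems(parse_chain(open("ec2.txt").read()))). An empty list only means the served order looks consistent; it does not check signatures or expiry.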

Troubleshooting steps:

  1. Compare TLS Configurations: Use openssl s_client with the -showcerts flag to compare the certificate chains presented by your EC2, Hetzner, and ngrok endpoints.

  2. Check Security Policies: Verify whether your EC2 instance or any associated load balancers have security policies that restrict certain TLS versions or cipher suites.

  3. Enable Detailed TLS Logging: Raise the Nginx error_log level (for example to info, the level at which Nginx logs client-side SSL handshake errors) to capture the exact point of failure during the handshake.

  4. Packet Capture Analysis: Perform a more detailed packet capture analysis to see exactly what cipher suites and TLS versions the payment gateway is offering in its ClientHello message.

  5. Contact the Payment Gateway: Since the issue is specific to their client, they might have encountered similar issues with other merchants on AWS EC2. They may have specific requirements for TLS configurations that aren't documented publicly.

  6. Test with Different SSL Implementations: If possible, try a different SSL implementation (e.g., switch from Nginx to Apache or vice versa) to see if the issue persists.
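For step 4, once you have the raw bytes of the gateway's ClientHello (for example the TCP payload exported from Wireshark), a small parser can list what the client offers. This is a sketch under assumptions: the ClientHello fits in a single TLS record and TCP segment, and only the server_name (SNI) and supported_versions extensions are decoded; parse_client_hello and ClientHelloInfo are illustrative names. Values are IANA code points: 0x0303 is TLS 1.2, 0x0304 is TLS 1.3, 0x1301 is TLS_AES_128_GCM_SHA256 and 0xC02F is TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClientHelloInfo:
    legacy_version: int
    cipher_suites: List[int]
    sni: Optional[str] = None
    supported_versions: List[int] = field(default_factory=list)


def parse_client_hello(record: bytes) -> ClientHelloInfo:
    """Parse a ClientHello that starts at byte 0 of a TCP payload (single record assumed)."""
    if len(record) < 9 or record[0] != 0x16:
        raise ValueError("not a TLS handshake record")
    if record[5] != 0x01:
        raise ValueError("handshake message is not a ClientHello")
    body = record[9:]  # skip 5-byte record header and 4-byte handshake header
    (legacy_version,) = struct.unpack_from("!H", body, 0)
    pos = 2 + 32  # legacy_version + random
    pos += 1 + body[pos]  # session_id
    (cs_len,) = struct.unpack_from("!H", body, pos)
    pos += 2
    suites = [struct.unpack_from("!H", body, pos + i)[0] for i in range(0, cs_len, 2)]
    pos += cs_len
    pos += 1 + body[pos]  # compression_methods
    info = ClientHelloInfo(legacy_version, suites)
    if pos + 2 > len(body):
        return info  # no extensions present
    (ext_total,) = struct.unpack_from("!H", body, pos)
    pos += 2
    end = pos + ext_total
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", body, pos)
        data = body[pos + 4 : pos + 4 + ext_len]
        pos += 4 + ext_len
        if ext_type == 0x0000 and len(data) >= 5 and data[2] == 0:
            # server_name: list length (2), name_type (1, 0 = host_name), name length (2), name
            (name_len,) = struct.unpack_from("!H", data, 3)
            info.sni = data[5 : 5 + name_len].decode("ascii")
        elif ext_type == 0x002B and data:
            # supported_versions: 1-byte list length, then 2-byte versions
            info.supported_versions = [
                struct.unpack_from("!H", data, 1 + i)[0] for i in range(0, data[0], 2)
            ]
    return info
```

If sni comes back as None, the gateway sends no SNI (cause 3); comparing cipher_suites and supported_versions with your Nginx ssl_protocols and ssl_ciphers settings shows whether any overlap exists (cause 1).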

The fact that the behavior differs between Hetzner (completes TLS but no HTTP request) and EC2 (fails during TLS handshake) suggests there might be multiple issues at play. The payment gateway client might be very sensitive to specific TLS parameters or certificate configurations that differ between your environments.
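One way to test that sensitivity without waiting for the gateway is to probe each endpoint with a deliberately conservative client. The sketch below uses Python's ssl module; the settings (TLS 1.2 as the maximum, an optional OpenSSL cipher string, SNI on or off) are assumptions about what the gateway might do, not known facts about it, and make_strict_context and probe are hypothetical helper names. Unlike curl -k, the default context verifies the certificate chain.

```python
import socket
import ssl
from typing import Optional


def make_strict_context(max_version: ssl.TLSVersion = ssl.TLSVersion.TLSv1_2,
                        ciphers: Optional[str] = None,
                        check_hostname: bool = True) -> ssl.SSLContext:
    """Client context imitating a conservative client (assumed settings)."""
    ctx = ssl.create_default_context()  # verifies the chain, unlike curl -k
    ctx.check_hostname = check_hostname  # must be False to connect without SNI
    ctx.maximum_version = max_version
    if ciphers:
        ctx.set_ciphers(ciphers)  # OpenSSL cipher string; applies to TLS 1.2 and below
    return ctx


def probe(host: str, port: int, ctx: ssl.SSLContext,
          send_sni: bool = True, timeout: float = 10.0) -> str:
    """Try one handshake and report the negotiated version and cipher, or the error."""
    if not send_sni and ctx.check_hostname:
        raise ValueError("use check_hostname=False to probe without SNI")
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname=None means no SNI extension is sent
            with ctx.wrap_socket(sock, server_hostname=host if send_sni else None) as tls:
                cipher = tls.cipher()
                return f"ok {tls.version()} {cipher[0] if cipher else '?'}"
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return f"failed: {exc!r}"
```

Example: probe("<domain>", 443, make_strict_context(ciphers="ECDHE-RSA-AES128-GCM-SHA256")) and probe("<domain>", 443, make_strict_context(check_hostname=False), send_sni=False). Running the same probes against EC2 and Hetzner and comparing the results can narrow down which parameter matters.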
Sources
ALB Node SSL Connection Failure - One of Two Nodes Returns SSL_ERROR_SYSCALL in eu-central-1 | AWS re:Post
ALB Connections Accepted but No Requests Processed – HTTPS Hangs (RequestCount = 0) | AWS re:Post
Troubleshooting ClientTLSNegotiationErrorCount in AWS Network Load Balancer | AWS re:Post

answered 4 months ago

Hey Boris,

Have you captured traffic on the destination instance (the EC2 instance) to check whether the ClientHello reaches it and whether it sends a ServerHello back?

sudo tcpdump -nni enX0 -p tcp port 443 -w mycapture.pcap

This saves the capture to a PCAP file that you can inspect in a packet analysis tool such as Wireshark. Replace enX0 with the name of your instance's network interface (ip link lists them).
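As a quick alternative to opening the file in Wireshark, the Python sketch below walks a classic pcap file (what tcpdump -w writes by default) and reports which TLS handshake messages and alerts appear at the start of each TCP payload. Assumptions: an Ethernet capture over IPv4 without VLAN tags, and only the first record in each segment is inspected, so encrypted or split records may be mislabeled. iter_tls_events is a hypothetical name.

```python
import socket
import struct
from typing import Iterator, Tuple

HANDSHAKE_TYPES = {1: "ClientHello", 2: "ServerHello", 11: "Certificate"}


def iter_tls_events(pcap: bytes) -> Iterator[Tuple[str, int, str, int, str]]:
    """Yield (src, sport, dst, dport, event) for TLS records at the start of TCP payloads."""
    magic = pcap[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file (microsecond or nanosecond variant)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack_from(endian + "I", pcap, 20)
    if linktype != 1:
        raise ValueError(f"only Ethernet captures are handled, got linktype {linktype}")
    pos = 24  # skip the global header
    while pos + 16 <= len(pcap):
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII", pcap, pos)
        frame = pcap[pos + 16 : pos + 16 + incl_len]
        pos += 16 + incl_len
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue  # not IPv4 (IPv6 and VLAN-tagged frames are skipped)
        ip = frame[14:]
        if ip[9] != 6:
            continue  # not TCP
        ihl = (ip[0] & 0x0F) * 4
        (total_len,) = struct.unpack_from("!H", ip, 2)
        tcp = ip[ihl:total_len]  # trims Ethernet padding
        if len(tcp) < 20:
            continue
        sport, dport = struct.unpack_from("!HH", tcp, 0)
        payload = tcp[(tcp[12] >> 4) * 4 :]
        if len(payload) >= 6 and payload[0] == 0x16:
            event = HANDSHAKE_TYPES.get(payload[5], f"handshake type {payload[5]}")
        elif len(payload) >= 5 and payload[0] == 0x15:
            event = "Alert"
        else:
            continue
        yield socket.inet_ntoa(ip[12:16]), sport, socket.inet_ntoa(ip[16:20]), dport, event
```

Usage: for event in iter_tls_events(open("mycapture.pcap", "rb").read()): print(*event). If the capture on the EC2 instance shows the ClientHello but no ServerHello leaving port 443, Nginx never answered; if the ServerHello does leave the instance, the problem is more likely on the network path back to the gateway.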

Have you verified that the path MTU is the same in all three environments?

You can check it with the following command (tracepath expects a hostname or IP address, not a full URL):

tracepath API-HOST

Once you have the path MTU for each environment, you can check whether packet size is the root cause of your problem.
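To see why the path MTU matters for the handshake in particular: the server's first reply (ServerHello plus certificate chain) is usually a few kilobytes and is sent as full-size segments, while the ClientHello is often small enough to pass. The arithmetic below assumes IPv4 and TCP headers without options (20 bytes each) and a hypothetical 4000-byte server flight; mss_for_mtu and segments_needed are illustrative helpers. The AWS EC2 MTU documentation notes that many instance types use a 9001-byte (jumbo frame) MTU inside a VPC, while traffic through an internet gateway is limited to 1500 bytes.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options (timestamps would add 12)


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def segments_needed(flight_bytes: int, path_mtu: int) -> int:
    """Full-size TCP segments needed to carry a server flight over a path."""
    mss = mss_for_mtu(path_mtu)
    return -(-flight_bytes // mss)  # ceiling division without floats
```

For example, a 4000-byte flight needs 3 segments at a 1500-byte path MTU (MSS 1460) but only 1 at 9001; if some hop silently drops the full-size segments, the ClientHello is acknowledged and then nothing else arrives.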


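Can you check whether the gateway sets the DF (Don't Fragment) bit to 1? With DF set, packets larger than the MTU of a hop along the path cannot be fragmented and are discarded by that hop instead; if the ICMP "fragmentation needed" replies are filtered somewhere, the sender never learns to send smaller packets.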

Can you lower the MTU of the gateway's interface to a smaller value, such as 1400 bytes? That would help ensure that no internet hop (typically 1500 MTU) drops these packets once the IP, TCP, and TLS headers are added, including the larger packets that carry the TLS certificate.
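The DF behavior described here can be modelled in a few lines. This is a simplified IPv4 sketch: 20-byte headers without options, non-final fragments carrying multiples of 8 bytes, and a hypothetical list of per-hop MTUs; fragment and forward are illustrative names, and real routers also send an ICMP error when they drop a DF packet, which this model omits.

```python
from typing import List

IP_HEADER = 20  # IPv4 header without options


def fragment(size: int, mtu: int) -> List[int]:
    """Split one IPv4 packet of `size` bytes into fragments that fit `mtu`."""
    payload = size - IP_HEADER
    chunk = (mtu - IP_HEADER) // 8 * 8  # non-final fragments carry multiples of 8 bytes
    sizes = []
    while payload > 0:
        part = min(chunk, payload)
        sizes.append(IP_HEADER + part)
        payload -= part
    return sizes


def forward(size: int, df: bool, hop_mtus: List[int]) -> str:
    """Send one packet across hops with the given MTUs and describe the outcome."""
    packets = [size]
    for hop, mtu in enumerate(hop_mtus):
        if any(p > mtu for p in packets):
            if df:
                # DF set: the router may not fragment, so it discards the packet.
                return f"dropped at hop {hop} (MTU {mtu})"
            packets = [f for p in packets for f in (fragment(p, mtu) if p > mtu else [p])]
    return f"delivered in {len(packets)} fragment(s)"
```

With hops of 9001, 1500 and 1400 bytes, a 1500-byte packet with DF set is dropped at the third hop (index 2), the same packet without DF arrives in two fragments, and a 1400-byte packet passes untouched, which is the idea behind lowering the MTU.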


https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-mtu.html

AWS
answered 4 months ago
