VPN CHILD_SA issue while VPN Tunnel UP


Hello AWS people. We connected to the source DB (on-premises) via DMS. I configured the VPN before setting up the connection and confirmed that it was working. However, a few days later, DMS could no longer connect, and the following error appeared:

Test Endpoint failed: Application-Status: 1020912, Application-Message: Failed to connect Network error has occurred, Application-Detailed-Message: RetCode: SQL_ERROR SqlState: HYT00 NativeError: 0 Message: [Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired
The VPN tunnel was up.

Traffic from the AWS VPC to on-premises (telnet, ping, sqlcmd) failed. However, telnet and ping from on-premises to AWS worked.

We enabled tunnel activity logging in the VPN options so that we could look at the VPN log. While the VPN connection was in the <modifying> state, connectivity suddenly came back. Then, once it returned to <available>, communication was cut off again.

I thought this was strange, so we modified the connection again. This time, communication keeps working even in the <available> state.

We are experiencing an issue with our AWS Site-to-Site VPN where the CHILD_SA (Security Association) is repeatedly deleted and re-established. The tunnel status generally shows as "UP," and the IKE Phase 1 and Phase 2 SAs are successfully established, but the Phase 2 CHILD_SA keeps getting deleted and recreated, which makes the VPN connection unstable and disrupts data transmission.

{ "event_timestamp": 1733206653, "details": "AWS tunnel Phase 2 CHILD_SA is closed", "dpd_enabled": true, "ike_phase1_state": "established", "ike_phase2_state": "established" },
{ "event_timestamp": 1733206653, "details": "AWS tunnel is sending DELETE for Phase 2 SA with SPI: 0xcb179068", "dpd_enabled": true, "ike_phase1_state": "established", "ike_phase2_state": "established" }

Is this a bug, or did I set something up wrong? All VPN options, such as DPD and lifetime, are set to their defaults. The on-premises VPN device is a Fortigate.

2 Answers

The issue you're experiencing with your AWS Site-to-Site VPN appears to be related to the CHILD_SA (Security Association) being repeatedly deleted and re-established. This can cause instability in the VPN connection and disrupt data transmission, even when the tunnel status shows as "UP" and both IKE Phase 1 and Phase 2 SAs are established.

There are a few potential causes for this issue:

  1. Asymmetric routing: If your customer gateway device (Fortigate in this case) doesn't support asymmetric routing, it can lead to instability in the VPN connection. Verify that your Fortigate device is configured to support asymmetric routing.

  2. Mismatched traffic selectors or encryption domains: Ensure that the traffic selectors (also known as encryption domains or interesting traffic) are correctly configured and match on both the AWS side and your on-premises Fortigate device. Mismatched configurations can cause the CHILD_SA to be repeatedly deleted and re-established.

  3. Security Association (SA) limits: AWS supports only a single pair of security associations (one inbound and one outbound). If your VPN connection needs to carry multiple networks, you may need to summarize the local and remote CIDRs on the VPN connection so that one SA pair covers them.

  4. Incorrect proposal settings: The CHILD_SA may fail to establish due to negotiation errors, such as incorrect proposal settings. Verify that the Phase 2 (IPsec) settings on your Fortigate device match the AWS VPN requirements.

  5. Routing issues: Ensure that your route tables in both AWS and on-premises are correctly configured to route traffic through the VPN connection.
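
As a rough illustration of the CIDR summarization mentioned in item 3, the sketch below finds the smallest single network that covers a list of CIDRs using Python's standard `ipaddress` module. The function name `summarize` and the example networks are hypothetical and not taken from the question. Note that the summary can cover addresses that were not in the original list, so check that the wider range is acceptable before using it as a traffic selector.

```python
import ipaddress


def summarize(cidrs):
    """Return the smallest single network that contains every CIDR in cidrs."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    if not nets:
        raise ValueError("need at least one CIDR")
    supernet = nets[0]
    # Widen the prefix one bit at a time until every network fits inside.
    while not all(n.subnet_of(supernet) for n in nets):
        supernet = supernet.supernet()
    return supernet


# Hypothetical on-premises networks, for illustration only.
onprem = ["10.10.1.0/24", "10.10.2.0/24", "10.10.7.0/24"]
print(summarize(onprem))  # 10.10.0.0/21
```

Here 10.10.0.0/21 also includes ranges such as 10.10.0.0/24 that were not listed, which is the trade-off of using one SA pair.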

To troubleshoot and resolve this issue:

  1. Review your Fortigate configuration to ensure it supports asymmetric routing and that the traffic selectors match the AWS VPN configuration.

  2. Check that the Phase 2 (IPsec) settings on your Fortigate device are compatible with AWS VPN requirements.

  3. Verify that your route tables in both AWS and on-premises are correctly configured.

  4. If you have multiple networks that need to communicate over the VPN, consider summarizing the CIDRs to reduce the number of required SAs.

  5. Monitor the VPN logs closely after making any changes to identify if the CHILD_SA deletion issue persists.

If the issue continues after verifying these settings, you may need to engage AWS support for further assistance in diagnosing the problem.
Sources
Troubleshoot VPN routing issues | AWS re:Post
AWS Site-to-Site VPN logs - AWS Site-to-Site VPN

answered a month ago
EXPERT
reviewed a month ago

To troubleshoot IPsec VPN connections, it helps most to study the logs from both peers, starting from the point where the original SA pair was created. A packet capture from the wire can also help, if one is available. The log excerpt above is so short that it shows only that the peer decided the Child SA (IPsec SA) needed to be deleted and sent a DELETE notification to the other peer. The reason for the deletion is not visible in the excerpt. It could be lifetime expiration, a configuration change, or any of many other causes.
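
One way to narrow down the reason for the deletions is to measure how far apart the DELETE events are and compare that with the Phase 2 lifetime. The sketch below is a minimal example, assuming the log events have been exported as a list of dictionaries with the `event_timestamp` and `details` fields shown in the question. The 3600-second lifetime is an assumption based on the poster saying all options are at their defaults, and every timestamp except the first is made up for illustration. The tolerance is a rough heuristic, not an AWS-defined value.

```python
# Assumption: Phase 2 lifetime in seconds; change this if the tunnel options differ.
PHASE2_LIFETIME_S = 3600


def delete_intervals(events):
    """Return the seconds between consecutive Phase 2 DELETE events."""
    stamps = sorted(
        e["event_timestamp"]
        for e in events
        if "DELETE for Phase 2 SA" in e["details"]
    )
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


def looks_like_rekey(interval, lifetime=PHASE2_LIFETIME_S, tolerance=0.2):
    """Rough heuristic: True when the interval is close to the SA lifetime."""
    return abs(interval - lifetime) <= tolerance * lifetime


# Only the first timestamp comes from the question; the others are hypothetical.
events = [
    {"event_timestamp": 1733206653,
     "details": "AWS tunnel is sending DELETE for Phase 2 SA with SPI: 0xcb179068"},
    {"event_timestamp": 1733209953,
     "details": "AWS tunnel is sending DELETE for Phase 2 SA with SPI: 0x00000001"},
    {"event_timestamp": 1733210073,
     "details": "AWS tunnel is sending DELETE for Phase 2 SA with SPI: 0x00000002"},
]

for interval in delete_intervals(events):
    kind = "near lifetime" if looks_like_rekey(interval) else "unexpectedly short"
    print(interval, kind)
```

Intervals that cluster near the lifetime point toward ordinary rekeying. Much shorter intervals suggest something else is tearing the SA down, and that is where the logs from the other peer become necessary.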

When you look at the log messages from the setup of the initial SAs, you may notice something that does not match what you expected from your configuration. For example, the traffic selectors may not be fully symmetrical on both peers. In such situations, behavior can differ depending on which peer initiates the connection and which one responds: you may get a fully working SA when one end initiates, but if the rekey is initiated from the other direction, the result may not work the same way.
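
The symmetry check described here can be sketched in a few lines: the local selector on one peer should equal the remote selector on the other, and vice versa. The `Selectors` class, the `mismatches` function, and all CIDR values below are hypothetical. The 0.0.0.0/0 values on the AWS side are an assumption about what an unmodified tunnel would propose, so compare against the selectors actually shown in your own logs.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Selectors:
    """Traffic selectors as configured on one peer (hypothetical structure)."""
    local: str
    remote: str


def mismatches(aws, cgw):
    """List the ways the two peers' traffic selectors fail to mirror each other."""
    problems = []
    if ipaddress.ip_network(aws.local) != ipaddress.ip_network(cgw.remote):
        problems.append(f"AWS local {aws.local} != customer gateway remote {cgw.remote}")
    if ipaddress.ip_network(aws.remote) != ipaddress.ip_network(cgw.local):
        problems.append(f"AWS remote {aws.remote} != customer gateway local {cgw.local}")
    return problems


# Illustrative values only; read the real ones from both peers' configuration.
aws_side = Selectors(local="0.0.0.0/0", remote="0.0.0.0/0")
fortigate_side = Selectors(local="192.168.10.0/24", remote="10.0.0.0/16")

for problem in mismatches(aws_side, fortigate_side):
    print(problem)
```

An empty list means the selectors mirror each other. Any entry is a candidate explanation for an SA that works when one side initiates but not after a rekey from the other side.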

AWS
answered 13 days ago
