You're correct that the two tunnels terminate on different AWS endpoint IPs. This is the key diagnostic clue: if both tunnels fail simultaneously despite terminating on separate AWS infrastructure, the single common point of failure is the Customer Gateway (CGW) side.
Both tunnels originate from the same CGW IP (your Cisco ASA's public interface). Anything that disrupts connectivity from that single IP to the internet will take down both tunnels at the same time, even though the AWS endpoints are independent.
Most likely causes:
-
The CGW is failing to respond to AWS DPD (Dead Peer Detection) messages on both tunnels. AWS sends DPD probes every 10 seconds. After 3 consecutive missed responses, AWS declares the peer dead and takes action based on the configured DPD timeout action. If your ASA experiences a brief CPU spike, memory pressure, or interface flap, it may miss DPD responses on both tunnels simultaneously.
-
An upstream network issue between your CGW and the internet (ISP micro-outage, NAT device session table exhaustion, or firewall rate-limiting UDP 500/4500) that is not severe enough to appear on ISP status pages but enough to drop the DPD packets.
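The DPD timing in the first cause can be modeled in a few lines. This is a minimal Python sketch that assumes the 10-second probe interval and 3-missed-responses threshold described above. `dpd_declared_dead_at` is a hypothetical helper for illustration, not an AWS or Cisco API. It shows why a single stall on the shared CGW takes down both tunnels at the same moment.

```python
# Minimal model of DPD dead-peer detection. Assumes the 10 s probe interval and
# 3-missed-responses threshold described above; dpd_declared_dead_at is a
# hypothetical helper for illustration, not an AWS or Cisco API.
from typing import List, Optional

PROBE_INTERVAL_S = 10
MAX_MISSED = 3


def dpd_declared_dead_at(probe_answered: List[bool]) -> Optional[int]:
    """Return seconds after the first probe at which the peer is declared dead,
    or None if MAX_MISSED consecutive probes are never missed.

    probe_answered[i] is True if the probe sent at i * PROBE_INTERVAL_S got a reply.
    """
    missed = 0
    for i, answered in enumerate(probe_answered):
        if answered:
            missed = 0  # any reply resets the counter
            continue
        missed += 1
        if missed == MAX_MISSED:
            # The last missed probe is given one interval to be answered.
            return (i + 1) * PROBE_INTERVAL_S
    return None


# Both tunnels share one CGW IP, so a ~30 s CGW stall hits both probe streams.
cgw_stall = [True, False, False, False, True]
tunnel1 = dpd_declared_dead_at(cgw_stall)
tunnel2 = dpd_declared_dead_at(cgw_stall)
print(tunnel1, tunnel2)  # 40 40
```

Because both tunnels see the identical loss pattern, they are declared dead together even though the AWS endpoints themselves are healthy.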
The 25-45 minute recovery time suggests your DPD timeout action is set to "Clear" (the default) and the startup action is set to "Add" (the default, meaning AWS acts as responder only). With this combination, when a DPD timeout occurs, AWS ends the IKE session and does not attempt to re-initiate it. The tunnel stays down until the CGW initiates a new IKE session, so if your ASA backs off or does not retry immediately, the outage lasts until the ASA finally re-initiates.
Troubleshooting steps:
-
Enable VPN tunnel logging on the AWS side and check whether the tunnels went down due to DPD timeout (CGW not responding) or due to the CGW sending a DELETE notification (CGW intentionally tearing down). This distinction tells you whether the CGW went unresponsive or actively disconnected.
-
Check your ASA logs during the exact outage timestamps for CPU/memory alerts, IKE errors, or interface flaps:
show cpu usage
show memory
show crypto ikev2 sa detail
show logging | include VPN|IKE|DPD
-
Run continuous pings to both AWS tunnel endpoint IPs from a host that shares the CGW's internet path but does not depend on the CGW's IPsec processing (e.g., a separate host behind the same ISP link). If these pings also fail during the outage, the issue is upstream of the CGW (ISP/firewall/NAT), not the CGW itself.
-
Check whether your firewall or NAT device has aggressive UDP session timeouts. Some devices time out idle UDP 4500 (NAT-T) sessions after as little as 30 seconds, which can make both tunnels appear unreachable from AWS's perspective.
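The ping correlation in the third step can be automated once you have timestamped results. This is a minimal sketch that assumes you log `(unix_timestamp, success)` pairs per AWS endpoint IP and that the endpoints answer ICMP from your network when healthy (worth confirming during a quiet period first). The function names, the 3-failure threshold, and the IPs (documentation ranges) are hypothetical choices.

```python
# Sketch: find ping outage windows and check whether they overlap a tunnel-down
# window. Assumes (unix_timestamp, success) samples per AWS endpoint IP; function
# names and the 3-failure threshold are hypothetical choices.
from typing import Dict, List, Optional, Tuple

Sample = Tuple[float, bool]
Window = Tuple[float, float]


def outage_windows(samples: List[Sample], min_failures: int = 3) -> List[Window]:
    """Return (first_fail_ts, last_fail_ts) for each run of >= min_failures failed pings."""
    windows: List[Window] = []
    start: Optional[float] = None
    last: float = 0.0
    count = 0
    for ts, ok in sorted(samples):
        if not ok:
            if start is None:
                start = ts
            last, count = ts, count + 1
            continue
        # A successful ping closes any open run of failures.
        if start is not None and count >= min_failures:
            windows.append((start, last))
        start, count = None, 0
    if start is not None and count >= min_failures:
        windows.append((start, last))
    return windows


def overlaps(a: Window, b: Window) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def upstream_suspected(pings: Dict[str, List[Sample]], tunnel_down: Window) -> bool:
    """True if pings to every endpoint had an outage overlapping the tunnel outage."""
    return all(
        any(overlaps(w, tunnel_down) for w in outage_windows(samples))
        for samples in pings.values()
    )


# Placeholder endpoint IPs; both lose pings between t=100 and t=140.
pings = {
    "203.0.113.10": [(t, not 100 <= t <= 140) for t in range(0, 300, 10)],
    "198.51.100.20": [(t, not 100 <= t <= 140) for t in range(0, 300, 10)],
}
print(upstream_suspected(pings, (110, 2000)))  # True
```

A `False` result while the tunnels were down points back at the CGW itself, which is the distinction the third step is after.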
Recommended changes:
-
Change the DPD timeout action from "Clear" to "Restart". With "Restart", AWS will actively attempt to re-establish the IKE session after a DPD timeout rather than waiting for the CGW to initiate. This should significantly reduce your recovery time.
-
Change the IKE startup action from "Add" (responder) to "Start" (initiator). This ensures AWS actively tries to bring the tunnel up rather than waiting for the CGW.
-
On the ASA side, configure aggressive IKE retry timers so the CGW attempts to re-establish immediately after a failure rather than waiting 25-45 minutes.
-
For long-term redundancy, consider adding a second VPN connection from a different CGW device (different public IP, ideally different ISP). This eliminates the single point of failure at the CGW level.
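If you prefer scripting the first two AWS-side changes, the EC2 API exposes them as tunnel options. This is a sketch that assumes a boto3 EC2 client (e.g. `boto3.client("ec2")`) passed in, and uses `describe_vpn_connections` and `modify_vpn_tunnel_options`; the connection ID and IPs are placeholders. Because changing tunnel options can interrupt the tunnel being modified, the sketch refuses to touch one tunnel unless the other reports UP. Check the AWS tunnel-options documentation for any constraints on "Start" before relying on it.

```python
# Sketch of applying the recommended AWS-side tunnel options via the EC2 API
# (boto3's describe_vpn_connections / modify_vpn_tunnel_options). The client is
# passed in (e.g. boto3.client("ec2")) so this file has no hard boto3 dependency;
# connection IDs and IPs are placeholders.
from typing import Any, Dict

RECOMMENDED_OPTIONS: Dict[str, str] = {
    "DPDTimeoutAction": "restart",  # AWS default is "clear"
    "StartupAction": "start",       # AWS default is "add" (CGW must initiate)
}


def tunnel_status(ec2: Any, vpn_connection_id: str) -> Dict[str, str]:
    """Map each tunnel's outside IP to its reported status ("UP" or "DOWN")."""
    resp = ec2.describe_vpn_connections(VpnConnectionIds=[vpn_connection_id])
    telemetry = resp["VpnConnections"][0]["VgwTelemetry"]
    return {t["OutsideIpAddress"]: t["Status"] for t in telemetry}


def update_one_tunnel(ec2: Any, vpn_connection_id: str, outside_ip: str) -> Dict[str, Any]:
    """Apply RECOMMENDED_OPTIONS to a single tunnel, only if the other tunnel is UP."""
    others = {
        ip: s for ip, s in tunnel_status(ec2, vpn_connection_id).items() if ip != outside_ip
    }
    if not others or any(s != "UP" for s in others.values()):
        raise RuntimeError("other tunnel is not UP; not modifying this one")
    return ec2.modify_vpn_tunnel_options(
        VpnConnectionId=vpn_connection_id,
        VpnTunnelOutsideIpAddress=outside_ip,
        TunnelOptions=dict(RECOMMENDED_OPTIONS),
    )


# Usage (placeholders):
#   import boto3
#   ec2 = boto3.client("ec2")
#   update_one_tunnel(ec2, "vpn-0123456789abcdef0", "<tunnel 1 outside IP>")
#   # wait until tunnel 1 reports UP again, then repeat for tunnel 2
```

The UP check is a conservative design choice; if both tunnels are already flapping you may decide to skip it and accept the brief interruption.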
AWS tunnels terminate in different Availability Zones, so a simultaneous drop almost always points to the single common denominator on your end: your Cisco ASA or the ISP.
Since both tunnels drop together for 25-45 minutes, check your ASA syslogs just before the outage for these three common culprits:
-
SA Rekeying Failures: An outage of this length may point to an IPsec lifetime mismatch. Make sure your ASA matches the AWS defaults: Phase 1 (8 hours / 28800s) and Phase 2 (1 hour / 3600s). Look for IKEv2 rekeying errors.
-
Resource Exhaustion: A CPU spike, memory leak, or crypto-engine hang on the ASA will kill both tunnels at once. Try running show crypto accelerator statistics during the next outage.
-
Silent ISP Packet Loss: ISPs rarely detect transient UDP (500/4500) drops or upstream BGP flaps. Set up continuous pings to the AWS tunnel public IPs to catch micro-outages that trigger DPD (Dead Peer Detection) disconnects.
The exact root cause will be in your ASA's IKE/IPsec logs in the 5 minutes leading up to the disconnect.
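For the lifetime check in the first item, a quick script can pull the configured values out of a saved running-config. This is a simplified sketch assuming standard ASA indentation, IKEv2 lifetimes under `crypto ikev2 policy` blocks, and Phase 2 lifetimes written as `... security-association lifetime seconds N` (global, crypto map, or IPsec profile form). If you changed the lifetimes in the AWS tunnel options, pass those values as `expected` instead of the defaults.

```python
# Sketch of the lifetime check: pull IKEv2 (Phase 1) and IPsec SA (Phase 2)
# lifetimes out of an ASA running-config and compare them with the AWS defaults
# quoted above. Simplified parser; assumes standard ASA indentation.
import re
from typing import Dict, List

AWS_DEFAULT_LIFETIMES = {"phase1": 28800, "phase2": 3600}
PHASE2_RE = re.compile(r"security-association lifetime seconds (\d+)")
PHASE1_RE = re.compile(r"lifetime seconds (\d+)")


def extract_lifetimes(asa_config: str) -> Dict[str, List[int]]:
    found: Dict[str, List[int]] = {"phase1": [], "phase2": []}
    in_ikev2_policy = False
    for raw in asa_config.splitlines():
        line = raw.strip()
        if raw and not raw[0].isspace():
            # A non-indented line opens a new top-level block.
            in_ikev2_policy = line.startswith("crypto ikev2 policy")
        m2 = PHASE2_RE.search(line)
        if m2:
            # Matches global, crypto map, and ipsec-profile forms.
            found["phase2"].append(int(m2.group(1)))
        elif in_ikev2_policy:
            m1 = PHASE1_RE.match(line)
            if m1:
                found["phase1"].append(int(m1.group(1)))
    return found


def lifetime_mismatches(asa_config: str,
                        expected: Dict[str, int] = AWS_DEFAULT_LIFETIMES) -> List[str]:
    problems: List[str] = []
    for phase, values in extract_lifetimes(asa_config).items():
        if not values:
            problems.append(f"{phase}: no explicit lifetime found")
        problems += [f"{phase}: {v}s configured, expected {expected[phase]}s"
                     for v in values if v != expected[phase]]
    return problems


sample = """crypto ikev2 policy 200
 encryption aes
 lifetime seconds 86400
crypto ipsec profile PROFILE1
 set security-association lifetime seconds 3600
"""
print(lifetime_mismatches(sample))  # ['phase1: 86400s configured, expected 28800s']
```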
P.S.: I normally download the official Cisco ASA configuration template directly from the AWS VPC console (selecting ASA 9.x+ and IKEv2). Cross-check your running configuration line-by-line with this fresh template - especially the crypto ikev2 policy, transform-set, and lifetimes. If there is even a minor discrepancy or an outdated proposal from an older deployment, it often triggers exactly this kind of rekeying instability.
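The line-by-line cross-check can be narrowed to just the crypto sections with Python's `difflib`. This is a sketch that assumes you saved the AWS template and your ASA's `show running-config` output as text; the file names in the usage line are placeholders. Indentation width and blank lines are normalized away, but expect some noise from values that legitimately differ (peer IPs, interface names, keys).

```python
# Sketch: diff only the crypto sections of the AWS-generated ASA template against
# your running-config, ignoring indentation width and blank lines.
import difflib
from typing import List


def crypto_section(config: str) -> List[str]:
    """Keep top-level 'crypto ...' lines plus the indented lines beneath them."""
    kept: List[str] = []
    in_crypto = False
    for raw in config.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        indented = raw[0].isspace()
        if not indented:
            in_crypto = raw.startswith("crypto ")
        if in_crypto:
            # Normalize indentation to a single space so 1- vs 2-space styles match.
            kept.append((" " if indented else "") + raw.strip())
    return kept


def diff_crypto(template: str, running: str) -> List[str]:
    return list(difflib.unified_diff(
        crypto_section(template), crypto_section(running),
        fromfile="aws-template", tofile="running-config", lineterm=""))


# Usage (placeholder file names):
#   with open("aws_template.txt") as t, open("running_config.txt") as r:
#       print("\n".join(diff_crypto(t.read(), r.read())))
```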

Ah, good call pointing out the Startup and DPD timeout actions on the AWS side. Combined with checking the ASA crypto logs, that pretty much covers it.