Accidentally destroyed prod RDS via mis-configured Terraform – any chance of recovery?

I’ve just learned the hard way how a single terraform apply -auto-approve can nuke a live database when the configuration is booby-trapped.

What I ran

terraform apply \
  -var-file="workspace-configs/production.tfvars.json" \
  -auto-approve

production.tfvars.json:

{
  "ecs_desired_api_instances_count": 1,
  "ecs_api_cpu": 512,
  "ecs_api_memory": 1024,
  "ecs_api_capacity_provider": "FARGATE_SPOT",
  "ecr_backend_api_max_images_count": 10,
  "mysql_max_connections": 500,
  "mysql_instance_class": "db.t3.small",
  "sns_notification_admin_emails": [...]
}

What happened

  • The plan showed “2 to destroy”, including module.mysql.aws_db_instance.mysql.
  • Because the previous dev never declared the variables above, Terraform spat out the usual “Value for undeclared variable” warnings.
  • Worse, there were no lifecycle { prevent_destroy = true } guards anywhere, and every workspace (dev, staging, prod) shares the same S3 backend key, so the state file treated prod as a fresh install.
  • With -auto-approve in place the apply went straight ahead, deleted the existing db.t3.small MySQL instance, then failed trying to recreate it (control-character error in the password) – effectively wiping prod.
  • The S3 bucket deletions also failed because the buckets weren’t empty, but the damage to RDS was done.
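
For reference, declaring the variables from production.tfvars.json would look roughly like the sketch below. Values for undeclared variables in a -var-file only produce a warning and are then ignored, so declaring them is what makes the file take effect. The variable names come from the tfvars file above; the file name, the description, and the validation rule are illustrative assumptions, not the original configuration.

```hcl
# variables.tf (hypothetical file name) in the root module.
# Types are inferred from the values in production.tfvars.json.

variable "ecs_desired_api_instances_count" {
  type = number
}

variable "ecs_api_cpu" {
  type = number
}

variable "ecs_api_memory" {
  type = number
}

variable "ecs_api_capacity_provider" {
  type = string
}

variable "ecr_backend_api_max_images_count" {
  type = number
}

variable "mysql_max_connections" {
  type = number

  # Illustrative guard, not part of the original setup.
  validation {
    condition     = var.mysql_max_connections > 0
    error_message = "The mysql_max_connections value must be a positive number."
  }
}

variable "mysql_instance_class" {
  description = "RDS instance class for the MySQL database." # assumed purpose
  type        = string
}

variable "sns_notification_admin_emails" {
  type = list(string)
}
```

Declared variables still have to be passed on to the modules that use them (for example module.mysql); the declarations alone only remove the warnings.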

What I’ve checked so far

  • Automated backups: retention was apparently set to 0 days, which disables automated backups entirely, so there was nothing to retain when the instance was deleted.
  • Point-in-time restore: impossible without automated backups.
  • Manual snapshots: the only one we have is from 6 March 2025 – that snapshot restored fine, but leaves four months of missing data.
  • AWS Support: I’m in touch with personal contacts, but I haven’t opened a ticket yet.

The “trap” left in Terraform

  1. Undeclared variables in the workspace file triggered only warnings, not errors, so the apply continued.
  2. Single remote state key (devops-rha-tf-state/terraform.tfstate) means each workspace can clobber the others.
  3. No prevent_destroy on RDS, S3, etc. – nothing blocked the destroy plan.
  4. -auto-approve was baked into every deploy script.
  5. skip_final_snapshot = true is hard-coded on the RDS instance, so no final snapshot is taken on destroy; deleting the instance also removes all automated backups immediately.
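
To make item 2 concrete, a per-environment state layout could look something like the sketch below. It assumes that devops-rha-tf-state is the bucket name and terraform.tfstate the object key; the per-environment key, the region, and the idea of a separate bootstrap configuration for the bucket are placeholders I made up for illustration.

```hcl
# backend.tf for the production root module only.
terraform {
  backend "s3" {
    bucket = "devops-rha-tf-state"
    key    = "production/terraform.tfstate" # hypothetical per-environment key
    region = "eu-west-1"                    # placeholder, not from the post
  }
}

# Usually kept in a separate bootstrap configuration rather than in the
# state it protects. Versioning lets an overwritten state file be rolled
# back to an earlier object version.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = "devops-rha-tf-state"

  versioning_configuration {
    status = "Enabled"
  }
}
```

If one root module is shared by all environments, the key can instead be left out of the backend block and supplied at init time, e.g. terraform init -backend-config="key=production/terraform.tfstate". Changing the key of an existing backend requires re-running terraform init with -migrate-state or -reconfigure.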

Question to the community

Aside from the solitary March manual snapshot, is there any AWS-side magic that can resurrect data from a just-deleted RDS instance when:

  • automated backups were disabled, and
  • no final snapshot was taken (because skip-final-snapshot was true)?

I’m aware of the grim answers on older threads, but I’d be grateful for confirmation from anyone who has pulled off a successful recovery via Support in similar conditions.


What I’ll fix whatever the outcome

  • Split state per environment, enable versioning on the state bucket.
  • Add prevent_destroy to all prod resources.
  • Re-enable automated backups with ≥7-day retention, and take a daily manual snapshot.
  • Remove -auto-approve from all CI scripts.
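
As a rough sketch of the RDS part of that list, the instance definition might end up looking like this. Only the resource address (aws_db_instance.mysql), the engine, and the db.t3.small class come from the post; the identifier, storage size, credential variables, and snapshot name are hypothetical placeholders.

```hcl
variable "mysql_username" {
  type = string # hypothetical variable
}

variable "mysql_password" {
  type      = string # hypothetical variable
  sensitive = true
}

resource "aws_db_instance" "mysql" {
  identifier        = "prod-mysql" # hypothetical name
  engine            = "mysql"
  instance_class    = "db.t3.small"
  allocated_storage = 20 # hypothetical size in GiB
  username          = var.mysql_username
  password          = var.mysql_password

  # Keep 7 days of automated backups; 0 would disable them entirely.
  backup_retention_period = 7

  # Keep the automated backups even if the instance is deleted.
  delete_automated_backups = false

  # Force a final snapshot on destroy instead of skipping it.
  skip_final_snapshot       = false
  final_snapshot_identifier = "prod-mysql-final" # hypothetical name

  lifecycle {
    # Any plan that would destroy this resource fails with an error.
    prevent_destroy = true
  }
}
```

One caveat: prevent_destroy only works while the resource block is still present in the configuration. If the block is removed altogether, Terraform plans the destroy without complaint, so it is a guard rail rather than a backup strategy.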

Any insight or last-ditch ideas are hugely appreciated – thanks in advance.

2 Answers

If you have backups, snapshots, or point-in-time recovery available, you can recreate the database from them. Without any of these, it is not possible to recover the database.

To avoid accidentally deleting critical resources, you can enable deletion protection for the resources in the future.
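
In Terraform, deletion protection for an RDS instance is a single argument. The fragment below is a sketch only: it reuses the aws_db_instance.mysql address from the question and leaves out all other arguments.

```hcl
resource "aws_db_instance" "mysql" {
  # ... existing arguments unchanged ...

  # While this is true, RDS itself rejects delete requests for the
  # instance, regardless of which tool sends them.
  deletion_protection = true
}
```

Unlike a Terraform-side lifecycle guard, this is enforced by the RDS API, so a destroy fails at apply time until the flag is explicitly switched back to false.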

EXPERT
answered 8 months ago
EXPERT
reviewed 8 months ago

If your Amazon RDS instance was deleted with automated backups disabled and no final snapshot taken (i.e., skip-final-snapshot was set to true), AWS does not retain any automated backups or point-in-time restore data for that instance. In this scenario, AWS Support cannot recover the data, because RDS stores no hidden or undeclared “backups”. The only viable restore is from an existing manual snapshot, such as your March 2025 snapshot. Unfortunately, if no manual snapshot covers the period you need, there is no AWS-side way to resurrect the lost data.

The best course of action going forward is to implement robust backup and snapshot policies, enable lifecycle protections such as prevent_destroy, separate Terraform state files per environment, and avoid -auto-approve in sensitive environments. For official documentation, see AWS’s guidance on RDS backups and data recovery (AWS RDS Backup and Restore).

answered 8 months ago
