I’ve just learned the hard way how a single terraform apply -auto-approve can nuke a live database when the configuration is booby-trapped.
What I ran
```shell
terraform apply \
  -var-file="workspace-configs/production.tfvars.json" \
  -auto-approve
```
production.tfvars.json:

```json
{
  "ecs_desired_api_instances_count": 1,
  "ecs_api_cpu": 512,
  "ecs_api_memory": 1024,
  "ecs_api_capacity_provider": "FARGATE_SPOT",
  "ecr_backend_api_max_images_count": 10,
  "mysql_max_connections": 500,
  "mysql_instance_class": "db.t3.small",
  "sns_notification_admin_emails": [...]
}
```
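None of these keys had a matching `variable` declaration, which is why Terraform only warned instead of failing. The missing declarations would look roughly like this (the types are my guess from the values above):

```hcl
# Hypothetical variable declarations for the tfvars keys above.
# Without these, Terraform treats the extra entries in a -var-file
# as "Value for undeclared variable" warnings and carries on.
variable "ecs_desired_api_instances_count" {
  type = number
}

variable "mysql_instance_class" {
  type = string
}

# ...and so on for the remaining keys.
```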
What happened
- The plan showed “2 to destroy”, including `module.mysql.aws_db_instance.mysql`.
- Because the previous dev never declared the variables above, Terraform spat out the usual “Value for undeclared variable” warnings.
- Worse, there were no `lifecycle { prevent_destroy = true }` guards anywhere, and every workspace (dev, staging, prod) shares the same S3 backend key, so the state file treated prod as a fresh install.
- With `-auto-approve` in place the apply went straight ahead, deleted the existing db.t3.small MySQL instance, then failed trying to recreate it (control-character error in the password), effectively wiping prod.
- S3 bucket deletions also failed because they weren’t empty, but the damage to RDS was done.
What I’ve checked so far
- Automated backups: apparently retention was set to 0 days, so all automated backups vanished the moment the instance was deleted.
- Point-in-time restore: impossible without automated backups.
- Manual snapshots: the only one we have is from 6 March 2025; it restored fine, but that leaves four months of missing data.
- AWS Support: I’m talking directly with personal contacts, but no ticket yet.
The “trap” left in Terraform
- Undeclared variables in the workspace file triggered only warnings, not errors, so the apply continued.
- Single remote state key (`devops-rha-tf-state/terraform.tfstate`) means each workspace can clobber the others.
- No `prevent_destroy` on RDS, S3, etc., so nothing blocked the destroy plan.
- `-auto-approve` was baked into every deploy script.
- `skip_final_snapshot` is hard-coded to true for RDS, and deleting the instance removes all automated backups immediately.
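For anyone unfamiliar with the guards I mean, this is roughly what a protected RDS resource would look like (attribute values here are illustrative, not our actual config):

```hcl
resource "aws_db_instance" "mysql" {
  # ...existing engine/network settings...
  instance_class = "db.t3.small"

  backup_retention_period   = 7     # we effectively had 0
  skip_final_snapshot       = false # was hard-coded to true
  final_snapshot_identifier = "mysql-final-${terraform.workspace}"

  lifecycle {
    # Any plan that would destroy this resource now errors out
    prevent_destroy = true
  }
}
```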
Question to the community
Aside from the solitary March manual snapshot, is there any AWS-side magic that can resurrect data from a just-deleted RDS instance when:
- automated backups were disabled, and
- no final snapshot was taken (because `skip-final-snapshot` was true)?
I’m aware of the grim answers on older threads, but I’d be grateful for confirmation from anyone who has pulled off a successful recovery via Support in similar conditions.
What I’ll fix, whatever the outcome
- Split state per environment and enable versioning on the state bucket.
- Add `prevent_destroy` to all prod resources.
- Re-enable automated backups with ≥7-day retention, plus a daily manual snapshot.
- Remove `-auto-approve` from all CI scripts.
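Concretely, the state split and versioning might look something like this (bucket, key, and region names are placeholders, not our real setup):

```hcl
# backend.tf - one state key per environment instead of a single
# shared terraform.tfstate that every workspace can clobber
terraform {
  backend "s3" {
    bucket = "devops-rha-tf-state"
    key    = "prod/terraform.tfstate" # dev/staging get their own keys
    region = "eu-west-1"
    # dynamodb_table = "tf-locks"     # state locking, while we're at it
  }
}

# Versioning on the state bucket, so a clobbered state can be rolled back
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = "devops-rha-tf-state"
  versioning_configuration {
    status = "Enabled"
  }
}
```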
Any insight or last-ditch ideas are hugely appreciated – thanks in advance.