Validation timeout when using Restore Testing

0

I've configured my vault with a Restore Testing plan. I can see that it is restoring the resources, but when I check the validation status of the jobs it says "Timed out". The docs do not discuss what this status means, and I don't see any other messages that might indicate what happened. I don't know where the log files live either. There doesn't seem to be anything in CloudWatch, but I see no configuration in AWS Backup for controlling where it might go either.

I've also set up a report and a framework under the Backup Audit Manager group. The report runs, but when I look at the report it's empty. Here is the content:

```json
{
  "reportItems": []
}
```

In the framework table I've added my own framework config. The framework results are Framework Status = inactive, Deployment Status = Failed. The error message says "Failed to load compliance details. Refresh the page or try again later."

I think because I have an Aurora Cluster the restore isn't fully deploying because you have to manually add back the reader and writer endpoints after the cluster is restored. However, AWS Backup doesn't reflect that this is the exact problem or how to fix it if that is the case. The docs DO explain you must do that after restore. But with manual steps like this I don't understand how Restore Testing could possibly work since it requires 100% automation for deployment.

So why isn't this working? And if I'm right, how do AWS Backup engineers imagine this scenario working?

asked 3 months ago, 337 views
1 Answer
1
Accepted Answer

Restore testing job "Status": This status is automatically updated and tells whether the restore job was successful or failed.

Restore testing job "Validation status": The validation status is used by the customer to validate whether the restore is working. For example, if the restore testing job that you have run is corrupted and you can't use that recovery point for restore purposes, you can update the validation status of that restore testing job to "FAILED" with an additional "--validation-status-message".

The timeout happens if no **PutRestoreValidationResult** API call was made to update the validation status from the customer end in the period that you have set to keep the restored recovery point for. The validation status is also marked as Timeout when the "Retention period before cleanup" is set to "Delete immediately", because the restored recovery point would be deleted immediately before we can perform the PutRestoreValidationResult API call.

Note that validation can be run programmatically but not from the AWS Backup console. [+] https://docs.aws.amazon.com/aws-backup/latest/devguide/restore-testing.html

Irrespective of the resource type (Aurora cluster, EC2, etc.), the validation status needs to be set manually after validating whether the restore worked or not.

For example, in my lab environment, I configured 'Retention period before cleanup' (under the Assign resources section) as 'Retain for a specific number of hours' with a value of 24 hours. Further, when I ran the restore plan test, I noticed the validation status was 'VALIDATING' even after the restore had completed on the EC2 recovery point. However, when I ran the PutRestoreValidationResult API call through the CLI, passing 'SUCCESSFUL' as the validation status, the validation status on the job changed to 'SUCCESSFUL'. This was the same scenario even when I configured the Aurora recovery point for the restore plan testing.

You can resolve these "**Timed out**" status messages by performing the PutRestoreValidationResult API call via the AWS CLI or SDK. This API is not supported through the AWS Console. Once this API is run, the deletion of the restored data will begin (to save on costs). To automate this, you can create a Lambda function that calls the PutRestoreValidationResult API and then integrate this with Amazon EventBridge (CloudWatch Events) and send back the validation status so that it is reported in the restore job.

[+] https://aws.amazon.com/blogs/aws/automatic-restore-testing-and-validation-is-now-available-in-aws-backup/

[+] https://docs.aws.amazon.com/aws-backup/latest/devguide/eventbridge.html

AWS
Sahaj_B
answered 3 months ago
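
The automation the accepted answer describes (an EventBridge-triggered Lambda that calls PutRestoreValidationResult), sketched here as an added example; it is not part of the original answer and is not AWS's own code. The event field names (`status`, `restoreJobId`, `resourceType`, `createdResourceArn`), the Aurora availability check, and the sample event are assumptions or constructed values, and the rule is assumed to match only restore testing jobs.

```python
# Added example: Lambda handler for an EventBridge rule on AWS Backup restore jobs.
# Assumed event shape: source "aws.backup", detail.status / restoreJobId /
# resourceType / createdResourceArn. SAMPLE_EVENT is constructed, not real output.
SAMPLE_EVENT = {
    "source": "aws.backup",
    "detail-type": "Restore Job State Change",
    "detail": {
        "restoreJobId": "00000000-0000-0000-0000-000000000000",
        "status": "COMPLETED",
        "resourceType": "Aurora",
        "createdResourceArn": "arn:aws:rds:us-east-1:111122223333:cluster:restore-test-example",
    },
}


def check_aurora_cluster(rds, cluster_arn):
    """Hypothetical check: the restored cluster exists and reports 'available'."""
    cluster_id = cluster_arn.split(":")[-1]
    clusters = rds.describe_db_clusters(DBClusterIdentifier=cluster_id)["DBClusters"]
    status = clusters[0]["Status"] if clusters else "missing"
    return status == "available", f"cluster {cluster_id} status: {status}"


def handler(event, context=None, backup=None, rds=None):
    detail = event.get("detail", {})
    # Only completed restore jobs can be validated; ignore everything else.
    if event.get("source") != "aws.backup" or detail.get("status") != "COMPLETED":
        return None
    if backup is None or rds is None:
        import boto3  # imported lazily so the logic can be tested with stub clients
        backup = backup or boto3.client("backup")
        rds = rds or boto3.client("rds")
    try:
        if detail.get("resourceType") == "Aurora":
            ok, message = check_aurora_cluster(rds, detail["createdResourceArn"])
        else:
            # Fail closed: a restore nobody checked is not reported as good.
            ok, message = False, f"no check defined for {detail.get('resourceType')}"
    except Exception as exc:
        # Report the failure on the job instead of letting it time out.
        ok, message = False, f"validation error: {type(exc).__name__}"
    result = {
        "RestoreJobId": detail["restoreJobId"],
        "ValidationStatus": "SUCCESSFUL" if ok else "FAILED",
        "ValidationStatusMessage": message,
    }
    backup.put_restore_validation_result(**result)
    return result
```

The function's execution role only needs `backup:PutRestoreValidationResult` and `rds:DescribeDBClusters` for this check; keep it that narrow.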
