DR/DC for RDS MS SQL Server

0

Hi AWS, can anyone help me with the DR/DC guide for RDS SQL Server. What should be the ideal RTO and RPO for the same? Also how to achieve zero downtime in case the primary instance goes down or the cluster itself crashes due to any unforeseen consequences. Please guide.

2 Answers
0

RTO and RPO should be driven by business requirements and not by the platform used to deliver the solution. If the chosen platform cannot deliver a solution that meets the RTO and RPO demanded by the business then the platform isn't suitable.

This blog post goes over the many of the factors to keep in mind when determining RTO and RPO https://aws.amazon.com/blogs/mt/establishing-rpo-and-rto-targets-for-cloud-applications/

In your question you ask about an ideal RTO and RPO for SQL Server - it depends how you set it up. Suppose you have single RDS for SQL Server instance that is backed up daily at midnight, potentially this means that you could have an event where the database is lost at 11:59pm the next evening, and you will have lost 24 hours of data since the last backup, i.e. the RPO is 24 hours. If it takes your support staff (DBAs etc.) until 9am the next morning to provision a replacement RDS instance and restore from backup, then your RTO is 9 hours.

If your business can tolerate that then great. If it's intolerable to the business then consider how you could shrink the RTO and RPO. Taking backups twice daily would reduce the RPO to 12 hours, or hourly backups would reduce the RPO to one hour, but if you take backups too frequently it may impact performance, so you may want to look at having a read replica.

Having a warm standby already running (so all you have to do is recover the last backup to it) would reduce your RTO, but would increase your costs as you need to have it up and running. So you may consider provisioning this through IaC (CloudFormation, Terraform, etc.) so it can be spun up quickly when it is needed.

None of these will deliver zero downtime that you mention in your question, if you use RDS for SQL Server Multi-AZ with one standby then in the event that the primary is lost this should failover to a secondary within one minute https://aws.amazon.com/rds/features/multi-az/#Amazon_RDS_Multi-AZ_with_one_standby

RDS for SQL Server cannot lessen this any further but other offerings can. Check out this comparison table for what's available on all the RDS offerings https://aws.amazon.com/rds/features/multi-az/#Comparison_Table

profile picture
EXPERT
Steve_M
answered 8 months ago
profile pictureAWS
EXPERT
reviewed 8 months ago
  • Hi, agreed: "RTO and RPO should be driven by business requirements and not by the platform used to deliver the solution" is very key. Start from business requirements and work backwards from them.

0

Hi, When you know your exact business requirements, I'd suggest to go to https://aws.amazon.com/rds/ha/ to see how RDS can help you achieve proper SLA with minimal effort.

Additionally, this blog post is full of wisdom (even if for RDS Oracle): https://aws.amazon.com/blogs/database/automate-high-availability-setup-in-amazon-rds-custom-for-oracle/

Best

Didier

profile pictureAWS
EXPERT
answered 8 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions