Enhancing your incident response readiness with SCRaM
This article explains how to use Simulated Conditions Response and Management (SCRaM) to enhance your incident response readiness. The article includes best practices and proactive activities that you can implement.
What is SCRaM?
SCRaM is an AWS Enterprise Support program that's designed to evaluate and improve the readiness of AWS workloads and teams for specific scenarios. SCRaM uses tabletop exercises that are tailored to a selected workload to test disaster recovery and security incident preparedness. These scenarios include the following:
- Availability Zone impairments
- AWS Regional service disruptions
- Networking issues
- Distributed Denial of Service (DDoS) attacks
- Ransomware situations
SCRaM's goal is to enhance your organization's overall incident response readiness through targeted assessment and simulation, and to identify opportunities for further improvement. Regular testing with SCRaM helps your team build the procedures that it needs to protect your workloads, recover from disruptions, and maintain business continuity when unexpected events occur.
System downtime can be very costly for companies. Its costs often include lost productivity, lost revenue, and potential damage to the company's reputation. Although organizations strive to minimize downtime, disruptions can occur because of a variety of factors. The most prepared companies address these factors proactively: they make sure that they have the tools and processes in place to quickly detect, respond to, and recover from incidents that affect the availability of their systems and applications.
To minimize downtime, companies must focus on measures such as fault tolerance, redundancy, automatic scaling, security, and continuous monitoring. They must also test regularly, create disaster recovery plans, and optimize performance so that their applications can withstand failures and recover quickly when issues occur.
Note: SCRaM is currently available to AWS Enterprise Support customers who are under a nondisclosure agreement (NDA).
Shared responsibility model
Security and Compliance is a shared responsibility between AWS and the customer. AWS secures the infrastructure that runs all services (security "OF" the cloud), and the customer secures their data and applications (security "IN" the cloud). Resilience is also a shared responsibility between AWS and the customer. AWS is responsible for building reliable infrastructure, and the customer architects their applications to reliably respond to and recover from failures.
For your SCRaM engagement, it's important to understand your role and how AWS can help you improve what you're responsible for. To learn more about shared responsibility, see the AWS shared responsibility model and Disaster recovery of workloads on AWS: Recovery in the cloud.
Why should I use SCRaM?
SCRaM offers workload-focused tabletop exercises that test your incident response plan without disrupting operations. Unlike audits or sales presentations, these exercises prepare teams for disaster recovery and security events through realistic scenario walkthroughs. During the sessions, teams identify specific gaps in processes, communication, and response strategies, and leaders refine procedures and strengthen cross-functional coordination. The exercises expose vulnerabilities, challenge assumptions, and clarify stakeholder roles during crises. This tests your operational readiness and builds confidence in your internal processes. After the exercise is complete, participants receive a SCRaM engagement report with an action plan that contains targeted recommendations for improvement.
When teams know exactly whom to contact during critical situations, they gain the confidence that they need to respond effectively to actual incidents, especially in unexpected circumstances. Combining tabletop exercises with AWS services can produce measurable improvements in operational resilience. With this resilience, organizations can reduce incident response times, in some cases from hours to minutes, strengthen disaster recovery and security capabilities, and improve collaboration across functions.
To complement SCRaM, AWS uses resilience and security services to programmatically run assessments against your defined critical workloads. These services include the following:
- AWS Resilience Hub
- AWS Fault Injection Service
- AWS Security Hub
- Amazon GuardDuty
This approach includes assessing your Recovery Point Objective (RPO) and Recovery Time Objective (RTO), and testing your recovery procedures before actual incidents occur. The security services also strengthen your protection: GuardDuty uses machine learning to detect threats, and Security Hub provides a central location to monitor findings during testing.
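As a rough illustration of what assessing RPO and RTO means in practice, the following Python sketch compares the measured results of a recovery drill against a workload's objectives. The `RecoveryTest` record and the `meets_objectives` helper are hypothetical names for this example, not part of any AWS API; in a real assessment, AWS Resilience Hub evaluates applications against the RTO and RPO targets that you define in a resiliency policy.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTest:
    """Measured results from one recovery drill (hypothetical record format)."""
    data_loss_window: timedelta  # time between the last usable recovery point and the failure
    time_to_restore: timedelta   # time from the failure until the workload was healthy again


def meets_objectives(test: RecoveryTest, rpo: timedelta, rto: timedelta) -> dict:
    """Report whether a drill stayed within the workload's RPO and RTO."""
    return {
        "rpo_met": test.data_loss_window <= rpo,
        "rto_met": test.time_to_restore <= rto,
    }


drill = RecoveryTest(data_loss_window=timedelta(minutes=20),
                     time_to_restore=timedelta(hours=3))
print(meets_objectives(drill, rpo=timedelta(minutes=15), rto=timedelta(hours=4)))
# {'rpo_met': False, 'rto_met': True}
```

In this example, the drill restored service within the 4-hour RTO but lost 20 minutes of data against a 15-minute RPO, which points to backup or replication frequency as the gap to address.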
How is a SCRaM engagement conducted?
SCRaM is a specialist-led engagement that consists of four in-person or virtual meetings. Your Technical Account Manager (TAM) assigns a SCRaM practitioner to work directly with your organization. During an initial call, the practitioner helps your organization complete the following actions:
- Identify the tabletop requirements, such as whether you should focus on security or resilience.
- Collect information on your selected workload.
- Identify key stakeholders for engagement.
After the initial call to gather information, the practitioner works with you to schedule two working sessions.
Session 1
Session 1 is a discovery session where you review the selected workload's architecture and operations. This includes aspects of the Reliability or Security pillar of the AWS Well-Architected Framework.
Session 2
Using the discovery data, the practitioner leads your organization through the tabletop exercises in Session 2. These exercises help you identify areas for improvement in your organization's incident response plan.
At the conclusion of the SCRaM engagement, AWS provides a report and reviews it with you. The report details areas where your organization follows industry-standard best practices, and also identifies and prioritizes areas for improvement. You can then use this information to iterate on and improve your organizational and workload incident response readiness.
What best practices can I apply today?
Best practices vary depending on whether you're focused on improving the resiliency or the security of your workload. The following are a few best practices from the AWS Well-Architected Framework and AWS Trusted Advisor.
Workload resiliency
Implement workload monitoring
Implement monitoring for your workload components to detect failures. Automated monitoring helps you track the health of your workload so that you're promptly aware of failures or degradations. Use AWS Incident Detection and Response to proactively manage your workloads and detect technical failures. You can also use AWS Incident Detection and Response to monitor Key Performance Indicators (KPIs) that align with the workload's success metrics.
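Detecting degradation automatically can be sketched as a threshold check over a KPI time series. The following Python example is a simplified, hypothetical evaluator: it reports an alarm only when several consecutive datapoints breach the threshold, which reduces noise from one-off spikes. The state names mirror Amazon CloudWatch alarm states, but `evaluate_kpi` is not an AWS API, and the error-rate values are sample data.

```python
def evaluate_kpi(datapoints: list[float], threshold: float, breaches_to_alarm: int = 3) -> str:
    """Return an alarm state for a KPI series (hypothetical helper, not an AWS API)."""
    if len(datapoints) < breaches_to_alarm:
        # Not enough data to decide yet.
        return "INSUFFICIENT_DATA"
    recent = datapoints[-breaches_to_alarm:]
    # Alarm only if every one of the most recent datapoints breaches the threshold.
    return "ALARM" if all(value > threshold for value in recent) else "OK"


# Sample data: percentage of failed requests per minute.
error_rate = [0.2, 0.4, 2.5, 3.1, 4.0]
print(evaluate_kpi(error_rate, threshold=1.0))  # ALARM
```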
Automatically back up your data
Configure backups to run automatically, either on a periodic schedule that conforms with your workload's RPO or in response to changes in the dataset. For critical datasets with low data loss requirements, back up the data automatically and frequently. Set up alert notifications for when an automated backup finishes or fails to complete.
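To make the RPO requirement concrete, the following Python sketch checks a backup history against an RPO and produces alert messages for failed jobs, for gaps between completed backups that exceed the RPO, and for a latest backup that is too old. The `(start_time, status)` job format and the `check_backups` function are assumptions for illustration; a real setup would read job results from your backup service and send the alerts through a notification channel.

```python
from datetime import datetime, timedelta


def check_backups(jobs: list[tuple[datetime, str]], rpo: timedelta, now: datetime) -> list[str]:
    """Return alert messages for a backup history (hypothetical job format)."""
    # Alert on every job that didn't complete successfully.
    alerts = [f"Backup started {start:%Y-%m-%d %H:%M} ended with status {status}"
              for start, status in jobs if status != "COMPLETED"]
    completed = sorted(start for start, status in jobs if status == "COMPLETED")
    if not completed:
        alerts.append("No completed backups found")
        return alerts
    # Consecutive completed backups must be no further apart than the RPO.
    for earlier, later in zip(completed, completed[1:]):
        if later - earlier > rpo:
            alerts.append(f"Gap of {later - earlier} between backups exceeds RPO of {rpo}")
    # The newest completed backup must also fall within the RPO window.
    if now - completed[-1] > rpo:
        alerts.append(f"Latest backup ({completed[-1]:%Y-%m-%d %H:%M}) is older than RPO of {rpo}")
    return alerts


history = [
    (datetime(2024, 1, 1, 8, 0), "COMPLETED"),
    (datetime(2024, 1, 1, 9, 0), "COMPLETED"),
    (datetime(2024, 1, 1, 10, 0), "FAILED"),
    (datetime(2024, 1, 1, 11, 30), "COMPLETED"),
]
for alert in check_backups(history, rpo=timedelta(hours=1), now=datetime(2024, 1, 1, 12, 0)):
    print(alert)
```

Here the failed 10:00 job leaves a 2.5-hour gap between completed backups, so the sketch reports both the failure and the RPO violation.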
Implement a multi-Availability Zone architecture
To achieve a highly available workload, implement a multi-Availability Zone architecture. This architecture provides resilience against infrastructure impairments within a single AWS Region, and keeps your workloads operational during Availability Zone-specific impairments. If an impairment affects a single Availability Zone, then a well-architected application automatically redistributes traffic to healthy resources in other Availability Zones. That way, your application maintains business operations without manual intervention.
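The following Python sketch illustrates the traffic redistribution idea in a deliberately simplified form: it splits traffic evenly across the Availability Zones that still have healthy resources. In practice, services such as Elastic Load Balancing route requests to healthy targets based on health checks. The `distribute_traffic` function and the Availability Zone names here are hypothetical.

```python
def distribute_traffic(az_health: dict[str, bool]) -> dict[str, float]:
    """Give each healthy Availability Zone an equal share of traffic (simplified model)."""
    healthy = [az for az, is_healthy in az_health.items() if is_healthy]
    if not healthy:
        raise RuntimeError("No healthy Availability Zone is available")
    share = 1 / len(healthy)
    # Impaired zones receive no traffic; healthy zones split it evenly.
    return {az: (share if az in healthy else 0.0) for az in az_health}


weights = distribute_traffic({"us-east-1a": True, "us-east-1b": False, "us-east-1c": True})
print(weights)  # {'us-east-1a': 0.5, 'us-east-1b': 0.0, 'us-east-1c': 0.5}
```

One design consequence to note is capacity: if one of three zones fails, the remaining two each absorb half of the traffic instead of a third, so each zone needs enough headroom to take on the extra load.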
Track your application resiliency
Use AWS Resilience Hub as a central place to define, validate, and track the resiliency of your AWS applications. Resilience Hub helps you protect your applications from disruptions, reduce recovery costs, and optimize business continuity while meeting compliance and regulatory requirements. It also helps you prioritize resilience improvements based on business impact, so that your critical applications maintain availability during disruptions.
To improve application resilience through controlled chaos engineering experiments, use AWS Fault Injection Service. This fully managed service lets you intentionally inject failures into your AWS workloads to test their resilience and identify weaknesses before those weaknesses affect your application.
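The core loop of a chaos engineering experiment is to define a steady state, inject a fault, check whether the steady state still holds, and then roll the fault back. The following self-contained Python sketch shows that loop against an in-memory set of replicas. It doesn't use AWS Fault Injection Service; the `Replica` class, `call_with_failover`, and `run_experiment` are hypothetical stand-ins that make the technique visible without touching real infrastructure.

```python
import random


class Replica:
    """An in-memory stand-in for one copy of a service (hypothetical)."""

    def __init__(self, name: str):
        self.name = name
        self.healthy = True

    def handle(self, request: int) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        return f"{self.name} handled request {request}"


def call_with_failover(replicas: list[Replica], request: int) -> str:
    """Try each replica in turn and return the first successful response."""
    for replica in replicas:
        try:
            return replica.handle(request)
        except ConnectionError:
            continue
    raise RuntimeError("All replicas failed")


def run_experiment(replicas: list[Replica], requests: int = 100, seed: int = 0) -> float:
    """Inject a fault into one replica and measure the request success rate."""
    victim = random.Random(seed).choice(replicas)
    victim.healthy = False  # inject the fault
    successes = 0
    try:
        for request in range(requests):
            try:
                call_with_failover(replicas, request)
                successes += 1
            except RuntimeError:
                pass  # this request failed on every replica
    finally:
        victim.healthy = True  # always roll the fault back
    return successes / requests


fleet = [Replica("replica-a"), Replica("replica-b"), Replica("replica-c")]
print(run_experiment(fleet))  # 1.0
```

A success rate of 1.0 supports the hypothesis that the service tolerates the loss of one replica; running the same experiment against a single replica returns 0.0 and exposes the missing redundancy.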
Security enhancement practices
Encrypt your data
Enforcing encryption of data at rest and in transit is a critical security measure. When you encrypt sensitive resources, such as Amazon Relational Database Service (Amazon RDS) storage at rest, you help maintain confidentiality. You also establish an additional defensive layer against unauthorized data disclosure and exfiltration threats, such as ransomware.
Additionally, as a vital part of your risk management strategy, conduct comprehensive inventories of unencrypted data and maintain strict control over it. This practice helps you identify vulnerable information assets and apply appropriate protection measures, which significantly reduces the potential effects of data exposure incidents.
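An inventory of unencrypted data can start as a simple filter over resource metadata. The following Python sketch flags resources that lack encryption at rest or don't enforce encryption in transit, and lists sensitive resources first so that they're remediated first. The `Resource` fields and the example ARNs (which use the placeholder account ID 111122223333) are assumptions; a real inventory would come from a source such as AWS Config.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Minimal, hypothetical inventory record for a data store."""
    arn: str
    encrypted_at_rest: bool
    tls_enforced: bool
    sensitive: bool


def encryption_gaps(inventory: list[Resource]) -> list[Resource]:
    """Return resources missing encryption at rest or in transit, sensitive ones first."""
    gaps = [r for r in inventory if not (r.encrypted_at_rest and r.tls_enforced)]
    # False sorts before True, so sensitive resources (key False) come first.
    return sorted(gaps, key=lambda r: not r.sensitive)


inventory = [
    Resource("arn:aws:rds:us-east-1:111122223333:db:reporting", False, True, False),
    Resource("arn:aws:rds:us-east-1:111122223333:db:orders", True, True, True),
    Resource("arn:aws:rds:us-east-1:111122223333:db:customers", True, False, True),
]
for resource in encryption_gaps(inventory):
    print(resource.arn)
```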
Implement automatic compliance checking
Use AWS services to implement automated compliance checking, security scanning, and incident response. These services can include the following:
- AWS Config to continuously assess, audit, and evaluate the configurations of your AWS resources.
- AWS Security Hub for a comprehensive view of your security state across AWS accounts and resources.
- Amazon GuardDuty to continuously monitor for malicious activity and unauthorized behavior across your AWS environment.
- Amazon Detective to automatically collect log data from AWS resources and help you investigate and identify the root cause of potential security issues or suspicious activities.
- Amazon Inspector to continuously scan your workloads for software vulnerabilities and unintended network exposure.
- AWS Security Incident Response to combine automated detection and investigation capabilities with streamlined communication channels, and to provide access to the AWS Customer Incident Response Team (CIRT).
Deploying these services can provide real-time visibility into your security posture, automate compliance validation, and enable rapid response to potential security issues.
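Automated compliance checking generally works by evaluating each resource's configuration against a set of rules and recording a compliant or noncompliant result, similar in spirit to how AWS Config rules report `COMPLIANT` and `NON_COMPLIANT` evaluations. The following Python sketch models that pattern. The rule names, the configuration keys, and the `evaluate` function are hypothetical and don't correspond to AWS managed rules.

```python
from typing import Callable

# Each hypothetical rule takes a resource configuration and returns True if it complies.
RULES: dict[str, Callable[[dict], bool]] = {
    "encryption-enabled": lambda config: config.get("encrypted") is True,
    "no-public-access": lambda config: not config.get("public", False),
}


def evaluate(resources: dict[str, dict], rules=RULES) -> list[tuple[str, str, str]]:
    """Evaluate every resource against every rule and return (resource, rule, result)."""
    results = []
    for resource_id, config in resources.items():
        for rule_name, check in rules.items():
            result = "COMPLIANT" if check(config) else "NON_COMPLIANT"
            results.append((resource_id, rule_name, result))
    return results


resources = {
    "bucket-logs": {"encrypted": True, "public": False},
    "bucket-assets": {"encrypted": False, "public": True},
}
for resource_id, rule_name, result in evaluate(resources):
    if result == "NON_COMPLIANT":
        print(f"{resource_id} violates {rule_name}")
```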
Implement least privilege access
Implement a comprehensive least privilege access control framework across your AWS environment. To implement this framework, create finely tuned AWS Identity and Access Management (IAM) roles and policies that precisely match specific job functions and service requirements. Enforce security guardrails through a defense-in-depth strategy that minimizes potential attack surfaces while maintaining operational efficiency. Also, enforce basic practices, such as IAM access key rotation, a strong IAM password policy, and multi-factor authentication (MFA) for your root user.
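Access key rotation is one of the basic practices that lends itself to an automated check. The following Python sketch lists active access keys that are older than a maximum age. The dictionaries mimic the `AccessKeyId`, `Status`, and `CreateDate` fields that IAM returns for access key metadata, but the 90-day limit is an assumed policy, the key IDs are AWS documentation placeholders, and `stale_access_keys` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

MAX_KEY_AGE = timedelta(days=90)  # assumed rotation policy; set this to your own standard


def stale_access_keys(keys: list[dict], now: datetime, max_age: timedelta = MAX_KEY_AGE) -> list[str]:
    """Return the IDs of active access keys that are older than max_age."""
    return [
        key["AccessKeyId"]
        for key in keys
        if key["Status"] == "Active" and now - key["CreateDate"] > max_age
    ]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
keys = [
    {"AccessKeyId": "AKIAIOSFODNN7EXAMPLE", "Status": "Active",
     "CreateDate": datetime(2024, 1, 15, tzinfo=timezone.utc)},
    {"AccessKeyId": "AKIAI44QH8DHBEXAMPLE", "Status": "Active",
     "CreateDate": datetime(2024, 5, 1, tzinfo=timezone.utc)},
]
print(stale_access_keys(keys, now))  # ['AKIAIOSFODNN7EXAMPLE']
```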
Systematically analyze required access patterns for each role and service to create narrowly scoped policies that grant only the minimum permissions necessary to perform designated tasks. Use AWS IAM Access Analyzer to generate and refine least privilege policies based on actual access patterns, and regularly review permissions to identify and remove unused access rights.
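Generating a least privilege policy from actual access patterns can be sketched as follows: collect the (action, resource) pairs that a role actually used, and allow only those. This Python example is a hypothetical simplification of what IAM Access Analyzer policy generation does with AWS CloudTrail activity; the event list, the bucket and table ARNs, and `policy_from_access_log` are assumptions for illustration.

```python
import json
from collections import defaultdict


def policy_from_access_log(events: list[tuple[str, str]]) -> dict:
    """Build an IAM-style policy that allows only the observed (action, resource) pairs."""
    actions_by_resource: dict[str, set[str]] = defaultdict(set)
    for action, resource in events:
        actions_by_resource[resource].add(action)
    statements = [
        {"Effect": "Allow", "Action": sorted(actions), "Resource": resource}
        for resource, actions in sorted(actions_by_resource.items())
    ]
    return {"Version": "2012-10-17", "Statement": statements}


observed = [
    ("s3:GetObject", "arn:aws:s3:::example-reports-bucket/*"),
    ("dynamodb:Query", "arn:aws:dynamodb:us-east-1:111122223333:table/Orders"),
    ("s3:GetObject", "arn:aws:s3:::example-reports-bucket/*"),
    ("dynamodb:GetItem", "arn:aws:dynamodb:us-east-1:111122223333:table/Orders"),
]
print(json.dumps(policy_from_access_log(observed), indent=2))
```

Treat a generated policy as a starting point and review it before you attach it: observed activity might miss legitimate but infrequent tasks, and some actions need bucket-level rather than object-level resource ARNs.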
Am I charged for SCRaM?
SCRaM is included as an AWS Enterprise Support proactive engagement. It comprises a workload and process analysis, and continued engagement with your assigned TAM and AWS account team. There's no limit to the number of SCRaM engagements that customers can run, and customers are encouraged to schedule recurring engagements for critical workloads. Recurring engagements help identify changes that you might need, and make sure that your workload's incident response remains effective.
Conclusion
SCRaM is a powerful tool to help improve your organizational and workload incident response readiness. If you're an Enterprise Support customer, then contact your account team to schedule a SCRaM engagement.
If you aren't an existing Enterprise Support customer, then contact your account manager to subscribe to Enterprise Support and learn more about proactive engagements, including SCRaM. To learn more, see AWS Enterprise Support.
About the authors
Rodney Underkoffler
Rodney Underkoffler is a Senior Solutions Architect at AWS who focuses on guiding Enterprise customers in their cloud journey. He has a background in infrastructure, security, and IT business practices. He's passionate about technology, and enjoys building and exploring new solutions and methodologies.
Renato Gentil
Renato Gentil is a Senior TAM based in Ireland with over 7 years of experience with AWS. Renato has four AWS certifications, and has been working on large-scale resilience projects with different customers around the globe. Through this work, he helps them improve their resilience posture with tabletop exercises and incident management processes.
Russell Sprague
Russell Sprague is a Principal TAM at AWS and is passionate about helping customers with their operational excellence and resiliency needs. Outside of work, he loves hanging out on the porch with his wife, playing games with their kids, hiking with their dogs, and dressing up like a pirate.
Vijay Sitaram
Vijay Sitaram is a Principal TAM at AWS who's customer obsessed and specializes in resiliency and operational excellence. He's also an expert in SAP on AWS workloads. Outside of work, he's active in the world of fitness through CrossFit and weightlifting.
Abdul Khan
Abdul Khan is a TAM at AWS. He has a cyber security background and has been helping customers improve their security and resiliency posture through SCRaM engagements for the past several years.