Gaining insight from support cases using generative AI
This article shows how organizations can use generative AI to transform their distributed AWS Support cases into a source of strategic insight. By uncovering patterns and root causes across accounts, you can proactively strengthen operations, reduce recurring issues, and improve overall system resilience.
Introduction
In the complex landscape of multi-account AWS environments, support cases reveal underlying patterns and insights that extend beyond the immediate technical concerns. Organizations typically find it difficult to get holistic insights from these cases because of their distribution across accounts. Organizations might tend to focus on symptoms rather than root causes. This reactive approach leads to recurring issues and missed opportunities for systemic improvement.
Challenges
When a production incident occurs, the organization immediately focuses on restoring the affected services. However, this urgency often means that we address the symptoms but don’t treat the underlying cause. We can consider the following common scenario:
Your critical application suddenly becomes unavailable because an Amazon Elastic Compute Cloud (Amazon EC2) instance can't communicate with an Amazon Relational Database Service (Amazon RDS) database. An investigation by AWS Support reveals a security group misconfiguration that blocked the necessary traffic. The immediate fix is simple. The fix requires updating the security group rules to restore the service.
However, this resolution misses the following crucial question: How did an improperly configured security group make it to production in the first place?
The real issue might have occurred upstream in your delivery pipeline. For example, your security group configurations might have been missing a validation step, there was no automated testing of connectivity, or you didn’t have a peer review to validate infrastructure changes before deployment. If the issue was caused by these systemic gaps, then you must address them. If you don't address these gaps, then similar incidents can recur with different manifestations. The challenge is to identify all possible issues over a distributed infrastructure.
Solution overview
You can use Case Insights, a serverless solution that automatically collects, processes, and analyzes your AWS support cases across multiple accounts within your organization. Through AWS services and generative AI, the system provides valuable insights into root causes and lifecycle improvement opportunities, helping your organization to shift from reactive firefighting to proactive governance. See the Case Insights solution on the GitHub website. With this custom solution, you can transform case data into actionable insights. Through the power of generative AI, specifically Amazon Bedrock with Claude 3.5 Haiku, you can automatically analyze support cases across all of your AWS accounts to identify patterns and derive meaningful insights.
The Case Insights solution works through three key steps:
- Case summarization: This step distills lengthy case communications into concise and structured summaries that capture essential details.
- Root cause analysis: This step categorizes each case into one of the root cause categories. The following table lists all the root cause categories:
- Lifecycle improvement: This step maps each case to one of the resilience lifecycle categories. This mapping helps identify specific areas of your software delivery lifecycle that can benefit from improvement. The following table shows the resilience lifecycle categories:
By aggregating these insights across your accounts, you can visualize trends over time and identify recurring patterns. For example, you might discover that 30% of your critical cases fall into the "Customer Release" root cause. This finding suggests that you must strengthen your release validation process. Or, you might find that "Load Testing" is your top lifecycle improvement opportunity, indicating that your performance testing practices are insufficient. By systematically categorizing cases through these frameworks, you can identify areas that need the most attention. This helps you build targeted improvements that prevent recurring issues.
Solution architecture
The Case Insights solution is available for you as an open-source project on the GitHub website. You can use this solution as a sample foundation to build upon. Be sure to test the solution and review the results before you use the solution for your production workloads. The solution uses a serverless architecture that's built on AWS Lambda, AWS Step Functions, Amazon Simple Query Service (Amazon SQS), Amazon Athena, and Amazon Simple Storage Service (Amazon S3). It also uses Amazon Bedrock for AI capabilities.
The solution can automatically do the following:
-
Retrieve active accounts from your organization in AWS Organizations
-
Collect support cases from each account
-
Process case communications
-
Analyze cases using generative AI
-
Store the results in a queryable format
The following image shows the architecture of this solution. The architecture is designed to be scalable and cost-effective with enhanced security features by default. The solution has appropriate back-off and retry logic to make sure that it handles failure. It uses AWS native security layers. To implement this solution for production workloads, check whether you need AWS Key Management Service (AWS KMS) for encryption.
The solution stores the analyzed data in Amazon S3 through a structure that's optimized for querying with Amazon Athena. This allows you to run complex analyses and generate reports without additional processing. You can manage the security of the accounts with AWS Identify Access and Management (IAM) roles that are scoped to least privilege. You can use Amazon API Gateway for IAM integration with existing applications.
Integrating Case Insights with your development workflow
The Case Insights solution includes a Model Context Protocol (MCP) server that integrates with AI-powered tools and IDEs. This server brings case analysis capabilities into your existing workflow through conversational interfaces.
The MCP server provides the following key capabilities:
-
Natural language queries: You can ask questions, such as "What are the top issues that affect our Amazon EC2 instances this month?" or "Show me the cases that are related to database connectivity problems.", directly within your IDE or AI assistant.
-
Contextual analysis: The server can analyze case patterns in the context of specific services, time periods, or root cause categories. It can then provide targeted insights for your current work.
-
Proactive recommendations: Based on historical case data, the system can suggest preventive measures and best practices that are relevant to the code or infrastructure that you're currently developing.
-
Integration with existing tools: The MCP server works with most AI development assistants and can be integrated into your existing toolchain, making case insights accessible where your team already works.
With tools such as Kiro, you can integrate your MCP client with the MCP server to explore your cases in plaintext, as shown in the following image:
In this example, we use a plaintext prompt on the Kiro UI "Could you give me a summary of my Development Issues for the last 3 months". Kiro uses the get_case_summary and analyze_case_summaries MCP applications to retrieve a summary from Athena. Then, Kiro sends this summary to Amazon Bedrock to receive a detailed summary. The models in Kiro understand this detailed summary and provide feedback to the user.
As shown in the following image, you can also ask Kiro for suggestions on how to screen the IAM policy in the AWS CloudFormation template. That way, you can implement a proactive check in the pipeline:
This integration transforms case insights into an active part of your development process, and helps developers access relevant historical context and avoid repeating past mistakes. It can also help operations or site reliability engineering (SRE) leaders to explore and understand past issues and improvement areas.
Case summaries can drive actionable insights. When data shows that you need better pipeline validation or comprehensive load testing, you can prioritize and implement these concrete improvement initiatives. You can deep-dive on the data at account level to understand the types of issues that your organization experiences, and then convert these issues into actionable steps for your application teams. By building a summary across the accounts in your organization, you can identify systemic issues that affect your organization’s resilience.
Solution implementation
To implement the Case Insights solution in your environment, see the detailed installation instructions and documentation on the GitHub website. The solution deploys with minimal setup through CloudFormation templates, and works with your existing AWS Organizations structure.
The deployment process includes two main components:
-
Core infrastructure: To start collecting and analyzing your support case data, deploy the serverless backend with the CloudFormation templates that the solution provides.
-
MCP server integration: Configure the included MCP server to connect your case insights with AI development tools and IDEs. This integration supports natural language queries and contextual analysis within your existing workflow.
Be sure to review the prerequisites and deployment guide in the repository. The solution includes sample queries, dashboards, and MCP configuration examples to help you extract insights from your support case data.
This solution serves as a foundational building block that you can extend and customize for your organization's needs. You can choose to do the following to adapt the framework to create a comprehensive problem analysis system:
-
Multi-cloud integration: Extend the data collection to include support cases from other cloud providers.
-
On-premises systems: Incorporate incident data from on-premises infrastructure management tools, IT Service Management (ITSM) services, or monitoring systems.
-
Third-party services: Include support cases from Software as a Service (SaaS) providers and external dependencies that impact your operations.
-
Internal ticketing: Integrate with internal help desk or incident management systems to create a unified view of operational issues.
-
Integration into development tooling: Use agents within the framework to analyze your cases and add insights and improvements into the backlog of development teams.
By centralizing problem analysis across your technology stack, you can identify cross-system patterns and systemic issues that might not be visible in isolation. The generative AI approach adapts to different data sources and creates a foundation for organization-wide operational intelligence.
For more information, see Amazon Bedrock documentation and AWS Well-Architected.
Conclusion
You can leverage generative AI to categorize and analyze cases across your AWS organization. This helps you gain visibility into patterns that might otherwise remain hidden.
The Case Insights solution that’s presented in this article delivers benefits such as identifying systemic issues before they cause widespread problems. You can also use this solution to provide data-driven insights to prioritize improvements, and transform support cases into a strategic asset for operational excellence. Instead of repeatedly addressing symptoms, you can focus on the root cause and build more resilient systems.
About the author
Brian Simpson-Adkins
Brian Simpson-Adkins is a Principal Technical Account Manager (TAM) in AWS Global Financial Services. He has over 20 years of experience in architecture, operations, and automation. Based in Australia, he partners with customers worldwide, and helps them resolve operational challenges through data-driven insights.
- Topics
- Generative AI on AWS
- Language
- English

Relevant content
- asked 7 months ago
- Accepted Answerasked a year ago