
Developing a high-specificity AWS Health event strategy with AWS User Notifications

10 minute read
Content level: Advanced

This article takes a deep dive into the properties of AWS Health events, advanced filtering with event patterns, and real-world filter examples you can use with AWS User Notifications.

AWS Health provides insight into the performance and availability of the AWS services you use. You want to use the information contained in Health events to keep your teams informed of changes affecting their resources, but you need to ensure you’re providing only the most relevant information so they can focus on what’s important. To deliver that targeted information with AWS User Notifications, we’re going to explore the powerful filtering options available for matching Health events, along with real-world use cases for highly specific filters, so you can start creating these for your teams.

Getting started

AWS User Notifications allows you to create quick and repeatable configurations to manage notifications for your teams. By adding configurations for different recipients with customized filters and choosing different delivery options (email, console/app, chat channels) and the appropriate aggregation settings, you can highlight specific notifications while reducing the emphasis on others.

If you haven’t created notifications before, refer to Creating your first notification configuration in AWS User Notifications.

Overview of the AWS Health event schema

To filter your events, you use an event pattern. An event pattern lists the fields and values you want to match against in the source event. Using the snippet of an AWS Health event below, let’s observe the fields available for filtering events.

The following example Health event in JSON format shows top-level fields indicating the source, detail-type, and resources of the event. The detail object contains information about the event: which service it relates to (service), its category (eventTypeCategory: issue, scheduledChange, or accountNotification), and its specific type (eventTypeCode). Also included are the account (account at the top level, affectedAccount inside detail), eventRegion, and statusCode fields. Each of these is available when deciding whether a filter should match the event and send a notification.

To understand more about the purpose of each field, refer to the AWS Health Events Amazon EventBridge Schema.

{
    "version": "0",
    "id": "6b4676f6-a16a-5532-3178-fc7bbda2a6f9",
    "detail-type": "AWS Health Event",
    "source": "aws.health",
    "account": "685569933008",
    "time": "2024-09-04T19:11:59Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "eventArn": "arn:aws:health:us-east-1::event/LAMBDA/AWS_LAMBDA_INVOKE_ISSUE/AWS_LAMBDA_INVOKE_ISSUE_844eec7a-8ccc-5cc1-83c6-bffbf55e3fdf",
        "service": "LAMBDA",
        "eventTypeCode": "AWS_LAMBDA_INVOKE_ISSUE",
        "eventTypeCategory": "issue",
        "eventScopeCode": "ACCOUNT_SPECIFIC",
        "communicationId": "9d958ea13ecd72fddea5d57b5367b968bb798f45-1",
        "startTime": "Wed, 4 Sep 2024 16:46:00 GMT",
        "endTime": "Wed, 4 Sep 2024 17:53:00 GMT",
        "lastUpdatedTime": "Wed, 4 Sep 2024 18:31:58 GMT",
        "statusCode": "closed",
        "eventRegion": "us-east-1",
        "eventDescription": [{
            "language": "en_US",
            "latestDescription": "[11:13 AM PDT] We are investigating increased invoke error rates in the US-EAST-1 Region.\n\n[11:31 AM PDT] Between 9:46 AM and 10:53 AM PDT, we experienced increased invoke error rates in the US-EAST-1 Region.  Our engineering teams were automatically engaged to investigate this issue at 9:58 AM. At 10:14 AM we had identified the root cause of the issue and completed the deployment of mitigations at 10:28 AM, with full recovery observed at 10:53 AM. We can confirm that during this issue, retries would have succeeded. The issue has been resolved and the service is operating normally."
        }],
        "affectedEntities": [],
        "affectedAccount": "048122297024",
        "page": "1",
        "totalPages": "1"
    }
}
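The filters later in this article name nested fields with dotted paths such as detail.service. As a rough illustration of that naming, the Python sketch below flattens an abbreviated copy of the event above into dotted paths. The flatten helper is hypothetical and written only for this example; it is not part of any AWS SDK.

```python
def flatten(node, prefix="", out=None):
    """Collect every leaf value of a nested event under its dotted path."""
    out = {} if out is None else out
    if isinstance(node, dict):
        for key, value in node.items():
            flatten(value, f"{prefix}.{key}" if prefix else key, out)
    elif isinstance(node, list):
        # List items share the path of the list itself.
        for item in node:
            flatten(item, prefix, out)
    else:
        out.setdefault(prefix, []).append(node)
    return out


# Abbreviated copy of the example Health event (description text shortened).
event = {
    "detail-type": "AWS Health Event",
    "source": "aws.health",
    "resources": [],
    "detail": {
        "service": "LAMBDA",
        "eventTypeCode": "AWS_LAMBDA_INVOKE_ISSUE",
        "eventTypeCategory": "issue",
        "eventScopeCode": "ACCOUNT_SPECIFIC",
        "statusCode": "closed",
        "eventRegion": "us-east-1",
        "eventDescription": [{
            "language": "en_US",
            "latestDescription": "We are investigating increased invoke error rates.",
        }],
        "affectedAccount": "048122297024",
        "page": "1",
    },
}

fields = flatten(event)
for path in sorted(fields):
    print(path, "=", fields[path])
```

Note that the empty resources list produces no entry at all, so a filter on resources has nothing to compare against for this particular event.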

Filter (event pattern) syntax and evaluation

Now, let’s walk through rule evaluation. An event pattern matches when each field meets the criteria you specify.

For example, the following event pattern example only matches events where the source EQUALS aws.health.

{
    "source": ["aws.health"]
}

When you include additional fields, the pattern only matches when all fields match.

This pattern matches events where source EQUALS aws.health AND service EQUALS EC2.

{
    "source": ["aws.health"],
    "detail.service": ["EC2"]
}

A Health event for the S3 service does not match this filter. Within a single field, adding values results in a match for any listed value.

This pattern matches events where source EQUALS aws.health AND (service EQUALS EC2 OR service EQUALS S3).

{
    "source": ["aws.health"],
    "detail.service": ["EC2", "S3"]
}
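To make the AND/OR behavior concrete, here is a minimal Python sketch of the evaluation rules described above. It assumes each pattern key is a dotted path into the event and handles exact values only. The lookup and matches helpers are hypothetical names for this illustration, and the sketch is not how AWS evaluates patterns internally.

```python
def lookup(event, dotted_path):
    """Follow a dotted path such as 'detail.service' through nested dicts."""
    node = event
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def matches(pattern, event):
    """Every field must match (AND); any listed value may match (OR)."""
    return all(lookup(event, path) in allowed for path, allowed in pattern.items())


ec2_only = {"source": ["aws.health"], "detail.service": ["EC2"]}
ec2_or_s3 = {"source": ["aws.health"], "detail.service": ["EC2", "S3"]}

ec2_event = {"source": "aws.health", "detail": {"service": "EC2"}}
s3_event = {"source": "aws.health", "detail": {"service": "S3"}}
rds_event = {"source": "aws.health", "detail": {"service": "RDS"}}

print(matches(ec2_only, ec2_event))
print(matches(ec2_only, s3_event))
print(matches(ec2_or_s3, s3_event))
print(matches(ec2_or_s3, rds_event))
```

Adding a field to a pattern can only narrow the set of matching events, while adding a value to an existing field can only widen it.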

When you have AWS Health Organizational View enabled, or if you forward events to custom event buses in other AWS accounts, use the affectedAccount field to match only the accounts you want to monitor.

This pattern matches events where source EQUALS aws.health AND (service EQUALS EC2 OR service EQUALS S3) AND the affectedAccount EQUALS 123456789012. Just like adding more services to that list of values, you can match additional account numbers here.

{
    "source": ["aws.health"],
    "detail.service": ["EC2", "S3"],
    "detail.affectedAccount": ["123456789012"]
}

If you want to instead exclude events with certain values, use the anything-but operator.

This pattern adds a field that only matches events where the eventTypeCode is not AWS_CLOUDSHELL_PERSISTENCE_EXPIRING (all other events for the CloudShell service match the filter).

{
    "source": ["aws.health"],
    "detail.service": ["CLOUDSHELL"],
    "detail.eventTypeCode": [{ "anything-but": ["AWS_CLOUDSHELL_PERSISTENCE_EXPIRING"]}]
}
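The exclusion logic can be sketched in Python as follows. This is a simplified model written for this example: the value_matches and field_matches helpers are hypothetical, and AWS_CLOUDSHELL_EXAMPLE_EVENT is a made-up event type code used only as a stand-in for any other CloudShell event.

```python
def value_matches(rule, value):
    """Compare one pattern entry (a literal or an anything-but object) to a value."""
    if isinstance(rule, dict) and "anything-but" in rule:
        excluded = rule["anything-but"]
        if isinstance(excluded, str):
            # anything-but may hold a single string or a list of strings.
            excluded = [excluded]
        return value not in excluded
    return rule == value


def field_matches(rules, value):
    """A field matches when any entry in its list matches."""
    return any(value_matches(rule, value) for rule in rules)


code_rules = [{"anything-but": ["AWS_CLOUDSHELL_PERSISTENCE_EXPIRING"]}]

print(field_matches(code_rules, "AWS_CLOUDSHELL_PERSISTENCE_EXPIRING"))
print(field_matches(code_rules, "AWS_CLOUDSHELL_EXAMPLE_EVENT"))  # hypothetical code
```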

Deciding on recipients for targeted notifications

Depending on the size of your organization and the number of accounts and resources you use, you could be the only person using AWS Health today. Even as a sole user, you will want to choose the information you receive and your preferred way of receiving it. As your organization grows, you’ll need a plan for distributing notifications to the most relevant recipients, giving them useful information without overloading them.

Here are some common ways to approach filtering Health events and delivering notifications so teams can focus on their particular responsibilities.

Consolidated responsibility

Application or service-aligned teams

A team responsible for a distinct application in your environment wants to focus on its specific AWS service dependencies. For instance, a team running a web application on EC2 with a database backend on RDS can use this filter to match events.

{
    "source": ["aws.health"],
    "detail.service": ["EC2", "RDS"]    
}

Distributed roles

Event / Incident Management teams

Teams concerned with identifying the cause of an ongoing issue also need information on their dependencies. Use the event type category issue to match events related to ongoing (or recently resolved) problems with AWS services and resources. During an event, you want awareness of both the PUBLIC and ACCOUNT_SPECIFIC events to help correlate AWS service degradation or resource-specific issues with your own observability tools. The Receive within 5 minutes aggregation setting is recommended for near real-time notifications for these events; the Do not aggregate setting is the alternative when you do not want these notifications aggregated.

{
    "source": ["aws.health"],
    "detail.eventTypeCategory": ["issue"],
    "detail.eventScopeCode": ["PUBLIC", "ACCOUNT_SPECIFIC"]
}

This example matches events where the event category is issue AND the event scope code is PUBLIC OR ACCOUNT_SPECIFIC.

If you have separate teams monitoring per application, add the AWS service dependencies for it to the filter. You may also include “MULTIPLE_SERVICES” within the list, representing service events where more than one service reports related issues. (MULTIPLE_SERVICES is specific to the issue category of events.)

{
    "source": ["aws.health"],
    "detail.eventTypeCategory": ["issue"],
    "detail.eventScopeCode": ["PUBLIC", "ACCOUNT_SPECIFIC"],
    "detail.service": ["S3", "ECS", "CLOUDWATCH", "MULTIPLE_SERVICES"]
}

IT Operations

During normal conditions, your operations teams follow runbooks to manage and maintain individual resources. When common maintenance activities are required, your team needs to know of pending actions. Scheduled change events are account specific (they include the account number) and resource specific (they include the resource ID or ARN). The filter below also matches account notification events. Recommendation: Use the Receive within 5 minutes aggregation setting alongside this filter to receive near real-time notifications while still consolidating notifications for sets of resources or accounts.

{
    "source": ["aws.health"],
    "detail.service": ["EC2"],
    "detail.eventTypeCategory": ["scheduledChange", "accountNotification"]
}

Change Management, Platform Owners

Planned lifecycle events are a subset of scheduled changes that require specific action from you to avoid impact, such as upgrading versions of EKS clusters or RDS databases. The teams responsible for planning and tracking these events need updates for long-term planning. Planned lifecycle events can contain multiple pages of resources, but for a human-readable notification, only the first page is necessary. Use the Receive within 12 hours aggregation setting alongside this filter to further reduce the number of notifications.

{
    "source": ["aws.health"],
    "detail.service": ["EKS"],
    "detail.eventTypeCategory": ["scheduledChange"],
    "detail.eventTypeCode": [{"suffix": "_PLANNED_LIFECYCLE_EVENT" }],
    "detail.page": ["1"]
}
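A minimal Python sketch of the same decision, assuming the detail object of a Health event has already been parsed into a dictionary. The wants_notification helper is hypothetical, and AWS_EKS_PLANNED_LIFECYCLE_EVENT is used here only as a sample value that ends with the suffix.

```python
SUFFIX = "_PLANNED_LIFECYCLE_EVENT"


def wants_notification(detail):
    """Mirror the filter: EKS scheduled changes, lifecycle suffix, first page only."""
    return (
        detail.get("service") == "EKS"
        and detail.get("eventTypeCategory") == "scheduledChange"
        and str(detail.get("eventTypeCode", "")).endswith(SUFFIX)
        # In the example event, page is the string "1", not the number 1.
        and detail.get("page") == "1"
    )


first_page = {
    "service": "EKS",
    "eventTypeCategory": "scheduledChange",
    "eventTypeCode": "AWS_EKS_PLANNED_LIFECYCLE_EVENT",  # sample value
    "page": "1",
    "totalPages": "3",
}
second_page = dict(first_page, page="2")

print(wants_notification(first_page))
print(wants_notification(second_page))
```

Matching only the first page means a multi-page event produces a single notification in this sketch, rather than one per page of resources.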

Real world examples of high-specificity filters

The following scenarios and filters are based on real world use cases requested by AWS customers. Consider if these examples relate to challenges you experience and you can use similar patterns with your Health events.

Scenario 1: Send a notification for AWS Certificate Manager (ACM) renewal state changes only when intervention is required.

Your team managing web applications wants notifications for certificate renewal events. The eventTypeCode AWS_ACM_RENEWAL_STATE_CHANGE covers several states (the certificate has been renewed, has expired, or is due to expire), but the team wants an alert only when a certificate renewal is not successful. You note the following text in the event description: “AWS Certificate Manager (ACM) was unable to renew the certificate automatically using DNS validation. You must take action to ensure that the renewal can be completed.” The filter matches part of that text with the wildcard operator.

{
  "source": ["aws.health"],
  "detail": {
    "service": ["ACM"],
    "eventTypeCategory": ["scheduledChange"],
    "eventTypeCode": ["AWS_ACM_RENEWAL_STATE_CHANGE"],
    "eventDescription.latestDescription": [{
      "wildcard": "*AWS Certificate Manager (ACM) was unable to renew the certificate*"
    }]
  }
}
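To see how a wildcard like this behaves against description text, here is a small Python sketch that translates the wildcard into a regular expression. It is an approximation for experimenting locally: it treats every * as "any run of characters" and does not handle escaped asterisks. The wildcard_matches helper and the success message are made up for this example.

```python
import re


def wildcard_matches(wildcard, text):
    """Return True when text matches a pattern in which * means any characters."""
    # Escape the literal pieces so characters like ( and ) are not regex syntax.
    literal_parts = [re.escape(part) for part in wildcard.split("*")]
    regex = re.compile(".*".join(literal_parts), re.DOTALL)
    return regex.fullmatch(text) is not None


WILDCARD = "*AWS Certificate Manager (ACM) was unable to renew the certificate*"

failed_renewal = (
    "AWS Certificate Manager (ACM) was unable to renew the certificate "
    "automatically using DNS validation. You must take action to ensure "
    "that the renewal can be completed."
)
# Hypothetical text standing in for a description that needs no action.
successful_renewal = "The certificate was renewed successfully."

print(wildcard_matches(WILDCARD, failed_renewal))
print(wildcard_matches(WILDCARD, successful_renewal))
```

The leading and trailing asterisks matter: without them, the description would have to consist of exactly the quoted phrase and nothing else.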

Scenario 2: Reduce notifications for AWS Virtual Private Network (VPN) events when using intentional single tunnel configurations

A team responsible for site-to-site connectivity using AWS VPN made a conscious decision to configure certain testing/nonproduction VPNs with only a single tunnel. They want to ignore AWS_VPN_SINGLE_TUNNEL_NOTIFICATION events for these sites but continue to receive notifications for other production VPNs.

By specifying the resource IDs they want to exclude, they receive notifications only for the actionable resources.

{
  "$or": [{
    "source": ["aws.health"],
    "detail.service": ["VPN"],
    "detail.eventTypeCode": [{ "anything-but": "AWS_VPN_SINGLE_TUNNEL_NOTIFICATION" }]
  }, {
    "source": ["aws.health"],
    "resources": [{ "anything-but": ["vpn-079ec7a4dd5cbb7fc", "vpn-0795cbb7fcec7a4dd"] }],
    "detail.service": ["VPN"],
    "detail.eventTypeCode": ["AWS_VPN_SINGLE_TUNNEL_NOTIFICATION"]
  }]
}

Note the $or operator. This example uses two separate patterns: the first matches all VPN events except AWS_VPN_SINGLE_TUNNEL_NOTIFICATION, and the second matches AWS_VPN_SINGLE_TUNNEL_NOTIFICATION only when the resource is neither vpn-079ec7a4dd5cbb7fc nor vpn-0795cbb7fcec7a4dd.
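The two branches can be sketched in Python as two predicates joined with or. This is an illustration under assumptions rather than the service's own logic: it assumes the resources list matches anything-but when at least one listed resource is outside the excluded set, and the function names, the vpn-0123456789abcdef0 ID, and the AWS_VPN_REDUNDANCY_LOSS sample code are chosen only for this example.

```python
EXCLUDED_VPNS = {"vpn-079ec7a4dd5cbb7fc", "vpn-0795cbb7fcec7a4dd"}
SINGLE_TUNNEL = "AWS_VPN_SINGLE_TUNNEL_NOTIFICATION"


def is_vpn_health_event(event):
    detail = event.get("detail", {})
    return event.get("source") == "aws.health" and detail.get("service") == "VPN"


def other_vpn_event(event):
    """First branch: any VPN event type except the single tunnel notification."""
    code = event.get("detail", {}).get("eventTypeCode")
    return is_vpn_health_event(event) and code is not None and code != SINGLE_TUNNEL


def single_tunnel_on_monitored_vpn(event):
    """Second branch: single tunnel notification for a VPN that is not excluded."""
    code = event.get("detail", {}).get("eventTypeCode")
    resources = event.get("resources", [])
    return (
        is_vpn_health_event(event)
        and code == SINGLE_TUNNEL
        and any(resource not in EXCLUDED_VPNS for resource in resources)
    )


def should_notify(event):
    return other_vpn_event(event) or single_tunnel_on_monitored_vpn(event)


def make_event(code, resources):
    return {
        "source": "aws.health",
        "resources": resources,
        "detail": {"service": "VPN", "eventTypeCode": code},
    }


test_vpn = make_event(SINGLE_TUNNEL, ["vpn-079ec7a4dd5cbb7fc"])
prod_vpn = make_event(SINGLE_TUNNEL, ["vpn-0123456789abcdef0"])  # hypothetical ID
other = make_event("AWS_VPN_REDUNDANCY_LOSS", ["vpn-079ec7a4dd5cbb7fc"])  # sample code

print(should_notify(test_vpn), should_notify(prod_vpn), should_notify(other))
```

Because the exclusion applies only in the second branch, the excluded test VPNs still generate notifications for every other VPN event type.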

Scenario 3: Direct Connect events as alerts or email messages

A team managing AWS Direct Connect for their business deployed their connections to allow for maximum uptime and to avoid issues caused by maintenance events. The team wanted to send alerts for events they considered most likely to affect data operations and have separate email delivery of the remaining event types.

Using this filter, the team created one configuration for high-priority alerts related to events they wanted to investigate promptly.

{
  "source": ["aws.health"],
  "detail.eventTypeCode": [
    "AWS_DIRECTCONNECT_CONNECTIVITY_ISSUE",
    "AWS_DIRECTCONNECT_EMERGENCY_MAINTENANCE_SCHEDULED",
    "AWS_DIRECTCONNECT_LATENCY",
    "AWS_DIRECTCONNECT_MAC_FLAP_NOTIFICATION",
    "AWS_DIRECTCONNECT_OPERATIONAL_ISSUE",
    "AWS_DIRECTCONNECT_PACKET_LOSS"
  ]
}

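Then, using the opposite filter (anything-but those events), match all remaining eventTypeCodes in a second configuration for delivery via email. As written, this pattern does not restrict detail.service, so it also matches event type codes from services other than Direct Connect; add a detail.service field if you want to limit it.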

{
  "source": ["aws.health"],
  "detail.eventTypeCode": [{
    "anything-but": [
      "AWS_DIRECTCONNECT_CONNECTIVITY_ISSUE",
      "AWS_DIRECTCONNECT_EMERGENCY_MAINTENANCE_SCHEDULED",
      "AWS_DIRECTCONNECT_LATENCY",
      "AWS_DIRECTCONNECT_MAC_FLAP_NOTIFICATION",
      "AWS_DIRECTCONNECT_OPERATIONAL_ISSUE",
      "AWS_DIRECTCONNECT_PACKET_LOSS"
    ]
  }]
}

In a scenario like this, if an event becomes more or less important to you for an alert vs email, add or remove it from the list to change the notification type.
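The relationship between the two configurations can be sketched in Python as a single list and its complement. The destinations helper is hypothetical, and AWS_DIRECTCONNECT_MAINTENANCE_SCHEDULED is used only as a sample of a code that is not on the alert list.

```python
ALERT_CODES = frozenset({
    "AWS_DIRECTCONNECT_CONNECTIVITY_ISSUE",
    "AWS_DIRECTCONNECT_EMERGENCY_MAINTENANCE_SCHEDULED",
    "AWS_DIRECTCONNECT_LATENCY",
    "AWS_DIRECTCONNECT_MAC_FLAP_NOTIFICATION",
    "AWS_DIRECTCONNECT_OPERATIONAL_ISSUE",
    "AWS_DIRECTCONNECT_PACKET_LOSS",
})


def destinations(event_type_code, alert_codes=ALERT_CODES):
    """Return the configurations whose filter matches this event type code."""
    matched = []
    if event_type_code in alert_codes:       # first filter: listed codes
        matched.append("alert")
    if event_type_code not in alert_codes:   # second filter: anything-but the list
        matched.append("email")
    return matched


print(destinations("AWS_DIRECTCONNECT_PACKET_LOSS"))
print(destinations("AWS_DIRECTCONNECT_MAINTENANCE_SCHEDULED"))  # sample code

# Demoting a code from alert to email is a matter of removing it from the list.
fewer_alerts = ALERT_CODES - {"AWS_DIRECTCONNECT_LATENCY"}
print(destinations("AWS_DIRECTCONNECT_LATENCY", fewer_alerts))
```

Keeping the two filters in sync matters: if the lists diverge, a code can end up matching both configurations or neither.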

Conclusion

These examples should give you an idea of how to implement specificity for AWS Health events in your environment. Using your AWS Health Dashboard, note the events you already receive in your accounts and think about how the information there can improve visibility, reduce troubleshooting time, and provide early notice of important changes so your teams can plan around them. To experiment with your own events and with event patterns like these, refer to Capturing your AWS Health events in JSON for reference in creating and testing filters and How to create and test AWS Health event advanced filters for AWS User Notifications.