MediaConnect support playbook

Content level: Advanced

The purpose of this document is to offer general guidance on troubleshooting common MediaConnect issues and on the information to provide when creating a support case.

MediaConnect – Support Checklist

  • Complete description of the issue
  • Timeframe of the issue (whether it occurred during a specific time window or is ongoing)
  • Frequency of occurrence (whether it has happened before or is a first occurrence, and whether it is intermittent or a single continuous occurrence)
  • Complete flow architecture (for example, Source → Zixi → MediaConnect → MediaLive → MediaPackage → …)
  • ARNs of all the resources involved (MediaConnect flow ARN, MediaLive/MediaPackage channel ARN, etc.)
  • Is this a new or existing workflow?
  • Confirmation of source health (whether you checked the source for issues during the affected time window)
  • Investigation performed on your end (if any)
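To collect the flow ARNs and their current state for the checklist, a short script can list every flow in a Region. This is a minimal sketch assuming Python with boto3 installed and credentials allowed to call MediaConnect ListFlows; `summarize_flows` is a hypothetical helper name and the Region is a placeholder.

```python
from typing import Any


def summarize_flows(mediaconnect_client: Any) -> str:
    """Return one line per flow (name, status, source type, ARN) to paste into a case.

    Expects a boto3 MediaConnect client, or any object with the same
    get_paginator("list_flows") interface.
    """
    lines = []
    paginator = mediaconnect_client.get_paginator("list_flows")
    for page in paginator.paginate():
        for flow in page.get("Flows", []):
            lines.append(
                f"{flow.get('Name', '?')} | {flow.get('Status', '?')} | "
                f"{flow.get('SourceType', '?')} | {flow['FlowArn']}"
            )
    return "\n".join(lines)


if __name__ == "__main__":
    import boto3  # requires AWS credentials with mediaconnect:ListFlows permission

    client = boto3.client("mediaconnect", region_name="us-east-1")  # placeholder Region
    print(summarize_flows(client))
```

Injecting the client keeps the helper easy to test and lets you reuse an existing session or Region configuration.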

MediaConnect – Common Issue Investigation and Case Creation

MediaConnect flow maintenance schedule

Description(s) of the issue:

  • Request to delay or reschedule upcoming scheduled MediaConnect flow maintenance

Why does the MediaConnect service undergo maintenance?

  • MediaConnect flows undergo routine maintenance for service updates
  • Maintenance involves stopping and restarting the flow, which interrupts traffic
  • Maintenance windows are scheduled by MediaConnect and notified in advance. Times can vary, so monitor schedules via CloudWatch and the Personal Health Dashboard (PHD)
  • If a maintenance window coincides with a live event, contact AWS Support well in advance to have the maintenance rescheduled or skipped for the duration of the event

Data required from your end to delay the maintenance:

  • Original maintenance date/time from the notification
  • Date, time and duration of the conflicting live event
  • Proposed new maintenance date/time that avoids the live event
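Before contacting Support, it can help to check the notified window against the event times and prepare a proposed alternative. A minimal sketch, assuming all times are UTC, that the maintenance date and start hour arrive in the YYYY-MM-DD and HH:MM forms MediaConnect uses for its `MaintenanceScheduledDate` and `MaintenanceStartHour` flow fields, and that the window lasts two hours. The two-hour duration, the one-hour buffer, and all function names are hypothetical; replace them with the values from your notification.

```python
from datetime import datetime, timedelta, timezone


def maintenance_start(scheduled_date: str, start_hour: str) -> datetime:
    """Combine a YYYY-MM-DD date and an HH:MM UTC hour into an aware datetime."""
    day = datetime.strptime(scheduled_date, "%Y-%m-%d")
    hour, minute = (int(part) for part in start_hour.split(":"))
    return day.replace(hour=hour, minute=minute, tzinfo=timezone.utc)


def maintenance_conflicts(
    maint_start: datetime,
    event_start: datetime,
    event_end: datetime,
    duration: timedelta = timedelta(hours=2),  # assumed window length
) -> bool:
    """True if [maint_start, maint_start + duration) overlaps [event_start, event_end)."""
    return maint_start < event_end and event_start < maint_start + duration


def propose_new_start(event_end: datetime, buffer: timedelta = timedelta(hours=1)) -> datetime:
    """Earliest whole hour at least `buffer` after the event ends (hypothetical policy)."""
    candidate = event_end + buffer
    if candidate.minute or candidate.second or candidate.microsecond:
        candidate = candidate.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return candidate


if __name__ == "__main__":
    maint = maintenance_start("2024-05-10", "02:00")  # placeholder values from a notification
    event_start = datetime(2024, 5, 10, 1, 0, tzinfo=timezone.utc)
    event_end = datetime(2024, 5, 10, 4, 30, tzinfo=timezone.utc)
    if maintenance_conflicts(maint, event_start, event_end):
        print("Conflict; proposed new start:", propose_new_start(event_end).isoformat())
```

The output gives you the original time, the conflicting event window, and a proposed time, which are the three items listed above.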

Monitoring for Schedule Changes

  • Check notification sources such as email and the Personal Health Dashboard

  • Continue monitoring CloudWatch maintenance metrics

  • Confirm the adjusted schedule in the MediaConnect console
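Besides the console, the schedule can be read back programmatically once Support has adjusted it. This sketch assumes boto3 and that DescribeFlow returns a `Maintenance` object with `MaintenanceScheduledDate` (YYYY-MM-DD) and `MaintenanceStartHour` (HH:MM, UTC) fields; verify those field names against the current MediaConnect API reference. Both helper names are hypothetical.

```python
from typing import Any, Optional


def get_maintenance(mediaconnect_client: Any, flow_arn: str) -> Optional[dict]:
    """Return the flow's Maintenance block from DescribeFlow, or None if absent."""
    flow = mediaconnect_client.describe_flow(FlowArn=flow_arn).get("Flow", {})
    return flow.get("Maintenance")


def schedule_confirmed(maintenance: Optional[dict], expected_date: str, expected_start_hour: str) -> bool:
    """True if the scheduled date and UTC start hour match what was agreed with Support."""
    if not maintenance:
        return False
    return (
        maintenance.get("MaintenanceScheduledDate") == expected_date
        and maintenance.get("MaintenanceStartHour") == expected_start_hour
    )


if __name__ == "__main__":
    import boto3

    client = boto3.client("mediaconnect")
    current = get_maintenance(client, "arn:aws:mediaconnect:...")  # placeholder flow ARN
    print(current, schedule_confirmed(current, "2024-05-14", "03:00"))  # placeholder values
```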

MediaConnect and Downstream data flow issue

Description(s) of the issue:

  • MediaLive channel input from MediaConnect goes blank/black for a period of time
  • MediaConnect stops delivering content to the MediaLive channel
  • Outage is observed between MediaConnect and the MediaLive channel

Troubleshooting steps to be performed
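Pulling the flow's source metrics for the affected window helps separate a source problem from a downstream one. The sketch below builds a CloudWatch GetMetricData request for one flow. The namespace `AWS/MediaConnect`, the `FlowARN` dimension, the four metric names, and the statistic chosen for each are assumptions to check against the metrics your flow actually publishes in the CloudWatch console; `build_source_metric_queries` and `incident_window` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

# Assumed metric names and the statistic used for each; verify them in the
# CloudWatch console (namespace AWS/MediaConnect) before relying on them.
SOURCE_METRIC_STATS: Dict[str, str] = {
    "SourceBitRate": "Average",      # a drop towards 0 suggests the source stopped sending
    "SourceConnected": "Minimum",    # 0 in any period means the source was disconnected
    "SourceDisconnections": "Sum",
    "SourceDroppedPackets": "Sum",
}


def build_source_metric_queries(flow_arn: str, period_seconds: int = 60) -> List[dict]:
    """Build MetricDataQueries entries for a GetMetricData call on one flow."""
    queries = []
    for index, (name, stat) in enumerate(SOURCE_METRIC_STATS.items()):
        queries.append({
            "Id": f"m{index}",  # query Ids must start with a lowercase letter
            "Label": name,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/MediaConnect",
                    "MetricName": name,
                    "Dimensions": [{"Name": "FlowARN", "Value": flow_arn}],
                },
                "Period": period_seconds,
                "Stat": stat,
            },
            "ReturnData": True,
        })
    return queries


def incident_window(incident_time: datetime, padding: timedelta = timedelta(minutes=30)) -> Tuple[datetime, datetime]:
    """Return (start, end) around the reported incident time."""
    return incident_time - padding, incident_time + padding


if __name__ == "__main__":
    import boto3

    start, end = incident_window(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))  # placeholder time
    result = boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=build_source_metric_queries("arn:aws:mediaconnect:..."),  # placeholder ARN
        StartTime=start,
        EndTime=end,
    )
    for series in result["MetricDataResults"]:
        print(series["Label"], list(zip(series["Timestamps"], series["Values"])))
```

For long windows or fine periods, GetMetricData may return a `NextToken`, which you would pass back to fetch the remaining data points.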

Data required for a support case:

  • The same data listed in the MediaConnect – Support Checklist

Source transport protocol (SRT, Zixi, RTP-FEC, etc.) and MediaConnect data flow issue

Issue Description

  • SRT/Zixi source to MediaConnect flow intermittently breaking or failing

Troubleshooting steps

  • Step 1: Check critical source metrics. Monitor the 4 key source metrics listed in the MediaConnect and downstream data flow issue scenario.
  • Step 2: Monitor protocol-specific metrics:
       - SRT
       - Zixi push
  • Step 3: Analyse source performance issues:
       - SRT and Zixi support error correction, which helps pinpoint problems
       - RIST and RTP lack error correction
       - Monitoring protocol-specific metrics in addition to critical source stats helps isolate issues. Sources with error correction, such as SRT and Zixi, are preferred over RIST/RTP

Data required to escalate to the support team:

The general advice is to escalate to the Networking Support Team when packet drops are seen at a flow's source, following the instructions in the article mediaconnect-network-troubleshooting.

Because MediaConnect (EMX) reports packet drops, continuity counter (CC) errors, and similar errors only at the source, a case where these errors are seen is almost certainly caused by a network path issue upstream of the flow (either in your own network or in the AWS network). Such issues are outside the control and diagnostic visibility of MediaConnect support.
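The reasoning above (error correction can hide path loss until it no longer can, and any loss reported at the source points upstream) can be expressed as a small decision helper. This is a hedged sketch: it assumes you have already summed the source's total, recovered, and not-recovered packet counts for the incident window from CloudWatch (the exact metric names depend on the protocol), and `SourcePacketStats`, `assess_source`, and the 0.1% threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SourcePacketStats:
    """Packet counters summed over the investigation window (hypothetical container)."""
    total: int          # packets received at the flow source
    recovered: int      # lost on the path but recovered by the protocol's error correction
    not_recovered: int  # lost and not recovered


def assess_source(stats: SourcePacketStats, loss_warn_pct: float = 0.1) -> str:
    """Classify source packet loss; the default threshold is an illustrative assumption."""
    if stats.total <= 0:
        return "no_data"  # check source connectivity before anything else
    lost_pct = 100.0 * (stats.recovered + stats.not_recovered) / stats.total
    if stats.not_recovered > 0:
        return "unrecovered_loss"  # drops reach the flow: upstream network path issue, escalate
    if lost_pct >= loss_warn_pct:
        return "masked_loss"  # path is lossy, but error correction is still hiding it
    return "healthy"
```

A `masked_loss` result is worth raising with the network owner early, since the same path will start producing unrecovered drops once loss exceeds what the protocol can repair.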

MediaConnect state change event notifications

Description(s) of the issue:

  • Configuring logging and notifications for MediaConnect flow alerts

Monitoring options in MediaConnect

  • MediaConnect does not provide advanced customer-facing logging
  • However, you can set up CloudWatch Events (now Amazon EventBridge) rules to get notified of any change in the state of your MediaConnect resources, and process these notifications to perform further remedial actions

Setting up notifications

  • MediaConnect generates events for state changes, alerts, etc. These events can trigger the following actions:

       - Invoking an AWS Lambda function
       - Invoking Amazon EC2 Run Command
       - Relaying the event to Amazon Kinesis Data Streams
       - Activating an AWS Step Functions state machine
       - Notifying an Amazon SNS topic or an Amazon SQS queue
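A rule that forwards MediaConnect events to an SNS topic can be created with two API calls. This is a minimal sketch assuming Python with boto3, an existing SNS topic, and EventBridge (the current name of CloudWatch Events). The only pattern field it relies on is `source: aws.mediaconnect`; to narrow the rule to specific event types, take the exact `detail-type` strings from the MediaConnect documentation rather than from this example. `create_mediaconnect_notification_rule`, the rule name, and the topic ARN are hypothetical.

```python
import json
from typing import Any, List, Optional


def create_mediaconnect_notification_rule(
    events_client: Any,
    rule_name: str,
    topic_arn: str,
    detail_types: Optional[List[str]] = None,
) -> str:
    """Create (or update) an EventBridge rule that forwards MediaConnect events to SNS."""
    pattern: dict = {"source": ["aws.mediaconnect"]}
    if detail_types:
        pattern["detail-type"] = list(detail_types)  # optional narrowing; names must match the docs
    response = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(pattern),
        State="ENABLED",
        Description="Forward AWS Elemental MediaConnect events to SNS",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "mediaconnect-sns", "Arn": topic_arn}],
    )
    return response["RuleArn"]


if __name__ == "__main__":
    import boto3

    print(create_mediaconnect_notification_rule(
        boto3.client("events"),
        rule_name="mediaconnect-alerts",                        # hypothetical name
        topic_arn="arn:aws:sns:us-east-1:123456789012:alerts",  # placeholder topic
    ))
```

The SNS topic's access policy must allow `events.amazonaws.com` to publish to it; otherwise the rule matches events but no notification is delivered. The same rule can target Lambda, SQS, or Step Functions by changing the target ARN.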

Some important notifications that could be configured


Example of a notification structure
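Every event EventBridge delivers shares the same envelope (`version`, `id`, `detail-type`, `source`, `account`, `time`, `region`, `resources`, `detail`); only `detail` is service-specific. A minimal Lambda target that turns such an event into a readable message could look like the sketch below. The handler and helper names and the message format are hypothetical, and the code deliberately assumes no particular keys inside `detail`, which you should inspect in real MediaConnect events.

```python
import json
from typing import Any, Dict


def summarize_event(event: Dict[str, Any]) -> str:
    """Format the standard EventBridge envelope fields into a short message."""
    resources = ", ".join(event.get("resources", [])) or "n/a"
    detail = json.dumps(event.get("detail", {}), sort_keys=True)
    return (
        f"[{event.get('time', '?')}] {event.get('detail-type', '?')} "
        f"from {event.get('source', '?')} in {event.get('region', '?')}\n"
        f"Resources: {resources}\n"
        f"Detail: {detail}"
    )


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, str]:
    """Hypothetical Lambda target for the MediaConnect EventBridge rule."""
    message = summarize_event(event)
    print(message)  # written to the function's CloudWatch Logs log group
    return {"summary": message}
```

In practice you would typically forward `message` to an SNS topic or a chat webhook rather than only logging it.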

MediaConnect failover

Description(s) of the issue:

  • Understanding and managing source failovers in MediaConnect flows

How failovers work

  • If no primary source is specified, MediaConnect randomly uses one of the sources to provide content for the flow
  • The flow switches to the other source if the primary source does not send data for 500 milliseconds, and switches back to the primary source as soon as data returns
  • However, if both sources go down simultaneously, or fluctuate intermittently in rapid succession, the stream can fail and data can be lost completely downstream. This is reported in CloudWatch critical alert events.

Best practices

  • Maintain redundancy across sources to avoid a dual failure
  • Failovers can cause brief outages if sources fluctuate

Monitoring failovers

  • CloudWatch critical alerts notify you of stream failures
  • The source health metric "FailoverSwitches" can help track switches
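The switching rule described above can be modelled in a few lines to reason about what happens when both sources flap. This is a simplified, hypothetical model of the documented behaviour (500 ms of silence triggers a switch, and the flow returns to the primary as soon as it sends again), not MediaConnect's implementation. It assumes a primary is configured, tracks only the time each source last delivered data, and treats exactly 500 ms of silence as a failure, which is an arbitrary boundary choice.

```python
from typing import Dict, Optional

FAILOVER_TIMEOUT_S = 0.5  # 500 ms of silence triggers a switch, as described above


def is_alive(last_packet: Dict[str, float], source: str, now: float, timeout: float) -> bool:
    """A source counts as alive if it delivered data less than `timeout` seconds ago."""
    seen = last_packet.get(source)
    return seen is not None and now - seen < timeout


def select_source(
    now: float,
    last_packet: Dict[str, float],
    primary: str,
    current: str,
    timeout: float = FAILOVER_TIMEOUT_S,
) -> Optional[str]:
    """Return the source the flow should use at `now`, or None if no source is delivering."""
    if is_alive(last_packet, primary, now, timeout):
        return primary  # switch back to the primary as soon as its data returns
    backups = [s for s in last_packet if s != primary and is_alive(last_packet, s, now, timeout)]
    if current in backups:
        return current  # stay on the backup already in use
    return backups[0] if backups else None  # None models the dual-failure case
```

Stepping `now` through a recorded timeline and counting changes in the returned source gives a rough equivalent of the switch count you would compare against the "FailoverSwitches" metric.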

MediaConnect quotas and limit increases

Description(s) of the issue:

  • Request to increase a MediaConnect quota limit

Default quota limits

  • The number of flows per Region can be increased after review
  • All other quotas, such as API limits, are fixed

Increasing the flow quota limit

  • Contact AWS Support
  • Provide a detailed use case that requires more than 20 flows
  • Support will review the request and may approve a higher limit

Handling API limit quotas

  • API request limits are 5 requests per minute ("steady state") with a burst of 30
  • These limits cannot be increased
  • Optimize workflows to avoid breaching the limits
  • Implement exponential backoff for API retries

If API limits are consistently exceeded

  • Analyse workflows and API usage patterns
  • Explore alternative architectures or optimizations
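Exponential backoff with jitter keeps retries from piling onto an already throttled API. The sketch below uses the "full jitter" variant, where each delay is drawn uniformly between 0 and an exponentially growing cap. `call_with_backoff`, `ThrottledError`, and the default delays are hypothetical choices, not values documented for MediaConnect.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling response (HTTP 429)."""


def call_with_backoff(
    operation: Callable[[], T],
    is_throttle: Callable[[Exception], bool] = lambda exc: isinstance(exc, ThrottledError),
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Retry `operation` on throttling errors using exponential backoff with full jitter."""
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            # Re-raise non-throttling errors, and give up after the last attempt.
            if not is_throttle(exc) or attempt == max_attempts - 1:
                raise
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0, cap))  # full jitter: random delay in [0, cap]
    raise RuntimeError("max_attempts must be at least 1")
```

With boto3 you would pass a predicate that inspects the error code of a `botocore.exceptions.ClientError`, for example `lambda e: isinstance(e, ClientError) and e.response["Error"]["Code"] == "TooManyRequestsException"`, assuming the service reports throttling with that code. Alternatively, botocore's built-in retry modes (`Config(retries={"mode": "adaptive"})`) provide retries and client-side rate limiting without custom code.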

Special thanks to our MediaConnect SMEs Naveen Kumar Jindal, Kartik Kapoor, and Ruhisar Tikoo for putting this content together.

AWS EXPERT
published a month ago