How do I monitor my Amazon Managed Service for Prometheus (AMP) environment?

This article describes how to monitor your Amazon Managed Service for Prometheus workspaces with Amazon CloudWatch metrics, dashboards, and alarms, and lists best practices for doing so.

Why Monitor Amazon Managed Service for Prometheus?

Amazon Managed Service for Prometheus (AMP) is a fully managed, secure, and highly available Prometheus-compatible service for monitoring applications at scale. But who monitors the monitor? For observability teams, understanding the health, performance, and usage patterns of AMP itself is crucial to keeping the monitoring infrastructure reliable.

Before diving into the how, let's understand the why. Monitoring AMP helps you:

  • Optimize Costs: Track ingestion rates and active series to manage expenses
  • Ensure Reliability: Detect issues before they impact your observability stack
  • Plan Capacity: Understand usage trends for scaling decisions
  • Maintain Performance: Identify query patterns and optimize accordingly
  • Demonstrate Value: Show stakeholders metrics on the ROI of your monitoring infrastructure

Understanding AMP CloudWatch Metrics

Amazon Managed Service for Prometheus automatically publishes usage metrics to CloudWatch under the AWS/Usage and AWS/Prometheus namespaces. These metrics provide insights into:

1. Ingestion Metrics

  • ActiveSeries: Number of unique time series with recent samples
  • DiscardedSamples: Number of discarded samples by reason
  • IngestionRate: Overall data ingestion throughput

2. Query Performance Metrics

  • QueryMetricsTPS: Query operations per second
  • QuerySamplesProcessed: Number of samples processed per query
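
Before building dashboards on the metrics listed above, it can help to confirm which metric names and dimensions your account actually publishes. The following is a minimal sketch, assuming `cloudwatch` behaves like `boto3.client("cloudwatch")`; the helper name `list_amp_metrics` is hypothetical. Note that AWS/Usage is shared with other AWS services, so the result may include entries unrelated to AMP.

```python
# Namespaces named in this article.
AMP_NAMESPACES = ("AWS/Usage", "AWS/Prometheus")


def list_amp_metrics(cloudwatch, namespaces=AMP_NAMESPACES):
    """Return {namespace: sorted list of (metric name, dimensions)}.

    `cloudwatch` is assumed to expose the ListMetrics API the way
    boto3.client("cloudwatch") does. Dimensions are returned as a
    tuple of (name, value) pairs.
    """
    found = {}
    for namespace in namespaces:
        seen = set()
        kwargs = {"Namespace": namespace}
        while True:
            page = cloudwatch.list_metrics(**kwargs)
            for metric in page.get("Metrics", []):
                dims = tuple(sorted(
                    (d["Name"], d["Value"])
                    for d in metric.get("Dimensions", [])
                ))
                seen.add((metric["MetricName"], dims))
            # ListMetrics is paginated; follow NextToken until it is absent.
            token = page.get("NextToken")
            if not token:
                break
            kwargs["NextToken"] = token
        found[namespace] = sorted(seen)
    return found
```

With real credentials, calling `list_amp_metrics(boto3.client("cloudwatch", region_name="us-east-1"))` would show the exact names and dimensions to use in the dashboard and alarm definitions that follow.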

Setting Up the Monitoring

Step 1: Create a CloudWatch Dashboard

Let's start by creating a dedicated dashboard for AMP monitoring:

{
  "name": "AMP-Monitoring-Dashboard",
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "metrics": [
          ["AWS/Usage", "ActiveSeries", {"stat": "Average"}],
          ["AWS/Prometheus", "DiscardedSamples", {"stat": "Sum"}],
          ["AWS/Usage", "IngestionRate", {"stat": "Average"}]
        ],
        "period": 300,
        "stat": "Average",
        "region": "us-east-1",
        "title": "AMP Core Metrics"
      }
    }
  ]
}
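
One way to publish a dashboard like this is the CloudWatch PutDashboard API, which takes the dashboard name as a separate `DashboardName` parameter and the widget definition as a JSON string in `DashboardBody`. The sketch below assumes `cloudwatch` behaves like `boto3.client("cloudwatch")`; the function names are hypothetical, and the metric rows are copied from the definition above, so verify them against what your account publishes.

```python
import json


def build_amp_dashboard_body(region="us-east-1", period=300):
    """Build the DashboardBody JSON string for the AMP core metrics widget."""
    widget = {
        "type": "metric",
        "properties": {
            # Metric rows taken from the article's dashboard definition.
            "metrics": [
                ["AWS/Usage", "ActiveSeries", {"stat": "Average"}],
                ["AWS/Prometheus", "DiscardedSamples", {"stat": "Sum"}],
                ["AWS/Usage", "IngestionRate", {"stat": "Average"}],
            ],
            "period": period,
            "stat": "Average",
            "region": region,
            "title": "AMP Core Metrics",
        },
    }
    return json.dumps({"widgets": [widget]})


def publish_dashboard(cloudwatch, name="AMP-Monitoring-Dashboard",
                      region="us-east-1"):
    """Create or update the dashboard and return any validation messages."""
    response = cloudwatch.put_dashboard(
        DashboardName=name,
        DashboardBody=build_amp_dashboard_body(region),
    )
    return response.get("DashboardValidationMessages", [])
```

An empty list from `publish_dashboard` means CloudWatch reported no validation messages for the body.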

Step 2: Configure CloudWatch Alarms

Set up proactive alerting for critical thresholds:

High Ingestion Rate Alarm

AlarmName: AMP-HighIngestionRate
Namespace: AWS/Usage  # same namespace as in the dashboard above
MetricName: IngestionRate
Statistic: Sum
Period: 300
EvaluationPeriods: 2
Threshold: 1000000000  # 1 billion samples
ComparisonOperator: GreaterThanThreshold

Active Series Growth Alarm

AlarmName: AMP-ActiveSeriesGrowth
Namespace: AWS/Usage  # same namespace as in the dashboard above
MetricName: ActiveSeries
Statistic: Average
Period: 3600
EvaluationPeriods: 1
Threshold: 10000000  # 10 million series
ComparisonOperator: GreaterThanThreshold
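
The two alarm definitions above map directly onto the parameters of the CloudWatch PutMetricAlarm API. The following is a rough sketch, assuming `cloudwatch` behaves like `boto3.client("cloudwatch")`. The `AWS/Usage` namespace is an assumption carried over from the dashboard definition, and `create_amp_alarms` and `sns_topic_arn` are hypothetical names; depending on how the metrics are published in your account, you may also need to pass `Dimensions`.

```python
# Alarm settings copied from the two definitions in this article.
ALARMS = [
    {
        "AlarmName": "AMP-HighIngestionRate",
        "MetricName": "IngestionRate",
        "Statistic": "Sum",
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": 1_000_000_000,  # 1 billion samples
        "ComparisonOperator": "GreaterThanThreshold",
    },
    {
        "AlarmName": "AMP-ActiveSeriesGrowth",
        "MetricName": "ActiveSeries",
        "Statistic": "Average",
        "Period": 3600,
        "EvaluationPeriods": 1,
        "Threshold": 10_000_000,  # 10 million series
        "ComparisonOperator": "GreaterThanThreshold",
    },
]


def create_amp_alarms(cloudwatch, sns_topic_arn=None, namespace="AWS/Usage"):
    """Create each alarm and return the names that were submitted.

    `namespace` is an assumption based on the dashboard definition.
    If `sns_topic_arn` is given, it is attached as the alarm action.
    """
    created = []
    for spec in ALARMS:
        params = dict(spec, Namespace=namespace)
        if sns_topic_arn:
            params["AlarmActions"] = [sns_topic_arn]
        cloudwatch.put_metric_alarm(**params)
        created.append(params["AlarmName"])
    return created
```

Keeping the thresholds in a plain list makes it easier to review them alongside your workspace quotas and to adjust them as usage grows.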

Best Practices for AMP Monitoring

1. Implement Tiered Alerting

Create multiple alert severity levels:

  • Critical: Service impacting issues (>95% of limits)
  • Warning: Approaching thresholds (>80% of limits)
  • Info: Notable changes in patterns (>50% increase)
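
The tiers above can be expressed as a small classification step that runs before an alert is routed. This is an illustrative sketch only; the function names are hypothetical, and `limit` stands for whichever workspace quota you are tracking.

```python
def usage_severity(current, limit):
    """Classify usage against a limit: critical above 95%, warning above 80%."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = current / limit
    if ratio > 0.95:
        return "critical"
    if ratio > 0.80:
        return "warning"
    return None


def pattern_severity(current, baseline):
    """Return "info" when the value grew by more than 50% over the baseline."""
    if baseline <= 0:
        return None  # no meaningful baseline to compare against
    increase = (current - baseline) / baseline
    return "info" if increase > 0.50 else None


def classify(current, limit, baseline):
    """Prefer the limit-based tier; fall back to the pattern-change tier."""
    return usage_severity(current, limit) or pattern_severity(current, baseline)
```

For example, 8.5 million active series against a 10 million limit classifies as a warning, while 3.2 million series against the same limit only raises an info notice if the baseline was 2 million or lower.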

Advanced Monitoring Strategies

Cross-Region Monitoring

For multi-region AMP deployments:

  1. Aggregate metrics across regions using CloudWatch cross-region dashboards
  2. Implement global alerting with centralized SNS topics
  3. Use CloudWatch Contributor Insights for regional comparison
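
For the first item, a single dashboard widget can plot the same metric from several regions, because each metric row in a CloudWatch dashboard body accepts its own `region` option. The sketch below only builds the widget definition; the function name and the region list are hypothetical, and the namespace and metric name are taken from the dashboard earlier in this article.

```python
def cross_region_widget(regions, namespace="AWS/Usage", metric="ActiveSeries",
                        home_region="us-east-1", period=300):
    """Build one metric widget with a line per region for the same metric."""
    rows = [
        [namespace, metric,
         {"region": region, "label": f"{metric} ({region})", "stat": "Average"}]
        for region in regions
    ]
    return {
        "type": "metric",
        "properties": {
            "metrics": rows,
            "period": period,
            # Region used for the widget itself; each row overrides it.
            "region": home_region,
            "title": f"AMP {metric} by region",
        },
    }
```

The returned dictionary can be added to the `widgets` list of a dashboard body before it is serialized to JSON and published.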

Integration with AWS Organizations

For enterprise-scale monitoring:

  • Use CloudWatch cross-account observability
  • Implement organizational CloudWatch dashboards
  • Create consolidated billing alerts for AMP costs

Custom Metrics Enhancement

Supplement CloudWatch metrics with custom instrumentation:

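import boto3  # available for the AMP write and query logic below
from prometheus_client import Counter, Histogram

# Custom metrics for AMP client operations
amp_write_duration = Histogram('amp_write_duration_seconds',
                               'Time to write to AMP')
amp_query_errors = Counter('amp_query_errors_total',
                           'Total AMP query errors')

# Record how long each write call takes
@amp_write_duration.time()
def write_to_amp(samples):
    # Your AMP write logic
    pass

# Count every exception raised while querying (placeholder function)
@amp_query_errors.count_exceptions()
def query_amp(promql):
    # Your AMP query logic
    pass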

Conclusion

Monitoring Amazon Managed Service for Prometheus is essential for maintaining a robust observability platform. By using CloudWatch metrics, implementing proactive alerting, and following best practices, you can keep your AMP workspaces operating efficiently and cost-effectively.

Remember that effective monitoring is an iterative process. Start with the basics, gradually add sophistication based on your needs, and continuously refine your approach based on operational insights.

Next Steps

  1. Review your current AMP monitoring setup against these recommendations
  2. Implement missing CloudWatch alarms for critical metrics
  3. Create a cost optimization dashboard for your AMP workspaces
  4. Document your monitoring runbooks and escalation procedures
  5. Schedule regular reviews of your AMP usage patterns and costs

Have questions or want to share your AMP monitoring strategies? Join the discussion in the AWS Observability community forums or reach out to your AWS account team for personalized guidance.