How do I monitor my Amazon Managed Service for Prometheus (AMP) environment?

5 minute read
Content level: Advanced

This article explores strategies for monitoring your Amazon Managed Service for Prometheus (AMP) workspaces using CloudWatch metrics, alarms, and a set of best practices.

Why Monitor Amazon Managed Prometheus?

Amazon Managed Service for Prometheus (AMP) is a fully managed, secure, and highly available service that makes it easy to monitor applications at scale. But who monitors the monitor? For observability teams, understanding the health, performance, and usage patterns of AMP itself is crucial to keeping the monitoring infrastructure reliable.

Before diving into the how, let's understand the why. Monitoring AMP helps you:

  • Optimize Costs: Track ingestion rates and active series to manage expenses
  • Ensure Reliability: Detect issues before they impact your observability stack
  • Plan Capacity: Understand usage trends for scaling decisions
  • Maintain Performance: Identify query patterns and optimize accordingly
  • Demonstrate Value: Show stakeholders metrics on the ROI of your monitoring infrastructure

Understanding AMP CloudWatch Metrics

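Amazon Managed Service for Prometheus automatically publishes metrics to CloudWatch. This article uses the AWS/Prometheus namespace throughout; confirm the exact metric names in the CloudWatch console for your account before building dashboards or alarms on them. The metrics discussed here fall into three groups: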

1. Ingestion Metrics

  • ActiveSeries: Number of unique time series with recent samples
  • IngestedSamples: Rate of metric samples being ingested
  • IngestionRate: Overall data ingestion throughput

2. Storage Metrics

  • StorageUsage: Total storage consumed by your workspace
  • TimeSeriesCount: Total number of time series stored

3. Query Performance Metrics

  • QueryExecutionTime: Duration of PromQL query execution
  • QuerySamplesScanned: Number of samples processed per query
  • ConcurrentQueries: Number of simultaneous queries being executed
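
Before building on the names above, it can help to list what your account actually publishes. The following is a minimal sketch, not an official tool: `list_metric_names` is a hypothetical helper, and the CloudWatch client is passed in as an argument (an assumption made so the function can be exercised without AWS access). It relies only on the `list_metrics` paginator.

```python
def list_metric_names(cloudwatch, namespace="AWS/Prometheus"):
    """Return the sorted, de-duplicated metric names CloudWatch reports for a namespace.

    `cloudwatch` is expected to be a boto3 CloudWatch client, for example
    boto3.client("cloudwatch"); it is injected rather than created here.
    """
    names = set()
    paginator = cloudwatch.get_paginator("list_metrics")
    # list_metrics returns one entry per metric/dimension combination,
    # so the same name can appear many times across pages.
    for page in paginator.paginate(Namespace=namespace):
        for metric in page["Metrics"]:
            names.add(metric["MetricName"])
    return sorted(names)
```

With boto3 installed and credentials configured, `list_metric_names(boto3.client("cloudwatch"))` returns the names to compare against the lists above.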

Setting Up the Monitoring

Step 1: Create a CloudWatch Dashboard

Let's start by creating a dedicated dashboard for AMP monitoring. The JSON below is the dashboard body; the dashboard name (for example, AMP-Monitoring-Dashboard) is supplied separately when you create the dashboard, so it does not appear in the body:

{
  "widgets": [
    {
      "type": "metric",
      "x": 0,
      "y": 0,
      "width": 12,
      "height": 6,
      "properties": {
        "metrics": [
          ["AWS/Prometheus", "ActiveSeries", {"stat": "Average"}],
          [".", "IngestedSamples", {"stat": "Sum"}],
          [".", "StorageUsage", {"stat": "Average"}]
        ],
        "period": 300,
        "stat": "Average",
        "region": "us-east-1",
        "title": "AMP Core Metrics"
      }
    }
  ]
}
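
To publish a dashboard like this programmatically, the widget list goes in the dashboard body and the name is passed as a separate argument. The following is a rough sketch under stated assumptions: `build_dashboard_body` and `publish_dashboard` are hypothetical helper names, the widget position and size values are illustrative, and the metric names are the ones used in this article.

```python
import json


def build_dashboard_body(region="us-east-1"):
    """Return the dashboard body as a JSON string with one metric widget."""
    widget = {
        "type": "metric",
        "x": 0, "y": 0, "width": 12, "height": 6,  # illustrative layout values
        "properties": {
            "metrics": [
                ["AWS/Prometheus", "ActiveSeries", {"stat": "Average"}],
                [".", "IngestedSamples", {"stat": "Sum"}],
                [".", "StorageUsage", {"stat": "Average"}],
            ],
            "period": 300,
            "stat": "Average",
            "region": region,
            "title": "AMP Core Metrics",
        },
    }
    return json.dumps({"widgets": [widget]})


def publish_dashboard(cloudwatch, name="AMP-Monitoring-Dashboard", region="us-east-1"):
    """Create or overwrite the dashboard; `cloudwatch` is a boto3 CloudWatch client."""
    return cloudwatch.put_dashboard(
        DashboardName=name,
        DashboardBody=build_dashboard_body(region),
    )
```

The response from `put_dashboard` includes a `DashboardValidationMessages` list, which is worth checking for warnings about the body.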

Step 2: Configure CloudWatch Alarms

Set up proactive alerting for critical thresholds:

High Ingestion Rate Alarm

AlarmName: AMP-HighIngestionRate
Namespace: AWS/Prometheus
MetricName: IngestedSamples
Statistic: Sum
Period: 300
EvaluationPeriods: 2
Threshold: 1000000000  # 1 billion samples per 5-minute period
ComparisonOperator: GreaterThanThreshold

Active Series Growth Alarm

AlarmName: AMP-ActiveSeriesGrowth
Namespace: AWS/Prometheus
MetricName: ActiveSeries
Statistic: Average
Period: 3600
EvaluationPeriods: 1
Threshold: 10000000  # 10 million series
ComparisonOperator: GreaterThanThreshold
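
The two alarm definitions above can also be created from code. This is a sketch rather than a definitive setup: `alarm_params` and `create_alarms` are hypothetical helpers, the thresholds are the example values from this article, and no `Dimensions` are set because the article does not name any (add them if the metrics in your account carry dimensions, or the alarm may see no data).

```python
def alarm_params(name, metric, statistic, period, evaluation_periods, threshold):
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": name,
        "Namespace": "AWS/Prometheus",
        "MetricName": metric,
        "Statistic": statistic,
        "Period": period,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
    }


# Example values taken from the alarm definitions in this article.
ALARMS = [
    alarm_params("AMP-HighIngestionRate", "IngestedSamples", "Sum", 300, 2, 1_000_000_000),
    alarm_params("AMP-ActiveSeriesGrowth", "ActiveSeries", "Average", 3600, 1, 10_000_000),
]


def create_alarms(cloudwatch, alarms=ALARMS):
    """Create or update each alarm; `cloudwatch` is a boto3 CloudWatch client."""
    for params in alarms:
        cloudwatch.put_metric_alarm(**params)
```

In practice you would likely also pass `AlarmActions` (for example, an SNS topic ARN) so that a state change notifies someone.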

Step 3: Implement Cost Monitoring

Track AMP-related costs using these key metrics:

  1. Ingestion Costs: Monitor IngestedSamples metric
  2. Storage Costs: Track StorageUsage over time
  3. Query Costs: Analyze QuerySamplesScanned patterns

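Estimate monthly ingestion cost from the last 24 hours of data:

# Example: estimate monthly ingestion cost
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client('cloudwatch')
end = datetime.now(timezone.utc)
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/Prometheus',
    MetricName='IngestedSamples',
    StartTime=end - timedelta(hours=24),
    EndTime=end,
    Period=3600,
    Statistics=['Sum'],
)
# Each datapoint holds the samples ingested during one hour
samples_per_day = sum(dp['Sum'] for dp in response['Datapoints'])

# Pricing assumed here: $0.90 per 10 million samples (check the current AMP pricing page)
estimated_monthly_cost = samples_per_day * 30 / 10_000_000 * 0.90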

Best Practices for AMP Monitoring

1. Implement Tiered Alerting

Create multiple alert severity levels:

  • Critical: Service-impacting issues (usage above 95% of a limit)
  • Warning: Approaching thresholds (usage above 80% of a limit)
  • Info: Notable changes in patterns (an increase of more than 50%)
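
The tiers above can be expressed as a small classification step. This is a minimal sketch assuming the percentages from the list; the function names and the "OK" label are illustrative, and the limit would be whatever quota applies to your workspace.

```python
def classify_utilization(current, limit):
    """Map current usage against a limit to a severity label."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = current / limit
    if ratio > 0.95:
        return "CRITICAL"
    if ratio > 0.80:
        return "WARNING"
    return "OK"


def is_notable_increase(previous, current, threshold=0.50):
    """Return True when usage grew by more than `threshold` (50% by default)."""
    if previous <= 0:
        return False  # no baseline to compare against
    return (current - previous) / previous > threshold
```

For example, 8.5 million active series against a 10 million limit classifies as a warning, while 9.6 million classifies as critical.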

2. Use Metric Math for Advanced Insights

Combine metrics for deeper analysis:

# Samples per active series (efficiency metric)
m1 = AWS/Prometheus IngestedSamples
m2 = AWS/Prometheus ActiveSeries
Expression: m1 / m2
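
The same expression can be evaluated through the CloudWatch `get_metric_data` API. The following is a sketch under assumptions: the helper names are hypothetical, the metric names are the ones used in this article, and the two input series are hidden with `ReturnData: False` so that only the ratio is returned.

```python
def samples_per_series_queries(period=300, namespace="AWS/Prometheus"):
    """Build MetricDataQueries for the m1 / m2 expression."""

    def metric_query(query_id, metric_name, stat):
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {"Namespace": namespace, "MetricName": metric_name},
                "Period": period,
                "Stat": stat,
            },
            "ReturnData": False,  # input only; do not return this series
        }

    return [
        metric_query("m1", "IngestedSamples", "Sum"),
        metric_query("m2", "ActiveSeries", "Average"),
        {"Id": "e1", "Expression": "m1 / m2", "Label": "Samples per active series"},
    ]


def fetch_samples_per_series(cloudwatch, start, end):
    """Return (timestamp, value) pairs for the ratio; `cloudwatch` is a boto3 client."""
    response = cloudwatch.get_metric_data(
        MetricDataQueries=samples_per_series_queries(),
        StartTime=start,
        EndTime=end,
    )
    result = next(r for r in response["MetricDataResults"] if r["Id"] == "e1")
    return list(zip(result["Timestamps"], result["Values"]))
```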

3. Monitor Query Patterns

Track query performance to optimize PromQL:

  • Identify expensive queries using QuerySamplesScanned
  • Monitor QueryExecutionTime percentiles (p50, p90, p99)
  • Track ConcurrentQueries to understand load patterns
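
Percentiles are requested from CloudWatch through the `ExtendedStatistics` parameter rather than `Statistics`. This is a sketch only: the helper names are hypothetical, and `QueryExecutionTime` is the metric name used in this article, so confirm it exists in your account before relying on it.

```python
from datetime import datetime, timedelta, timezone


def query_latency_request(hours=24, period=3600, now=None):
    """Build get_metric_statistics arguments for p50/p90/p99 of query duration."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Prometheus",
        "MetricName": "QueryExecutionTime",
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,
        "ExtendedStatistics": ["p50", "p90", "p99"],
    }


def latest_percentiles(datapoints):
    """Return the percentile dict from the most recent datapoint, or None if empty."""
    if not datapoints:
        return None
    latest = max(datapoints, key=lambda dp: dp["Timestamp"])
    return latest["ExtendedStatistics"]
```

With a boto3 client, `latest_percentiles(cloudwatch.get_metric_statistics(**query_latency_request())["Datapoints"])` gives the most recent hourly percentiles.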

4. Automate Response Actions

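Use Amazon EventBridge (formerly CloudWatch Events) and Lambda for automated responses:

ACTIVE_SERIES_THRESHOLD = 10_000_000  # example value; align with your alarm

def lambda_handler(event, context):
    # Assumes a simplified, custom event with 'metric', 'value', and 'severity' keys
    # AMP scales automatically, so the response to series growth is typically
    # a quota-increase request or dropping high-cardinality series
    if event['metric'] == 'ActiveSeries' and event['value'] > ACTIVE_SERIES_THRESHOLD:
        # Placeholder: implement the response for your environment
        handle_active_series_growth(event)

    # Alert on-call team for critical issues
    if event['severity'] == 'CRITICAL':
        # Placeholder: your paging integration
        send_pagerduty_alert(event)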
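
When a Lambda function is triggered by a CloudWatch alarm through EventBridge, the event has a different shape than the simplified one above. Here is a minimal sketch of an event pattern that matches alarms entering the ALARM state and a handler that extracts the fields you would typically forward. `summarize_alarm_event` is a hypothetical helper, and the `print` call stands in for your own paging integration.

```python
# EventBridge event pattern: match CloudWatch alarms that transition to ALARM.
ALARM_EVENT_PATTERN = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {"state": {"value": ["ALARM"]}},
}


def summarize_alarm_event(event):
    """Pull the alarm name, new state, and reason out of an alarm state change event."""
    detail = event["detail"]
    return {
        "alarm": detail["alarmName"],
        "state": detail["state"]["value"],
        "reason": detail["state"].get("reason", ""),
        "region": event.get("region", ""),
    }


def lambda_handler(event, context):
    summary = summarize_alarm_event(event)
    print(summary)  # stand-in for a paging or ticketing call
    return summary
```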

Advanced Monitoring Strategies

Cross-Region Monitoring

For multi-region AMP deployments:

  1. Aggregate metrics across regions using CloudWatch cross-region dashboards
  2. Implement global alerting with centralized SNS topics
  3. Use CloudWatch Contributor Insights for regional comparison
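
Outside of dashboards, cross-region aggregation can also be done in code by querying each region and combining the results. This is a sketch under assumptions: both function names are hypothetical, `ActiveSeries` is the metric name used in this article, and the per-region fetch is passed in as a function so the aggregation logic can be tried without AWS access.

```python
def total_across_regions(regions, fetch_latest):
    """Return ({region: value}, total) using `fetch_latest(region)` for each region."""
    per_region = {region: fetch_latest(region) for region in regions}
    return per_region, sum(per_region.values())


def fetch_latest_active_series(region):
    """Latest hourly average of ActiveSeries in one region (needs boto3 and credentials)."""
    from datetime import datetime, timedelta, timezone

    import boto3  # imported lazily so the aggregation helper has no hard dependency

    end = datetime.now(timezone.utc)
    response = boto3.client("cloudwatch", region_name=region).get_metric_statistics(
        Namespace="AWS/Prometheus",
        MetricName="ActiveSeries",
        StartTime=end - timedelta(hours=3),
        EndTime=end,
        Period=3600,
        Statistics=["Average"],
    )
    points = sorted(response["Datapoints"], key=lambda dp: dp["Timestamp"])
    return points[-1]["Average"] if points else 0.0
```

A call such as `total_across_regions(["us-east-1", "eu-west-1"], fetch_latest_active_series)` returns the per-region values alongside their sum.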

Integration with AWS Organizations

For enterprise-scale monitoring:

  • Use CloudWatch cross-account observability
  • Implement organizational CloudWatch dashboards
  • Create consolidated billing alerts for AMP costs

Custom Metrics Enhancement

Supplement CloudWatch metrics with custom instrumentation:

from prometheus_client import Counter, Histogram

# Custom metrics for AMP client operations
amp_write_duration = Histogram('amp_write_duration_seconds', 
                               'Time to write to AMP')
amp_query_errors = Counter('amp_query_errors_total', 
                           'Total AMP query errors')

@amp_write_duration.time()
def write_to_amp(samples):
    # Your AMP write logic
    pass

Conclusion

Monitoring Amazon Managed Service for Prometheus is essential for maintaining a robust observability platform. By using CloudWatch metrics, implementing proactive alerting, and following the practices above, you can keep your AMP workspaces operating efficiently and cost-effectively.

Remember that effective monitoring is an iterative process. Start with the basics, gradually add sophistication based on your needs, and continuously refine your approach based on operational insights.

Next Steps

  1. Review your current AMP monitoring setup against these recommendations
  2. Implement missing CloudWatch alarms for critical metrics
  3. Create a cost optimization dashboard for your AMP workspaces
  4. Document your monitoring runbooks and escalation procedures
  5. Schedule regular reviews of your AMP usage patterns and costs

Have questions or want to share your AMP monitoring strategies? Join the discussion in the AWS Observability community forums or reach out to your AWS account team for personalized guidance.