How do I monitor my Amazon Managed Service for Prometheus (AMP) environment?
This article explores strategies for monitoring your Amazon Managed Service for Prometheus workspaces using CloudWatch metrics and best practices.
Why Monitor Amazon Managed Service for Prometheus?
Amazon Managed Service for Prometheus (AMP) is a fully managed, secure, and highly available service that makes it easy to monitor applications at scale. But who monitors the monitor? For observability teams, understanding the health, performance, and usage patterns of AMP itself is crucial for maintaining reliable monitoring infrastructure.
Before diving into the how, let's understand the why. Monitoring AMP helps you:
- Optimize Costs: Track ingestion rates and active series to manage expenses
- Ensure Reliability: Detect issues before they impact your observability stack
- Plan Capacity: Understand usage trends for scaling decisions
- Maintain Performance: Identify query patterns and optimize accordingly
- Demonstrate Value: Show stakeholders metrics that quantify the ROI of your monitoring infrastructure
Understanding AMP CloudWatch Metrics
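Amazon Managed Service for Prometheus automatically publishes usage metrics to CloudWatch under the AWS/Prometheus namespace. The exact set of metrics can change as the service evolves, so confirm the names in the CloudWatch console before building on them. These metrics provide insights into: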
1. Ingestion Metrics
- ActiveSeries: Number of unique time series with recent samples
- IngestedSamples: Rate of metric samples being ingested
- IngestionRate: Overall data ingestion throughput
2. Storage Metrics
- StorageUsage: Total storage consumed by your workspace
- TimeSeriesCount: Total number of time series stored
3. Query Performance Metrics
- QueryExecutionTime: Duration of PromQL query execution
- QuerySamplesScanned: Number of samples processed per query
- ConcurrentQueries: Number of simultaneous queries being executed
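The metric names above are the ones this article uses. One way to confirm what your account actually publishes is to list the namespace. The following is a minimal sketch, assuming boto3 is installed and credentials are configured; the helper names are illustrative, not part of any AWS API.

```python
from collections import defaultdict


def group_metrics_by_name(pages):
    """Map each metric name to the dimension-name combinations it is published with."""
    found = defaultdict(set)
    for page in pages:
        for metric in page.get("Metrics", []):
            dims = tuple(sorted(d["Name"] for d in metric.get("Dimensions", [])))
            found[metric["MetricName"]].add(dims)
    return {name: sorted(dims) for name, dims in found.items()}


def list_amp_metrics(region="us-east-1"):
    """List everything currently published under AWS/Prometheus in one Region."""
    import boto3  # imported lazily so the helper above works without AWS access

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    paginator = cloudwatch.get_paginator("list_metrics")
    return group_metrics_by_name(paginator.paginate(Namespace="AWS/Prometheus"))
```

Calling `list_amp_metrics()` returns a dictionary of metric names and the dimensions attached to them. CloudWatch treats each dimension combination as a separate metric, so later queries and alarms need to specify the same dimensions.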
Setting Up the Monitoring
Step 1: Create a CloudWatch Dashboard
Let's start by creating a dedicated dashboard for AMP monitoring. The JSON below is the dashboard body; the dashboard name (here, AMP-Monitoring-Dashboard) is supplied separately when you create the dashboard:
{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "metrics": [
          ["AWS/Prometheus", "ActiveSeries", {"stat": "Average"}],
          [".", "IngestedSamples", {"stat": "Sum"}],
          [".", "StorageUsage", {"stat": "Average"}]
        ],
        "period": 300,
        "stat": "Average",
        "region": "us-east-1",
        "title": "AMP Core Metrics"
      }
    }
  ]
}
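If you prefer to create the dashboard from code, the sketch below builds the same widget and passes it to the CloudWatch `put_dashboard` call. It assumes boto3 and valid credentials; the function names and the `dimension_pairs` parameter are illustrative, and the metric names are taken from this article as-is.

```python
import json


def build_dashboard_body(region="us-east-1", dimension_pairs=()):
    """Return the dashboard body as a JSON string.

    dimension_pairs is a flat sequence such as ["<DimensionName>", "<value>"]
    and is only needed if your metrics are published with dimensions.
    """
    def row(metric_name, stat):
        return ["AWS/Prometheus", metric_name, *dimension_pairs, {"stat": stat}]

    widget = {
        "type": "metric",
        "x": 0, "y": 0, "width": 12, "height": 6,
        "properties": {
            "metrics": [
                row("ActiveSeries", "Average"),
                row("IngestedSamples", "Sum"),
                row("StorageUsage", "Average"),
            ],
            "period": 300,
            "region": region,
            "title": "AMP Core Metrics",
        },
    }
    return json.dumps({"widgets": [widget]})


def create_dashboard(name="AMP-Monitoring-Dashboard", region="us-east-1"):
    """Create or overwrite the dashboard in the given Region."""
    import boto3  # imported lazily so the builder can be tested offline

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    return cloudwatch.put_dashboard(
        DashboardName=name,
        DashboardBody=build_dashboard_body(region),
    )
```

Keeping the body builder separate from the API call makes it easy to review or version-control the JSON before anything is created.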
Step 2: Configure CloudWatch Alarms
Set up proactive alerting for critical thresholds:
High Ingestion Rate Alarm
AlarmName: AMP-HighIngestionRate
Namespace: AWS/Prometheus
MetricName: IngestedSamples
Statistic: Sum
Period: 300
EvaluationPeriods: 2
Threshold: 1000000000  # 1 billion samples per 5-minute period
ComparisonOperator: GreaterThanThreshold
Active Series Growth Alarm
AlarmName: AMP-ActiveSeriesGrowth
Namespace: AWS/Prometheus
MetricName: ActiveSeries
Statistic: Average
Period: 3600
EvaluationPeriods: 1
Threshold: 10000000  # 10 million series
ComparisonOperator: GreaterThanThreshold
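The two alarm definitions above can be created with the CloudWatch `put_metric_alarm` call. The following is a sketch under a few assumptions: boto3 is available, the metric names are as given in this article, and `TreatMissingData` is set to `notBreaching` as an illustrative choice that you may want to change. `build_alarm` and `create_alarms` are hypothetical helper names.

```python
def build_alarm(name, metric, statistic, period, evaluation_periods, threshold,
                dimensions=None, sns_topic_arn=None):
    """Build keyword arguments for cloudwatch.put_metric_alarm."""
    params = {
        "AlarmName": name,
        "Namespace": "AWS/Prometheus",
        "MetricName": metric,
        "Statistic": statistic,
        "Period": period,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # assumption: gaps should not trigger the alarm
    }
    if dimensions:
        params["Dimensions"] = dimensions  # list of {"Name": ..., "Value": ...}
    if sns_topic_arn:
        params["AlarmActions"] = [sns_topic_arn]
    return params


ALARMS = [
    build_alarm("AMP-HighIngestionRate", "IngestedSamples", "Sum", 300, 2, 1_000_000_000),
    build_alarm("AMP-ActiveSeriesGrowth", "ActiveSeries", "Average", 3600, 1, 10_000_000),
]


def create_alarms(region="us-east-1"):
    """Create or update every alarm in ALARMS."""
    import boto3  # imported lazily so the builder can be tested offline

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    for params in ALARMS:
        cloudwatch.put_metric_alarm(**params)
```

Pass `sns_topic_arn` to `build_alarm` if the alarm should notify a topic; without it the alarm only changes state.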
Step 3: Implement Cost Monitoring
Track AMP-related costs using these key metrics:
- Ingestion Costs: Monitor the IngestedSamples metric
- Storage Costs: Track StorageUsage over time
- Query Costs: Analyze QuerySamplesScanned patterns
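Estimate monthly ingestion cost from recent usage:
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client('cloudwatch')
end = datetime.now(timezone.utc)

# Example: estimate monthly cost from the samples ingested in the last 24 hours
# (add Dimensions=[...] if the metric is published with dimensions)
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/Prometheus',
    MetricName='IngestedSamples',
    StartTime=end - timedelta(hours=24),
    EndTime=end,
    Period=3600,
    Statistics=['Sum'],
)
samples_per_day = sum(dp['Sum'] for dp in response['Datapoints'])

# AMP pricing: $0.90 per 10 million samples (entry-tier rate; check current pricing)
estimated_monthly_cost = samples_per_day * 30 / 10_000_000 * 0.90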
Best Practices for AMP Monitoring
1. Implement Tiered Alerting
Create multiple alert severity levels:
- Critical: Service impacting issues (>95% of limits)
- Warning: Approaching thresholds (>80% of limits)
- Info: Notable changes in patterns (>50% increase)
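The tiers above can be expressed as a small classification function. This is a sketch only: `classify_usage` is a hypothetical helper, and `limit` stands for whichever quota applies to the metric you are checking.

```python
def classify_usage(current, limit, previous=None):
    """Map a metric value to a severity tier using the thresholds listed above."""
    usage = current / limit
    if usage > 0.95:
        return "critical"  # service-impacting: above 95% of the limit
    if usage > 0.80:
        return "warning"   # approaching the limit: above 80%
    if previous and (current - previous) / previous > 0.50:
        return "info"      # notable change: more than a 50% increase
    return "ok"
```

For example, 8.5 million active series against a 10 million limit classifies as `warning`, while a jump from 2 million to 3.2 million (a 60% increase) classifies as `info` even though the limit is far away.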
2. Use Metric Math for Advanced Insights
Combine metrics for deeper analysis:
# Samples per active series (efficiency metric)
m1 = AWS/Prometheus IngestedSamples
m2 = AWS/Prometheus ActiveSeries
Expression: m1 / m2
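The same expression can be evaluated programmatically with the CloudWatch `get_metric_data` call. The sketch below assumes boto3 and uses the metric names from this article; the function names and the `e1` query ID are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def build_samples_per_series_queries(period=300, dimensions=None):
    """Build MetricDataQueries for the expression m1 / m2."""
    def metric_query(query_id, metric_name, stat):
        metric = {"Namespace": "AWS/Prometheus", "MetricName": metric_name}
        if dimensions:
            metric["Dimensions"] = dimensions
        return {
            "Id": query_id,
            "MetricStat": {"Metric": metric, "Period": period, "Stat": stat},
            "ReturnData": False,  # only the expression result is returned
        }

    return [
        metric_query("m1", "IngestedSamples", "Sum"),
        metric_query("m2", "ActiveSeries", "Average"),
        {"Id": "e1", "Expression": "m1 / m2",
         "Label": "SamplesPerActiveSeries", "ReturnData": True},
    ]


def fetch_samples_per_series(region="us-east-1", hours=3):
    """Return the expression values for the last few hours."""
    import boto3  # imported lazily so the builder can be tested offline

    end = datetime.now(timezone.utc)
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=build_samples_per_series_queries(),
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    results = {r["Id"]: r["Values"] for r in response["MetricDataResults"]}
    return results.get("e1", [])
```

A steadily rising ratio with flat active series suggests shorter scrape intervals rather than new series, which points to a different cost lever than cardinality.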
3. Monitor Query Patterns
Track query performance to optimize PromQL:
- Identify expensive queries using QuerySamplesScanned
- Monitor QueryExecutionTime percentiles (p50, p90, p99)
- Track ConcurrentQueries to understand load patterns
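CloudWatch exposes percentiles through the `ExtendedStatistics` parameter of `get_metric_statistics`. The following sketch requests p50, p90, and p99 for the query-duration metric named in this article. It assumes boto3 and that the metric is published with raw values, since percentiles are not available for every metric; the helper names are illustrative.

```python
from datetime import datetime, timedelta, timezone

PERCENTILES = ["p50", "p90", "p99"]


def build_percentile_request(metric_name="QueryExecutionTime", hours=1,
                             period=300, now=None):
    """Build keyword arguments for cloudwatch.get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Prometheus",
        "MetricName": metric_name,
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,
        "ExtendedStatistics": PERCENTILES,
    }


def latest_percentiles(datapoints):
    """Return the percentile values from the most recent datapoint."""
    if not datapoints:
        return {}
    newest = max(datapoints, key=lambda dp: dp["Timestamp"])
    return newest.get("ExtendedStatistics", {})


def fetch_query_latency(region="us-east-1"):
    """Fetch the latest p50/p90/p99 values for the query-duration metric."""
    import boto3  # imported lazily so the helpers can be tested offline

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    response = cloudwatch.get_metric_statistics(**build_percentile_request())
    return latest_percentiles(response["Datapoints"])
```

An empty result usually means the metric name or dimensions do not match what is published, so it is worth checking the namespace listing before assuming there is no query traffic.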
4. Automate Response Actions
Use Amazon EventBridge (formerly CloudWatch Events) and Lambda for automated remediation:
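ACTIVE_SERIES_THRESHOLD = 10_000_000  # example value; align with your alarm threshold

def scale_amp_workspace():
    # Placeholder for your remediation, e.g. requesting a quota increase or reducing cardinality
    pass

def send_pagerduty_alert(event):
    # Placeholder: call your paging integration here
    pass

def lambda_handler(event, context):
    # Assumes a custom payload with 'metric', 'value', and 'severity' keys
    if event.get('metric') == 'ActiveSeries' and event.get('value', 0) > ACTIVE_SERIES_THRESHOLD:
        scale_amp_workspace()
    # Alert on-call team for critical issues
    if event.get('severity') == 'CRITICAL':
        send_pagerduty_alert(event)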
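If the Lambda function is triggered directly by an EventBridge rule on CloudWatch alarm state changes, the event carries the alarm details under a `detail` key instead of a custom payload. The sketch below shows one way to read that shape. The paging step is a placeholder `print`, and `parse_alarm_event` is a hypothetical helper.

```python
def parse_alarm_event(event):
    """Extract the fields of interest from a CloudWatch Alarm State Change event."""
    detail = event.get("detail", {})
    state = detail.get("state", {})
    return {
        "alarm": detail.get("alarmName"),
        "state": state.get("value"),
        "previous": detail.get("previousState", {}).get("value"),
        "reason": state.get("reason"),
    }


def lambda_handler(event, context):
    """Page only on a transition into ALARM, not on repeated ALARM events."""
    info = parse_alarm_event(event)
    should_page = info["state"] == "ALARM" and info["previous"] != "ALARM"
    if should_page:
        # Placeholder: replace with your real notification integration
        print(f"Paging on-call for {info['alarm']}: {info['reason']}")
    return {"paged": should_page, **info}
```

Checking the previous state avoids paging again when an alarm is re-evaluated while it is still in the ALARM state.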
Advanced Monitoring Strategies
Cross-Region Monitoring
For multi-region AMP deployments:
- Aggregate metrics across regions using CloudWatch cross-region dashboards
- Implement global alerting with centralized SNS topics
- Use CloudWatch Contributor Insights for regional comparison
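For a quick cross-region view outside the console, you can query each Region in turn and combine the results. This is a minimal sketch, assuming boto3, the `ActiveSeries` metric name used in this article, and a Region list that you supply; both function names are illustrative.

```python
def aggregate_across_regions(regions, fetch_latest):
    """Collect the latest value per Region and a total that ignores missing data."""
    per_region = {region: fetch_latest(region) for region in regions}
    total = sum(value for value in per_region.values() if value is not None)
    return per_region, total


def latest_active_series(region):
    """Return the most recent ActiveSeries average in a Region, or None."""
    import boto3  # imported lazily so the aggregation can be tested offline
    from datetime import datetime, timedelta, timezone

    end = datetime.now(timezone.utc)
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Prometheus",
        MetricName="ActiveSeries",
        StartTime=end - timedelta(minutes=15),
        EndTime=end,
        Period=300,
        Statistics=["Average"],
    )
    points = sorted(response["Datapoints"], key=lambda dp: dp["Timestamp"])
    return points[-1]["Average"] if points else None
```

A call such as `aggregate_across_regions(["us-east-1", "eu-west-1"], latest_active_series)` returns the per-Region values alongside the total, with `None` marking Regions that reported no datapoints.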
Integration with AWS Organizations
For enterprise-scale monitoring:
- Use CloudWatch cross-account observability
- Implement organizational CloudWatch dashboards
- Create consolidated billing alerts for AMP costs
Custom Metrics Enhancement
Supplement CloudWatch metrics with custom instrumentation:
import boto3
from prometheus_client import Counter, Histogram

# Custom metrics for AMP client operations
amp_write_duration = Histogram('amp_write_duration_seconds', 'Time to write to AMP')
amp_query_errors = Counter('amp_query_errors_total', 'Total AMP query errors')

@amp_write_duration.time()
def write_to_amp(samples):
    # Your AMP write logic
    pass
Conclusion
Monitoring Amazon Managed Service for Prometheus is essential for maintaining a robust observability platform. By leveraging CloudWatch metrics, implementing proactive alerting, and following best practices, you can ensure your AMP workspaces operate efficiently and cost-effectively.
Remember that effective monitoring is an iterative process. Start with the basics, gradually add sophistication based on your needs, and continuously refine your approach based on operational insights.
Next Steps
- Review your current AMP monitoring setup against these recommendations
- Implement missing CloudWatch alarms for critical metrics
- Create a cost optimization dashboard for your AMP workspaces
- Document your monitoring runbooks and escalation procedures
- Schedule regular reviews of your AMP usage patterns and costs
Additional Resources
- Amazon Managed Prometheus User Guide
- AWS Observability Best Practices
- PromQL Optimization Guide
- CloudWatch Metric Math
Have questions or want to share your AMP monitoring strategies? Join the discussion in the AWS Observability community forums or reach out to your AWS account team for personalized guidance.