CloudWatch memory usage alert triggers, but metrics show no corresponding event

We set up an alert to trigger when our ECS containers exceed 95% memory utilization. For one instance, this alert now triggers multiple times a day, even though the metrics show stable utilization of around 73%, with no spikes and no missing data.

Here is the data from today (April 24, 2024), with alerts around 17:30 but no corresponding spike in the metric: [Screenshot: Alarm overview 2024-04-24]

And here is the view in the metrics, showing all relevant items (memory reserved, memory utilized, and the percentage calculation), but no events around 17:30: [Screenshot: Metrics overview 2024-04-24]

We need guidance on what is going on here. This alert is currently absolutely useless as it triggers without an actual problem.

Thanks

Asked 24 days ago · 131 views
2 Answers
I would suggest the following checks to resolve this issue:

1. Verify Alert Configuration: Check that the alert threshold is correctly set at 95% and that the alarm targets the right ECS containers.

2. Confirm Metric Accuracy: Double-check the metric data in CloudWatch Metrics to ensure it accurately reflects memory usage.

3. Review ECS Setup: Check the ECS container configurations and investigate any memory-intensive tasks or issues.

4. Monitor Alarm State Changes: Look at the CloudWatch alarm history for patterns in when the alarm triggers.

5. Adjust Alert Actions: Review the alert actions and adjust them if necessary, so that they are appropriate.

See the documentation: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#common-features-of-alarms
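As a rough sketch of check 1, the alarm definition can be pulled with boto3's `describe_alarms` and reduced to the fields that decide when it fires. The helper names `summarize_alarm` and `fetch_alarm_summaries` are made up for this example, and it assumes a plain single-metric alarm; for a metric math alarm the statistic and period sit inside the `Metrics` list instead, so those fields would come back as `None` here.

```python
def summarize_alarm(alarm: dict) -> dict:
    """Pick out the fields that decide when a metric alarm fires.

    `alarm` is one entry of the "MetricAlarms" list returned by
    CloudWatch's DescribeAlarms call.
    """
    return {
        "name": alarm.get("AlarmName"),
        # Percentile alarms use ExtendedStatistic (e.g. "p99");
        # others use Statistic (e.g. "Average", "Maximum").
        "statistic": alarm.get("ExtendedStatistic") or alarm.get("Statistic"),
        "threshold": alarm.get("Threshold"),
        "comparison": alarm.get("ComparisonOperator"),
        "period_seconds": alarm.get("Period"),
        "evaluation_periods": alarm.get("EvaluationPeriods"),
        "datapoints_to_alarm": alarm.get("DatapointsToAlarm"),
        "treat_missing_data": alarm.get("TreatMissingData"),
        "low_sample_percentile": alarm.get("EvaluateLowSampleCountPercentile"),
    }


def fetch_alarm_summaries(alarm_names, client=None):
    """Fetch the named alarms and summarize each one.

    Passing `client` lets the function be exercised without AWS access;
    by default a real boto3 CloudWatch client is created.
    """
    if client is None:
        import boto3  # imported lazily so the helper above works without it

        client = boto3.client("cloudwatch")
    response = client.describe_alarms(AlarmNames=list(alarm_names))
    return [summarize_alarm(a) for a in response.get("MetricAlarms", [])]
```

Comparing the `statistic` and `period_seconds` values against what the metrics graph is plotting is a quick way to see whether the alarm and the graph are looking at the same thing.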

Answered 23 days ago
  • I double- and triple-checked the whole setup and it all looks correct. We have the same setup for a number of other ECS instances, and the problem occurs only for this one. It might have been temporary, because it hasn't happened since.

Accepted Answer

Hello.

How is the "treat missing data" option configured for this CloudWatch alarm?
Depending on that setting, the alarm can fire even when the metrics look normal.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data

You may also be able to check the reason for the alarm by looking at the CloudWatch Alarm history.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#common-features-of-alarms
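For the alarm history check, something along these lines could list each transition into ALARM together with the reason CloudWatch recorded. It uses boto3's `describe_alarm_history`; the function names are made up for this example, and the layout of the `HistoryData` JSON (`newState`, `stateReason`, `stateReasonData.recentDatapoints`) is an assumption based on what state-update entries typically contain, so missing keys are tolerated.

```python
import json


def alarm_transitions(history_items):
    """Return one dict per transition into ALARM.

    `history_items` is the "AlarmHistoryItems" list returned by
    CloudWatch's DescribeAlarmHistory call.
    """
    transitions = []
    for item in history_items:
        if item.get("HistoryItemType") != "StateUpdate":
            continue
        # HistoryData is a JSON string; its inner layout is assumed here.
        data = json.loads(item.get("HistoryData") or "{}")
        new_state = data.get("newState", {})
        if new_state.get("stateValue") != "ALARM":
            continue
        reason_data = new_state.get("stateReasonData", {})
        transitions.append(
            {
                "timestamp": item.get("Timestamp"),
                "reason": new_state.get("stateReason"),
                "recent_datapoints": reason_data.get("recentDatapoints", []),
            }
        )
    return transitions


def fetch_alarm_transitions(alarm_name, client=None):
    """Fetch state updates for one alarm and keep the ALARM transitions."""
    if client is None:
        import boto3  # imported lazily so the parser above works without it

        client = boto3.client("cloudwatch")
    response = client.describe_alarm_history(
        AlarmName=alarm_name, HistoryItemType="StateUpdate"
    )
    return alarm_transitions(response.get("AlarmHistoryItems", []))
```

The recorded datapoints are the values the alarm actually evaluated, which can differ from the graph if the graph uses another statistic or period.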

Expert
Answered 24 days ago
  • It is set to treat missing data as missing, but we are evaluating percentiles with low sample counts, so I will take a look at that setting.
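To illustrate why a percentile over few samples might matter, here is a small sketch with made-up numbers (not the poster's data). It uses a simple nearest-rank percentile, which is an assumption for illustration and not necessarily how CloudWatch computes percentiles. With only four samples in a period, a single high reading becomes the p99 value, while the average stays well below the threshold.

```python
import math
from statistics import mean

THRESHOLD = 95.0  # alarm threshold in percent, as in the question


def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical memory utilization samples (percent) within one alarm period.
samples = [73.0, 72.5, 73.4, 96.2]

average = mean(samples)
p99 = nearest_rank_percentile(samples, 99)

print(f"average = {average:.2f}, breaches threshold: {average > THRESHOLD}")
print(f"p99     = {p99:.2f}, breaches threshold: {p99 > THRESHOLD}")
```

If the alarm evaluates a percentile while the graph plots the average, this kind of gap could explain alarms without a visible spike. For percentile alarms, CloudWatch's `EvaluateLowSampleCountPercentile` option (`evaluate` or `ignore`) controls whether periods with too few samples are evaluated at all.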
