CloudWatch memory usage alert triggers, but metrics show no corresponding event


We set up an alert to trigger when our ECS containers exceed 95% memory usage. On one instance this alert now triggers multiple times a day, even though the metrics show a stable utilization of around 73%, with no spikes and no missing data.

Here is the alarm data from today (April 24, 2024), with alerts around 17:30: [screenshot: Alarm overview 2024-04-24]

And here is the metrics view for the same day, showing all relevant items (memory reserved, memory utilized, and the percentage calculation), with no corresponding spike or event around 17:30: [screenshot: Metrics overview 2024-04-24]

We need guidance on what is going on here. The alert is currently useless because it triggers without an actual problem.

Thanks

asked 24 days ago, 131 views
2 Answers

I would suggest the following checks to resolve this issue:

1. Verify Alert Configuration: Check that the alert threshold is correctly set at 95% and that the alert targets the right ECS containers.

2. Confirm Metric Accuracy: Double-check the metric data in CloudWatch Metrics to ensure it accurately reflects memory usage.

3. Review ECS Setup: Check the ECS container configurations and look for memory-intensive tasks or other issues.

4. Monitor Alarm State Changes: Look through the CloudWatch Alarm History for patterns in when the alarm triggers.

5. Adjust Alert Actions: Review the alert actions and adjust them if necessary so that they are appropriate.

See the documentation: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#common-features-of-alarms
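Since the same setup reportedly works for other ECS instances, one way to carry out the configuration check is to compare the misbehaving alarm against a known-good one field by field. The following is only a sketch: it assumes a boto3 CloudWatch client is passed in, and the alarm names in the usage comment are hypothetical placeholders. The fields listed are keys returned by `describe_alarms` for metric alarms; keys that are absent from an alarm are reported as `None`.

```python
# Keys of interest in each entry of describe_alarms()["MetricAlarms"].
FIELDS = (
    "Statistic", "ExtendedStatistic", "Threshold", "ComparisonOperator",
    "Period", "EvaluationPeriods", "DatapointsToAlarm",
    "TreatMissingData", "EvaluateLowSampleCountPercentile",
)


def summarize_alarm(cloudwatch, alarm_name):
    """Return the evaluation-related settings of one metric alarm."""
    response = cloudwatch.describe_alarms(AlarmNames=[alarm_name])
    alarms = response.get("MetricAlarms", [])
    if not alarms:
        raise LookupError(f"no metric alarm named {alarm_name!r}")
    alarm = alarms[0]
    # Missing keys (e.g. Statistic on a percentile alarm) become None.
    return {key: alarm.get(key) for key in FIELDS}


def diff_alarms(cloudwatch, suspect, reference):
    """Return {field: (suspect_value, reference_value)} for differing fields."""
    a = summarize_alarm(cloudwatch, suspect)
    b = summarize_alarm(cloudwatch, reference)
    return {key: (a[key], b[key]) for key in FIELDS if a[key] != b[key]}


# Usage (hypothetical alarm names, requires boto3 and AWS credentials):
#   import boto3
#   cw = boto3.client("cloudwatch")
#   print(diff_alarms(cw, "ecs-memory-suspect", "ecs-memory-reference"))
```

An empty result would suggest the two alarms are evaluated the same way, so the difference is more likely in the underlying data. Note that the statistic shown in a metrics graph may differ from the one the alarm evaluates, which is worth checking in the output.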

answered 23 days ago
  • I double- and triple-checked the whole setup and everything looks correct. We have the same setup for a number of other ECS instances, and we have this problem only for this one. It might have been temporary, because it hasn't happened since.

Accepted Answer

Hello.

What is the "treat missing data" setting of your CloudWatch alarm?
Depending on this setting, an alarm can trigger even if the metrics appear normal.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data

You may also be able to find the reason for the alarm by looking at the CloudWatch alarm history.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#common-features-of-alarms
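As a rough sketch of that history check, the snippet below pulls the state-change entries for one alarm and extracts the recorded reason for each transition. It assumes a boto3 CloudWatch client is passed in and that each item's `HistoryData` is a JSON string with a `newState` object containing `stateValue` and `stateReason`; treat that layout as an assumption and inspect the raw items if the fields come back empty. The alarm name in the usage comment is a hypothetical placeholder.

```python
import json


def state_change_reasons(cloudwatch, alarm_name, max_records=50):
    """Return (timestamp, new_state, reason) for recent alarm state changes."""
    response = cloudwatch.describe_alarm_history(
        AlarmName=alarm_name,
        HistoryItemType="StateUpdate",  # skip configuration and action entries
        MaxRecords=max_records,
    )
    reasons = []
    for item in response.get("AlarmHistoryItems", []):
        # HistoryData is delivered as a JSON-encoded string.
        data = json.loads(item.get("HistoryData") or "{}")
        new_state = data.get("newState", {})
        reasons.append(
            (
                item.get("Timestamp"),
                new_state.get("stateValue"),
                new_state.get("stateReason"),
            )
        )
    return reasons


# Usage (hypothetical alarm name, requires boto3 and AWS credentials):
#   import boto3
#   cw = boto3.client("cloudwatch")
#   for when, state, reason in state_change_reasons(cw, "ecs-memory-suspect"):
#       print(when, state, reason)
```

The reason text for each transition into ALARM usually names the datapoints that were evaluated, which can then be compared with what the metrics graph shows around 17:30.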

EXPERT
answered 24 days ago
  • It is set to treat missing data as missing, but we are evaluating percentiles with low sample counts, so I will take a look at that setting.
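To make the low-sample point concrete: as I understand the CloudWatch documentation, the "evaluate low sample count percentile" setting applies when a period has fewer than 10/(1 - p) data points for percentiles p of 0.5 or higher, and fewer than 10/p for percentiles below 0.5. The sketch below only encodes that rule as stated; the helper names are hypothetical, and the p99 value in the usage comment is an example, since the thread does not say which percentile the alarm uses.

```python
from fractions import Fraction


def low_sample_threshold(percentile):
    """Sample count below which a period counts as 'low sample' for a percentile.

    `percentile` is a fraction strictly between 0 and 1 (e.g. 0.99 for p99).
    Fractions are used so that 10 / (1 - 0.99) is exactly 1000.
    """
    p = Fraction(str(percentile))
    if not 0 < p < 1:
        raise ValueError("percentile must be strictly between 0 and 1")
    if p >= Fraction(1, 2):
        return 10 / (1 - p)
    return 10 / p


def is_low_sample(percentile, sample_count):
    """True if the period has too few samples for the percentile to be reliable."""
    return sample_count < low_sample_threshold(percentile)


# Usage (example percentile, not taken from the thread):
#   is_low_sample(0.99, 60)   # 60 samples per period is well under 1000
```

If the periods fall under this threshold and the setting is left at `evaluate`, the alarm is judged on a percentile computed from very few samples, which can differ noticeably from the averaged line on the graph. Setting `EvaluateLowSampleCountPercentile` to `ignore` on the alarm is one option to consider in that case.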
