Trying to build out alarm automation and running into a snag on storage alarms.

I have the CloudWatch agent installed on an EC2 instance for testing. Here is my config:

{
    "agent": {
        "metrics_collection_interval": 60,
        "run_as_user": "cwagent"
    },
    "metrics": {
        "append_dimensions": {
            "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
            "ImageId": "${aws:ImageId}",
            "InstanceId": "${aws:InstanceId}",
            "InstanceType": "${aws:InstanceType}"
        },
        "aggregation_dimensions" : [["InstanceId","path"]],
        "metrics_collected": {
            "disk": {
                "measurement": [
                    "used_percent"
                ],
                "metrics_collection_interval": 60,
                "resources": [
                    "*"
                ],
                "ignore_file_system_types": [
                    "sysfs", "devtmpfs", "tmpfs", "overlay", "debugfs", "squashfs", "iso9660", "proc", "autofs", "tracefs"
                ],
                "drop_device": true
            },
            "mem": {
                "measurement": [
                    "mem_used_percent"
                ],
                "metrics_collection_interval": 60
            }
        }
    }
}

I'm trying to get the CloudWatch agent to send disk stats (which it does), but the alarms that my Lambda function creates don't receive any disk data.

My function project is written in TypeScript. The function that creates the storage alarm depends on two helpers: one filters through tags to set alarm property values, and the other creates or updates alarms. I won't include the tag-filtering helper because it works as expected and does not create the alarm. I will, however, provide the storage alarm function (manageStorageAlarmForInstance) and the function that takes the parameters it passes and actually creates the alarms.

// Function to create or update alarms:
async function createOrUpdateAlarm(
  alarmName: string,
  instanceId: string,
  props: AlarmProps
) {
  try {
    await cloudWatchClient.send(
      new PutMetricAlarmCommand({
        AlarmName: alarmName,
        ComparisonOperator: 'GreaterThanThreshold',
        EvaluationPeriods: props.evaluationPeriods,
        MetricName: props.metricName,
        Namespace: props.namespace,
        Period: props.period,
        Statistic: 'Average',
        Threshold: props.threshold,
        ActionsEnabled: false,
        Dimensions: props.dimensions,
      })
    );
    log
      .info()
      .str('alarmName', alarmName)
      .str('instanceId', instanceId)
      .num('threshold', props.threshold)
      .num('period', props.period)
      .num('evaluationPeriods', props.evaluationPeriods)
      .msg('Alarm configured');
  } catch (e) {
    log
      .error()
      .err(e)
      .str('alarmName', alarmName)
      .str('instanceId', instanceId)
      .msg('Failed to create or update alarm due to an error');
  }
} 

// Function to create storage monitoring alarms:
async function manageStorageAlarmForInstance(
  instanceId: string,
  instanceType: string,
  imageId: string,
  tags: Tag,
  type: AlarmClassification
): Promise<void> {
  const baseAlarmName = `autoAlarm-EC2-${instanceId}-${type}StorageUtilization`;
  const thresholdKey = `autoalarm:storage-free-percent-${type.toLowerCase()}`;
  const durationTimeKey = 'autoalarm:storage-percent-duration-time';
  const durationPeriodsKey = 'autoalarm:storage-percent-duration-periods';
  const defaultThreshold = type === 'Critical' ? 10 : 20;

  const alarmProps: AlarmProps = {
    threshold: defaultThreshold,
    period: 60,
    namespace: 'disk',
    evaluationPeriods: 5,
    metricName: 'used_percent',
    dimensions: [
      {Name: 'InstanceId', Value: instanceId},
      {Name: 'ImageId', Value: imageId},
      {Name: 'InstanceType', Value: instanceType},
      {Name: 'Path', Value: '/'},
    ],
  };

  try {
    configureAlarmPropsFromTags(
      alarmProps,
      tags,
      thresholdKey,
      durationTimeKey,
      durationPeriodsKey
    );
  } catch (e) {
    log.error().err(e).msg('Error configuring alarm props from tags');
    throw new Error('Error configuring alarm props from tags');
  }
  // Check whether the alarm already exists. needsUpdate just compares the
  // alarm props against the current alarm's values.
  const alarmExists = await doesAlarmExist(baseAlarmName);
  if (
    !alarmExists ||
    (await needsUpdate(baseAlarmName, alarmProps))
  ) {
    await createOrUpdateAlarm(baseAlarmName, instanceId, alarmProps);
    log
      .info()
      .str('alarmName', baseAlarmName)
      .str('instanceId', instanceId)
      .msg('Storage usage alarm configured or updated.');
  } else {
    log
      .info()
      .str('alarmName', baseAlarmName)
      .str('instanceId', instanceId)
      .msg('Storage usage alarm is already up-to-date');
  }
}

Any ideas on what's wrong with the way I'm creating my storage alarms? Why won't those alarms receive data from the CloudWatch agent?

Thanks in advance for the assist.

Asked a month ago, 152 views
2 Answers

Hello.

I think you should first check whether the metrics themselves are being collected, before looking at the alarm.
Are the target CloudWatch metrics being published?
If the metrics exist, the problem may be in your Lambda code.
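
One way to run that check from code is to list what the agent actually publishes and read off the exact namespace, metric name, and dimension names. Below is a minimal sketch, assuming the metrics arrive in the same shape as the `Metrics` array returned by the SDK's ListMetrics call. `PublishedMetric`, `findDiskMetrics`, `describeMetric`, and the sample data are hypothetical and only illustrate the comparison; the sample values are what the agent's defaults would typically produce, so confirm them against your own account.

```typescript
// Local stand-ins for the SDK's Metric/Dimension shapes, so the sketch runs without AWS access.
interface Dim {
  Name: string;
  Value: string;
}
interface PublishedMetric {
  Namespace: string;
  MetricName: string;
  Dimensions: Dim[];
}

// Keep only disk-related metrics that carry the given InstanceId dimension.
function findDiskMetrics(
  metrics: PublishedMetric[],
  instanceId: string
): PublishedMetric[] {
  return metrics.filter(
    (m) =>
      `${m.Namespace}/${m.MetricName}`.includes('disk') &&
      m.Dimensions.some((d) => d.Name === 'InstanceId' && d.Value === instanceId)
  );
}

// Render one metric as "Namespace/MetricName {dim=value, ...}" for comparison with the alarm.
function describeMetric(m: PublishedMetric): string {
  const dims = m.Dimensions.map((d) => `${d.Name}=${d.Value}`).join(', ');
  return `${m.Namespace}/${m.MetricName} {${dims}}`;
}

// Hypothetical sample data; in practice this would be the Metrics array from ListMetrics.
const sampleMetrics: PublishedMetric[] = [
  {
    Namespace: 'CWAgent',
    MetricName: 'disk_used_percent',
    Dimensions: [
      {Name: 'InstanceId', Value: 'i-0123456789abcdef0'},
      {Name: 'path', Value: '/'},
    ],
  },
  {
    Namespace: 'CWAgent',
    MetricName: 'mem_used_percent',
    Dimensions: [{Name: 'InstanceId', Value: 'i-0123456789abcdef0'}],
  },
];

const diskMetricLines = findDiskMetrics(sampleMetrics, 'i-0123456789abcdef0').map(
  describeMetric
);
diskMetricLines.forEach((line) => console.log(line));
```

If the namespace, metric name, or any dimension name printed here differs from what the alarm was created with, the alarm is watching a metric that never receives data.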

Expert, answered a month ago
Expert, reviewed a month ago
  • You are correct, the metrics do exist. If I go into CloudWatch and start creating an alarm from scratch, I can see every mount point individually for the EC2 instance running the CloudWatch agent, each reporting its own storage stats. I'm just trying to figure out how to configure the function in my Lambda correctly to do the same.
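
Since the metrics show up in the console, a likely culprit is that the alarm's namespace, metric name, or dimension set does not exactly match a published metric. CloudWatch treats each unique combination of dimensions as a separate metric, and dimension names are case-sensitive. With the agent config in the question, the metrics would normally be published under the `CWAgent` namespace (no custom `namespace` is set) as `disk_used_percent` with a lowercase `path` dimension, and `aggregation_dimensions` adds a roll-up keyed only by `InstanceId` and `path`. The sketch below assumes those defaults. `buildRootDiskAlarmProps` and `sameDimensions` are hypothetical helpers, and `AlarmProps` only mirrors the fields used in the question.

```typescript
interface MetricDimension {
  Name: string;
  Value: string;
}
// Mirrors the fields of the AlarmProps object used in the question.
interface AlarmProps {
  threshold: number;
  period: number;
  namespace: string;
  evaluationPeriods: number;
  metricName: string;
  dimensions: MetricDimension[];
}

// Build props that target the roll-up metric keyed by InstanceId and path only.
// Assumes the agent's default namespace and metric naming; verify in your account.
function buildRootDiskAlarmProps(instanceId: string, threshold: number): AlarmProps {
  return {
    threshold,
    period: 60,
    namespace: 'CWAgent',
    evaluationPeriods: 5,
    metricName: 'disk_used_percent',
    dimensions: [
      {Name: 'InstanceId', Value: instanceId},
      {Name: 'path', Value: '/'},
    ],
  };
}

// True when two dimension lists contain exactly the same name/value pairs (order ignored).
function sameDimensions(a: MetricDimension[], b: MetricDimension[]): boolean {
  const key = (dims: MetricDimension[]): string =>
    dims.map((d) => `${d.Name}=${d.Value}`).sort().join('|');
  return key(a) === key(b);
}

// Dimensions of the roll-up metric the agent is assumed to publish.
const published: MetricDimension[] = [
  {Name: 'path', Value: '/'},
  {Name: 'InstanceId', Value: 'i-0123456789abcdef0'},
];

// Dimensions as written in the question: extra entries and a capitalised "Path".
const fromQuestion: MetricDimension[] = [
  {Name: 'InstanceId', Value: 'i-0123456789abcdef0'},
  {Name: 'ImageId', Value: 'ami-00000000000000000'},
  {Name: 'InstanceType', Value: 't3.micro'},
  {Name: 'Path', Value: '/'},
];

const fixedProps = buildRootDiskAlarmProps('i-0123456789abcdef0', 80);
console.log(sameDimensions(fixedProps.dimensions, published)); // exact match
console.log(sameDimensions(fromQuestion, published)); // no match, so no data
```

Separately, it may be worth double-checking the thresholds: the tag key mentions free percent and the defaults are 10 and 20, but the metric measures used percent and the comparison is `GreaterThanThreshold`, so the alarm would trigger once usage passes 10% or 20%.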

Have you looked at this blog post? It offers a ready-made solution; just read the instructions very carefully: https://aws.amazon.com/blogs/mt/use-tags-to-create-and-maintain-amazon-cloudwatch-alarms-for-amazon-ec2-instances-part-1/

njoylif
Answered 22 days ago
