为什么统一 CloudWatch 代理未将我的指标或日志事件推送到 CloudWatch?

4 分钟阅读
0

我在我的 Amazon Elastic Compute Cloud(Amazon EC2)实例上配置了统一 Amazon CloudWatch 代理,以将指标和日志发布到 Amazon CloudWatch。但我在 CloudWatch 控制台中看不到我的指标或日志。我想要在 CloudWatch 中查看我的指标和日志事件

简短描述

统一 CloudWatch 代理未将您的指标或日志推送到 CloudWatch 的原因可能有多种。例如,您可能遇到了权限或连接错误,导致代理无法发布指标。在查看统一 CloudWatch 代理日志时,可能会出现其中一种错误:

  • 代理日志错误: 无法连接到端点
  • 代理日志错误: 权限不足

解决方法

**注意:**如果在运行 AWS 命令行界面(AWS CLI)命令时收到错误,请参阅 Troubleshoot AWS CLI errors。此外,请确保使用的是最新版本的 AWS CLI

查看统一 CloudWatch 代理日志

可使用代理日志文件来帮助解决在使用统一 CloudWatch 代理包时遇到的问题。

可能会遇到以下其中一种问题:

可能会在以下日志中看到其中一个错误。

代理日志错误: 无法连接到端点

2021-08-30T04:07:46Z E! cloudwatch: code: RequestError, message: send request failed, original error: Post "https://monitoring.us-east-1.amazonaws.com/": dial tcp 172.31.11.121:443: i/o timeout
2021-08-30T04:07:46Z W! 210 retries, going to sleep 1m0s before retrying.
2021-08-30T04:07:46Z E! cloudwatch: code: RequestError, message: send request failed, original error: Post "https://monitoring.us-east-1.amazonaws.com/": dial tcp 172.31.11.121:443: i/o timeout
2021-08-30T04:07:46Z W! 211 retries, going to sleep 1m0s before retrying.

代理日志错误: 权限不足

2021-08-30T02:15:45Z E! cloudwatch: code: AccessDenied, message: User: arn:aws:sts::123456789012:assumed-role/cwagent/i-0744de7c842d2c2ba is not authorized to perform: cloudwatch:PutMetricData, original error:
2021-08-30T02:15:45Z W! 1 retries, going to sleep 400ms before retrying.
2021-08-30T02:15:46Z E! WriteToCloudWatch failure, err:  AccessDenied: User: arn:aws:sts::123456789012:assumed-role/cwagent/i-0744de7c842d2c2ba is not authorized to perform: cloudwatch:PutMetricData
    status code: 403, request id: f1171fd0-05b6-4f7d-bac2-629c8594c46e

确认与 CloudWatch 端点的连接

如果流向 CloudWatch 的流量不经过公共互联网,则可以使用 Amazon VPC 端点。如果使用 Amazon VPC 端点,请检查下面的参数:

  • 如果使用私有名称服务器,请确认 DNS 解析提供了准确的响应。
  • 确认 CloudWatch 端点解析为私有 IP 地址。
  • 确认与允许来自主机的入站流量的 Amazon VPC 端点关联的安全组。

要确认与 CloudWatch 端点的连接,请完成下面的步骤:

  1. 要检查与指标端点的连接,请运行下面的命令:

    $ telnet monitoring.us-east-1.amazonaws.com 443
    Trying 52.46.138.115...
    Connected to monitoring.amazonaws.com.
    Escape character is '^]'.
    ^]
    telnet> quit
    Connection closed.
  2. 要检查与日志端点的连接,请运行下面的命令:

    $ telnet logs.us-east-1.amazonaws.com 443
    Trying 3.236.94.218...
    Connected to logs.us-east-1.amazonaws.com.
    Escape character is '^]'.
    ^]
    telnet> quit
    Connection closed
  3. 要检查 Amazon VPC 端点是否解析为私有 IP 地址,请运行下面的命令:

    $ dig monitoring.us-east-1.amazonaws.com
    +short172.31.11.121
    172.31.0.13

查看统一 CloudWatch 代理配置

代理配置文件详细说明了发布至 CloudWatch 的指标和日志。请查看代理配置文件,确认包含要发布的日志和指标。

确认主机有权发布指标和日志

AWS 托管策略 CloudWatchAgentServerPolicyCloudWatchAgentAdminPolicy 可帮助部署统一 CloudWatch 代理。这些策略还可帮助检查您是否拥有正确的权限。请使用这些策略作为参考,确保您的主机拥有正确的权限。

这些示例中的 AWS CLI 输出显示权限不足。

以下 AWS CLI config 命令显示缺少连接到 EC2 实例的 AWS Identity and Access Management(IAM)角色:

$ /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c ssm:CWT-Web-Server -s
****** processing amazon-cloudwatch-agent ******
/opt/aws/amazon-cloudwatch-agent/bin/config-downloader --output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --download-source ssm:CWT-Web-Server --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
Region: us-east-1
credsConfig: map[]
Error in retrieving parameter store content: NoCredentialProviders: no valid providers in chain. Deprecated.
    For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Fail to fetch/remove json config: NoCredentialProviders: no valid providers in chain. Deprecated.
    For verbose messaging see aws.Config.CredentialsChainVerboseErrors

Fail to fetch the config!

以下 AWS CLI config 命令显示错误的 IAM 角色已连接到 EC2 实例:

$ /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c ssm:CWT-Web-Server -s
****** processing amazon-cloudwatch-agent ******
/opt/aws/amazon-cloudwatch-agent/bin/config-downloader --output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --download-source ssm:CWT-Web-Server --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
Region: us-east-1
credsConfig: map[]
Error in retrieving parameter store content: AccessDeniedException: User: arn:aws:sts::123456789012:assumed-role/cwagent/i-0744de7c842d2c2ba is not authorized to perform: ssm:GetParameter on resource: arn:aws:ssm:us-east-1:123456789012:parameter/CWT-Web-Server
    status code: 400, request id: b85b0a7a-0fb1-47b4-924f-be8cf43a3b4d
Fail to fetch/remove json config: AccessDeniedException: User: arn:aws:sts::123456789012:assumed-role/cwagent/i-0744de7c842d2c2ba is not authorized to perform: ssm:GetParameter on resource: arn:aws:ssm:us-east-1:123456789012:parameter/CWT-Web-Server
    status code: 400, request id: b85b0a7a-0fb1-47b4-924f-be8cf43a3b4d

Fail to fetch the config!

以下 get-caller-identity 命令会返回与实例关联的 IAM 用户或角色:

$ aws sts get-caller-identity
{
    "UserId": "AROA123456789012ABCDE:i-0744de7c842d2c2ba",
    "Account": "123456789012",
    "Arn": "arn:aws:sts::123456789012:assumed-role/CloudWatchAgentServerRole/i-0744de7c842d2c2ba"
}

确认代理正确启动

可以使用 AWS CLI 并将配置文件作为参数传递来启动代理。要启动代理,请运行下面有效的启动命令

对于 Linux,运行下面的命令:

- `$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:configuration-file-path`
- `$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c ssm:configuration-parameter-store-name`

对于 Windows,运行下面的命令:

- `& "C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1" -a fetch-config -m ec2 -s -c file:"C:\Program Files\Amazon\AmazonCloudWatchAgent\config.json"`
- `& "C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1" -a fetch-config -m ec2 -s -c ssm:configuration-parameter-store-name`

重要信息: 请勿从 Windows 控制面板启动代理。

确认代理运行

要发布指标和日志,代理必须处于活动状态。要确认代理处于活动状态,请运行下面的命令:

$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status
{
    "status": "running",
    "starttime": "2021-08-30T02:13:44+00:00",
    "configstatus": "configured",
    "cwoc_status": "stopped",
    "cwoc_starttime": "",
    "cwoc_configstatus": "not configured",
    "version": "1.247349.0b251399"
}

在更新代理配置后重启代理

代理不会自动注册对配置文件的更改。如果代理配置已更新,包括新的或不同的指标和日志,则必须使用下面的命令重启代理:

$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a stop
****** processing cwagent-otel-collector ******
cwagent-otel-collector has already been stopped

****** processing amazon-cloudwatch-agent ******
Redirecting to /bin/systemctl stop amazon-cloudwatch-agent.service


$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:config.json
****** processing amazon-cloudwatch-agent ******
/opt/aws/amazon-cloudwatch-agent/bin/config-downloader --output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --download-source file:config.json --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
Successfully fetched the config and saved in /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json.tmp
Start configuration validation...
/opt/aws/amazon-cloudwatch-agent/bin/config-translator --input /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json --input-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --output /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
2021/08/31 02:45:37 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json.tmp ...
Valid Json input schema.
I! Detecting run_as_user...
Configuration validation first phase succeeded
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -schematest -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
Configuration validation second phase succeeded
Configuration validation succeeded
amazon-cloudwatch-agent has already been stopped
Redirecting to /bin/systemctl restart amazon-cloudwatch-agent.service

$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status
{
  "status": "running",
  "starttime": "2021-08-31T02:45:37+0000",
  "configstatus": "configured",
  "cwoc_status": "stopped",
  "cwoc_starttime": "",
  "cwoc_configstatus": "not configured",
  "version": "1.247349.0b251399"
}

相关信息

如何安装和配置统一的 CloudWatch 代理,以便将指标和日志从 EC2 实例推送到 CloudWatch?

AWS 官方
AWS 官方已更新 3 个月前