Hi there,
We're having some issues on an EC2 instance (Amazon Linux 2023, going by the kernel string in the logs below). The instance runs a piece of software called GoodSync to back up various cloud file stores across different providers (365, Dropbox, S3, etc.).
The software runs fine for about 24 to 48 hours, but then the EC2 instance goes offline. It can only be restored by performing a "Force Stop" from the AWS console.
I've been through it with their support and checked the logs, and the software itself doesn't seem to be crashing or freezing. The server doesn't seem to be crashing or freezing either, but the networking subsystem/service fails and never recovers. The journal shows the following when it goes offline (newest entries first):
Feb 07 05:58:49 gs.lennox-it.uk systemd-networkd[1993]: enX0: Failed
Feb 07 05:58:47 gs.lennox-it.uk systemd-networkd[1993]: enX0: Could not set DHCPv4 route: Connection timed out
Feb 07 05:50:36 gs.lennox-it.uk audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect co>
Feb 07 05:50:35 gs.lennox-it.uk audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect c>
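In case anyone wants to pull the same lines out of a longer journal dump, here is a rough Python sketch. It assumes the journal has been saved as plain text in the short format shown above. The function name `networkd_failures` and the regular expression are my own illustration and not part of any tool, and the "Failed" / "Could not" keywords are just a guess at what is worth flagging.

```python
import re

# Matches short-format journal lines such as:
#   "Feb 07 05:58:49 host systemd-networkd[1993]: enX0: Failed"
# Assumption: timestamps use a zero-padded day, as in the excerpt above.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<unit>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<msg>.*)$"
)


def networkd_failures(lines):
    """Return (timestamp, message) pairs for systemd-networkd failure lines."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Skip boot markers, unparseable lines, and other units.
        if not m or m.group("unit") != "systemd-networkd":
            continue
        msg = m.group("msg")
        # Hypothetical keyword filter; adjust to taste.
        if "Failed" in msg or "Could not" in msg:
            hits.append((m.group("ts"), msg))
    return hits


if __name__ == "__main__":
    sample = [
        "Feb 07 05:58:49 gs.lennox-it.uk systemd-networkd[1993]: enX0: Failed",
        "Feb 07 05:58:47 gs.lennox-it.uk systemd-networkd[1993]: "
        "enX0: Could not set DHCPv4 route: Connection timed out",
        "Feb 07 05:50:32 gs.lennox-it.uk systemd[1]: "
        "Finished sysstat-collect.service - system activity accounting tool.",
    ]
    for ts, msg in networkd_failures(sample):
        print(ts, msg)
```

Run against the full dump, this only picks out the two systemd-networkd lines at 05:58, which is the point where the instance drops off the network.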
I've looked into the error, and from what I've read online it can occur when the server is under very heavy load, which I think is what is happening (though only for a short time). The software doesn't crash, but it does spike the CPU to 100% for a while.
The main problem is that the server never recovers the network and stays offline until it is power cycled.
Has anyone seen anything like this before, or know of a solution? I tried to install cpulimit to throttle the software, but it doesn't seem to be available through yum.
The full logs, from the error up until the reboot, are below (newest entries first).
Any advice would be much appreciated.
Olly
Feb 07 09:09:01 localhost kernel: Linux version 6.1.119-129.201.amzn2023.x86_64 (mockbuild@ip-10-0-49-203) (gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2), GNU l>
-- Boot e2fb220bddd741f692bec78198ab28b3 --
Feb 07 09:04:30 gs.lennox-it.uk audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect co>
Feb 07 09:04:30 gs.lennox-it.uk audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect c>
Feb 07 09:04:16 gs.lennox-it.uk systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Feb 07 09:04:12 gs.lennox-it.uk systemd[1]: sysstat-collect.service: Deactivated successfully.
Feb 07 09:03:51 gs.lennox-it.uk systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Feb 07 09:03:46 gs.lennox-it.uk ec2net[36932]: [get_meta] Querying IMDS for network/interfaces/macs/(mac)/device-number
Feb 07 09:03:40 gs.lennox-it.uk setup-policy-routes[36895]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Feb 07 09:03:06 gs.lennox-it.uk CROND[36909]: (root) CMDEND (run-parts /etc/cron.hourly)
Feb 07 09:03:03 gs.lennox-it.uk run-parts[36924]: (/etc/cron.hourly) finished 0anacron
Feb 07 09:02:18 gs.lennox-it.uk run-parts[36917]: (/etc/cron.hourly) starting 0anacron
Feb 07 09:01:05 gs.lennox-it.uk CROND[36910]: (root) CMD (run-parts /etc/cron.hourly)
Feb 07 08:57:39 gs.lennox-it.uk ec2net[36897]: [get_meta] Querying IMDS for network/interfaces/macs/(mac)/device-number
Feb 07 08:57:33 gs.lennox-it.uk setup-policy-routes[36871]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Feb 07 08:51:50 gs.lennox-it.uk ec2net[36872]: [get_meta] Querying IMDS for network/interfaces/macs/(mac)/device-number
Feb 07 08:51:41 gs.lennox-it.uk setup-policy-routes[36856]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Feb 07 08:51:30 gs.lennox-it.uk audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect co>
Feb 07 08:51:30 gs.lennox-it.uk audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect c>
Feb 07 08:51:14 gs.lennox-it.uk systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Feb 07 08:51:09 gs.lennox-it.uk systemd[1]: sysstat-collect.service: Deactivated successfully.
Feb 07 08:50:28 gs.lennox-it.uk systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Feb 07 08:49:17 gs.lennox-it.uk ec2net[36857]: [get_meta] Querying IMDS for network/interfaces/macs/(mac)/device-number
Feb 07 08:49:09 gs.lennox-it.uk setup-policy-routes[36836]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Feb 07 08:44:40 gs.lennox-it.uk audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect co>
Feb 07 08:44:40 gs.lennox-it.uk audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect c>
Feb 07 08:44:33 gs.lennox-it.uk systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Feb 07 08:44:30 gs.lennox-it.uk systemd[1]: sysstat-collect.service: Deactivated successfully.
Feb 07 08:44:00 gs.lennox-it.uk systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Feb 07 08:43:48 gs.lennox-it.uk ec2net[36837]: [get_meta] Querying IMDS for network/interfaces/macs/(mac)/device-number
Feb 07 08:43:42 gs.lennox-it.uk setup-policy-routes[36816]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Feb 07 08:39:05 gs.lennox-it.uk ec2net[36817]: [get_meta] Querying IMDS for network/interfaces/macs/(mac)/device-number
Feb 07 08:38:57 gs.lennox-it.uk setup-policy-routes[36796]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Feb 07 08:35:05 gs.lennox-it.uk ec2net[36797]: [get_meta] Querying IMDS for network/interfaces/macs/(mac)/device-number
Feb 07 08:34:58 gs.lennox-it.uk setup-policy-routes[36774]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Feb 07 08:31:27 gs.lennox-it.uk audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect co>
Feb 07 08:31:26 gs.lennox-it.uk audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect c>
Feb 07 08:31:16 gs.lennox-it.uk systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Feb 07 08:31:12 gs.lennox-it.uk systemd[1]: sysstat-collect.service: Deactivated successfully.
Feb 07 08:30:21 gs.lennox-it.uk systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Feb 07 08:29:14 gs.lennox-it.uk ec2net[36775]: [get_meta] Querying IMDS for network/interfaces/macs/(mac)/device-number
Feb 07 08:29:07 gs.lennox-it.uk setup-policy-routes[36757]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Feb 07 08:24:27 gs.lennox-it.uk audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect co>
Feb 07 08:24:27 gs.lennox-it.uk audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect c>
Feb 07 08:24:20 gs.lennox-it.uk systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Feb 07 08:24:17 gs.lennox-it.uk systemd[1]: sysstat-collect.service: Deactivated successfully.
Feb 07 08:24:17 gs.lennox-it.uk ec2net[36758]: [get_meta] Querying IMDS for network/interfaces/macs/(mac)/device-number
Feb 07 08:24:12 gs.lennox-it.uk setup-policy-routes[36732]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Feb 07 08:23:35 gs.lennox-it.uk systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Feb 07 08:22:59 gs.lennox-it.uk audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect co>
Feb 07 08:22:59 gs.lennox-it.uk audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect c>
Feb 07 08:22:51 gs.lennox-it.uk systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Feb 07 08:22:48 gs.lennox-it.uk systemd[1]: sysstat-collect.service: Deactivated successfully.
Feb 07 08:22:18 gs.lennox-it.uk systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Feb 07 08:18:43 gs.lennox-it.uk ec2net[36733]: [get_meta] Querying IMDS for network/interfaces/macs/(mac)/device-number
Feb 07 08:18:35 gs.lennox-it.uk setup-policy-routes[36705]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Feb 07 08:07:31 gs.lennox-it.uk ec2net[36706]: [get_meta] Querying IMDS for network/interfaces/macs/(mac)/device-number
Feb 07 08:07:19 gs.lennox-it.uk setup-policy-routes[36657]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Feb 07 08:04:21 gs.lennox-it.uk CROND[36676]: (root) CMDEND (run-parts /etc/cron.hourly)
Feb 07 08:04:17 gs.lennox-it.uk run-parts[36694]: (/etc/cron.hourly) finished 0anacron
Feb 07 08:03:02 gs.lennox-it.uk run-parts[36685]: (/etc/cron.hourly) starting 0anacron
Feb 07 08:01:05 gs.lennox-it.uk CROND[36677]: (root) CMD (run-parts /etc/cron.hourly)
Feb 07 07:50:37 gs.lennox-it.uk ec2net[36658]: [get_meta] Querying IMDS for network/interfaces/macs/(mac)/device-number
Feb 07 07:50:29 gs.lennox-it.uk setup-policy-routes[36625]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable
Feb 07 07:36:35 gs.lennox-it.uk ec2net[36626]: [get_meta] Querying IMDS for network/interfaces/macs/(mac)/device-number
Feb 07 07:03:40 gs.lennox-it.uk CROND[36558]: (root) CMDEND (run-parts /etc/cron.hourly)
Feb 07 07:03:37 gs.lennox-it.uk run-parts[36572]: (/etc/cron.hourly) finished 0anacron
Feb 07 07:02:08 gs.lennox-it.uk run-parts[36565]: (/etc/cron.hourly) starting 0anacron
Feb 07 07:01:05 gs.lennox-it.uk CROND[36559]: (root) CMD (run-parts /etc/cron.hourly)
Feb 07 06:15:08 gs.lennox-it.uk CROND[36482]: (root) CMDEND (run-parts /etc/cron.hourly)
Feb 07 06:15:00 gs.lennox-it.uk run-parts[36499]: (/etc/cron.hourly) finished 0anacron
Feb 07 06:10:46 gs.lennox-it.uk run-parts[36490]: (/etc/cron.hourly) starting 0anacron
Feb 07 06:01:09 gs.lennox-it.uk CROND[36483]: (root) CMD (run-parts /etc/cron.hourly)
Feb 07 05:58:49 gs.lennox-it.uk systemd-networkd[1993]: enX0: Failed
Feb 07 05:58:47 gs.lennox-it.uk systemd-networkd[1993]: enX0: Could not set DHCPv4 route: Connection timed out
Feb 07 05:50:36 gs.lennox-it.uk audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect co>
Feb 07 05:50:35 gs.lennox-it.uk audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect c>
Feb 07 05:50:32 gs.lennox-it.uk systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Feb 07 05:50:30 gs.lennox-it.uk systemd[1]: sysstat-collect.service: Deactivated successfully.
Feb 07 05:50:13 gs.lennox-it.uk systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Feb 07 05:50:02 gs.lennox-it.uk ec2net[36462]: Got IMDSv2 token from http://169.254.169.254/latest
Feb 07 05:48:01 gs.lennox-it.uk start-amazon-cloudwatch-agent[2008]: request expired, resigning
Feb 07 05:47:39 gs.lennox-it.uk ec2net[36455]: [get_meta] Querying IMDS for network/interfaces/macs/(mac)/device-number
Feb 07 05:46:09 gs.lennox-it.uk start-amazon-cloudwatch-agent[2008]: request expired, resigning
Feb 07 05:41:12 gs.lennox-it.uk start-amazon-cloudwatch-agent[2008]: request expired, resigning
Feb 07 05:40:52 gs.lennox-it.uk audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect co>
Feb 07 05:40:52 gs.lennox-it.uk audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect c>
Feb 07 05:40:49 gs.lennox-it.uk systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Feb 07 05:40:49 gs.lennox-it.uk systemd[1]: sysstat-collect.service: Deactivated successfully.
Feb 07 05:40:39 gs.lennox-it.uk systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Feb 07 05:38:21 gs.lennox-it.uk amazon-ssm-agent[2173]: 2025-02-07 05:38:10.5509 ERROR EC2RoleProvider Failed to connect to Systems Manager with SSM role crede>
Feb 07 05:38:10 gs.lennox-it.uk amazon-ssm-agent[2173]: caused by: Put "http://169.254.169.254/latest/api/token": context deadline exceeded (Client.Timeout exc>
Feb 07 05:38:10 gs.lennox-it.uk amazon-ssm-agent[2173]: caused by: RequestError: send request failed
Feb 07 05:38:10 gs.lennox-it.uk amazon-ssm-agent[2173]: status code: 0, request id:
Feb 07 05:38:10 gs.lennox-it.uk amazon-ssm-agent[2173]: caused by: :
Feb 07 05:38:10 gs.lennox-it.uk amazon-ssm-agent[2173]: 2025-02-07 05:37:57.6154 ERROR [TokenRequestService] failed to retrieve instance identity role. Error: >
Feb 07 05:37:21 gs.lennox-it.uk amazon-ssm-agent[2173]: 2025-02-07 05:37:19.1422 WARN EC2RoleProvider Failed to connect to Systems Manager with instance profil>
Feb 07 05:31:43 gs.lennox-it.uk start-amazon-cloudwatch-agent[2008]: request expired, resigning
Feb 07 05:31:14 gs.lennox-it.uk audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect co>
Feb 07 05:31:14 gs.lennox-it.uk audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect c>
Feb 07 05:31:11 gs.lennox-it.uk systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Feb 07 05:31:11 gs.lennox-it.uk systemd[1]: sysstat-collect.service: Deactivated successfully.
Feb 07 05:30:56 gs.lennox-it.uk systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Feb 07 05:30:19 gs.lennox-it.uk ec2net[36392]: Got IMDSv2 token from http://169.254.169.254/latest
Feb 07 05:27:15 gs.lennox-it.uk ec2net[36383]: [get_meta] Querying IMDS for network/interfaces/macs/(mac)/device-number
Feb 07 05:27:11 gs.lennox-it.uk setup-policy-routes[36366]: /usr/share/amazon-ec2-net-utils/lib.sh: line 89: imds_endpoint: unbound variable