EC2 instance sometimes stuck during shutdown after running CLI "aws ec2 stop-instances" on it

I have an instance that periodically checks for jobs and then shuts down after the job is finished.

The shutdown is performed from inside a Docker container running on that instance. The container has aws-cli installed and API keys set up. From inside that container I run:

aws ec2 stop-instances --instance-ids $INSTANCE_ID

As a result, the instance successfully starts the shutdown process and usually stops without issues. Sometimes, however, it gets stuck after the system has halted, while the instance itself remains in the running state. Since all services are stopped by then, there is no way to access it.

The system is the default Ubuntu AMI: #18~22.04.1-Ubuntu
Instance type: t3.2xlarge

Do you have any idea why this happens and how to prevent it?

This is the tail of syslog just before it freezes:

Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: unattended-upgrades.service: Deactivated successfully.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Removed slice Slice /system/modprobe.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopped target Cloud-init target.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopped target Graphical Interface.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopped target Host and Network Name Lookups.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopped target Timer Units.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: apt-daily-upgrade.timer: Deactivated successfully.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopped Daily apt upgrade and clean activities.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: apt-daily.timer: Deactivated successfully.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopped Daily apt download activities.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: dpkg-db-backup.timer: Deactivated successfully.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopped Daily dpkg database backup timer.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: e2scrub_all.timer: Deactivated successfully.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopped Periodic ext4 Online Metadata Check for All Filesystems.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: fstrim.timer: Deactivated successfully.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopped Discard unused blocks once a week.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: logrotate.timer: Deactivated successfully.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopped Daily rotation of log files.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: man-db.timer: Deactivated successfully.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopped Daily man-db regeneration.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: motd-news.timer: Deactivated successfully.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopped Message of the Day.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: systemd-tmpfiles-clean.timer: Deactivated successfully.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopped Daily Cleanup of Temporary Directories.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: update-notifier-download.timer: Deactivated successfully.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopped Download data for packages that failed at package install time.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: update-notifier-motd.timer: Deactivated successfully.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopped Check to see whether there is a new version of Ubuntu available.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopped target System Time Synchronized.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: lvm2-lvmpolld.socket: Deactivated successfully.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Closed LVM2 poll daemon socket.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: systemd-rfkill.socket: Deactivated successfully.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Closed Load/Save RF Kill Switch Status /dev/rfkill Watch.
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopping ACPI event daemon...
Mar 13 15:57:15 ip-172-31-26-16 systemd[1]: Stopping Availability of block devices...

Tomasz
asked 2 months ago, 208 views
1 Answer

This answer won't fix your problem, but it sounds like this process would be better run as an ECS task.
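
Independently of that suggestion, a possible stopgap for the hang itself is a small watchdog that requests the stop, polls the instance state for a bounded time, and falls back to a forced stop if the instance never reaches the stopped state. The following is only a sketch under assumptions, not a confirmed fix for the cause of the hang: the function names (instance_state, stop_with_fallback) and the timeout values are hypothetical, and it is assumed to run somewhere outside the instance being stopped, with credentials that allow ec2:DescribeInstances and ec2:StopInstances. Note that --force skips the normal OS shutdown, so the instance gets no chance to flush file system caches.

```shell
#!/bin/sh
# Sketch: stop an instance, wait a bounded time, then fall back to a forced stop.
# Assumption: this runs OUTSIDE the instance being stopped (e.g. a scheduler host).

# instance_state ID -> prints the current state name (running, stopping, stopped, ...)
instance_state() {
    aws ec2 describe-instances \
        --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].State.Name' \
        --output text
}

# stop_with_fallback ID [TIMEOUT_SECONDS] [POLL_SECONDS]
# Prints "stopped" if the normal stop completed in time, "forced" otherwise.
stop_with_fallback() {
    id=$1
    timeout=${2:-300}   # hypothetical default: give up after 5 minutes
    poll=${3:-15}       # hypothetical default: check every 15 seconds
    waited=0

    # Ask EC2 for a normal (graceful) stop first.
    aws ec2 stop-instances --instance-ids "$id" >/dev/null || return 1

    while [ "$waited" -lt "$timeout" ]; do
        if [ "$(instance_state "$id")" = "stopped" ]; then
            echo "stopped"
            return 0
        fi
        sleep "$poll"
        waited=$((waited + poll))
    done

    # Still not stopped after the timeout: request a forced stop.
    aws ec2 stop-instances --instance-ids "$id" --force >/dev/null || return 1
    echo "forced"
}
```

It could be invoked as, for example, stop_with_fallback "$INSTANCE_ID" 600 20. Keeping the graceful stop as the first step means the forced stop only happens when the shutdown has already hung.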

EXPERT
answered 2 months ago