Check your current memory usage with free -h, then check for out-of-memory kills with: sudo journalctl -k | grep -i oom
Check whether your Docker containers caused the issue with docker stats, and check whether a container is restarting repeatedly with docker ps -a.
Inspect your Docker container logs with docker logs <container>.
Check CPU usage and system load with htop.
Check disk space with df -h. A full disk can also make the instance unresponsive, so rule it out alongside memory exhaustion.
Remove unused images (along with stopped containers, unused networks, and build cache) with: docker system prune -a
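If you want to log memory headroom over time rather than read free -h by hand, a minimal sketch could look like the following. The function name mem_available_pct is hypothetical, and it assumes a Linux kernel that reports MemAvailable in /proc/meminfo.

```shell
# Hypothetical helper: reads /proc/meminfo-style text on stdin and prints
# the share of RAM still available (MemAvailable / MemTotal) as a whole
# percentage, truncated toward zero.
mem_available_pct() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END {
      if (total > 0) printf "%d\n", (avail * 100) / total
      else exit 1
    }'
}

# Usage on the instance:
#   mem_available_pct < /proc/meminfo
```

Running this from cron and appending the result to a log file gives you a record of how memory was trending before the next freeze.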
Allocate swap space, since it helps prevent sudden crashes when RAM runs out. Steps to create a swap file:
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
Persist the swap file across reboots:
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
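Running the tee -a command more than once adds duplicate lines to /etc/fstab. A small guard can avoid that; this is only a sketch, and the function name ensure_swap_entry is hypothetical.

```shell
# Hypothetical helper: append the swap entry to an fstab-style file only if
# no line for that swap file exists yet. Prints what it did.
ensure_swap_entry() {
  fstab_file="$1"
  swap_path="${2:-/swapfile}"
  entry="$swap_path none swap sw 0 0"
  if grep -q "^$swap_path[[:space:]]" "$fstab_file" 2>/dev/null; then
    echo "already present"
  else
    printf '%s\n' "$entry" >> "$fstab_file"
    echo "added"
  fi
}

# Usage (from a root shell, since /etc/fstab is root-owned):
#   ensure_swap_entry /etc/fstab
# Then confirm the swap is active:
#   swapon --show
#   free -h
```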
To troubleshoot your EC2 instance's unresponsiveness and potential memory issues, here's what you should investigate:
Checking for Memory Exhaustion:
Start by examining your instance's console output after restarting it. Look for Out of Memory (OOM) messages or kernel panic errors, which are clear indicators of RAM exhaustion. These system-level errors can cause the operating system to initiate a shutdown or become unresponsive.
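The console output can also be pulled with the AWS CLI and filtered for these messages. The sketch below assumes the AWS CLI is installed and configured; the function names are hypothetical, and the grep patterns are common kernel markers, not an exhaustive list.

```shell
# Filter kernel log text on stdin for common OOM and panic markers.
# "|| true" keeps the exit status at 0 when nothing matches.
find_oom_lines() {
  grep -iE 'out of memory|oom-killer|oom_kill|kernel panic' || true
}

# Fetch the console output for one instance and filter it.
# $1 is the instance ID (for example the ID shown in the EC2 console).
check_console_for_oom() {
  aws ec2 get-console-output --instance-id "$1" --output text | find_oom_lines
}

# Usage:
#   check_console_for_oom <your-instance-id>
```

No output means none of these markers appeared in the console buffer that was returned, which does not by itself rule out memory exhaustion.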
Identifying High Resource Usage:
Check your Amazon CloudWatch metrics for CPU and memory utilization patterns leading up to the incident. Note that EC2 publishes CPU metrics by default, but memory utilization is only available if the CloudWatch agent is installed on the instance. If you have CloudWatch alarms configured, review whether any were triggered. Also examine your EventBridge rules and AWS Systems Manager automation to determine if any automated actions were taken in response to high resource usage.
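As a rough illustration, CPU history for the window before the incident can be pulled from the CLI. The function name cpu_history is hypothetical, and the 5-minute period and Maximum statistic are assumptions you may want to change.

```shell
# Print (timestamp, max CPU %) pairs for one instance, oldest first.
# $1 = instance ID, $2 = start time, $3 = end time (ISO 8601, UTC).
cpu_history() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=$1" \
    --start-time "$2" \
    --end-time "$3" \
    --period 300 \
    --statistics Maximum \
    --query 'sort_by(Datapoints,&Timestamp)[].[Timestamp,Maximum]' \
    --output text
}

# Usage:
#   cpu_history <your-instance-id> 2024-01-01T00:00:00Z 2024-01-01T06:00:00Z
```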
Reviewing Logs and Events:
Use AWS CloudTrail to review your instance's event history around the time of the incident. This will show you the exact timing of any state changes and what triggered them. The StateReason code from the describe-instances CLI command can provide additional context about why the instance became unresponsive.
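Both lookups can be done from the AWS CLI. This is a sketch under the assumption that the CLI is configured for the instance's region; the function names are hypothetical. CloudTrail event history only covers roughly the last 90 days of management events.

```shell
# Show the current state plus the recorded reason for the last transition.
# $1 = instance ID.
instance_state_reason() {
  aws ec2 describe-instances \
    --instance-ids "$1" \
    --query 'Reservations[].Instances[].[State.Name, StateReason.Code, StateReason.Message, StateTransitionReason]' \
    --output text
}

# List CloudTrail management events that reference the instance.
# $1 = instance ID.
instance_events() {
  aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=ResourceName,AttributeValue=$1" \
    --query 'Events[].[EventTime, EventName, Username]' \
    --output text
}

# Usage:
#   instance_state_reason <your-instance-id>
#   instance_events <your-instance-id>
```

Events such as StopInstances or RebootInstances in the second listing show who or what changed the instance state.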
Prevention Strategies:
For Docker-specific monitoring, you should implement CloudWatch monitoring for container-level metrics. Set up CloudWatch alarms to alert you before resources become critically low, rather than after the instance becomes unresponsive.
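A minimal sketch of such an alarm follows. It assumes the CloudWatch agent is already publishing mem_used_percent in the CWAgent namespace with an InstanceId dimension (the exact dimensions depend on your agent configuration), and that you have an SNS topic to notify. The function name, alarm name, and the default 85% threshold are all hypothetical choices.

```shell
# Create an alarm that fires when average memory use stays above a
# threshold for three consecutive 1-minute periods.
# $1 = instance ID, $2 = SNS topic ARN, $3 = threshold in percent (default 85).
create_memory_alarm() {
  instance_id="$1"
  sns_topic_arn="$2"
  threshold="${3:-85}"
  aws cloudwatch put-metric-alarm \
    --alarm-name "high-memory-$instance_id" \
    --namespace CWAgent \
    --metric-name mem_used_percent \
    --dimensions "Name=InstanceId,Value=$instance_id" \
    --statistic Average \
    --period 60 \
    --evaluation-periods 3 \
    --threshold "$threshold" \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$sns_topic_arn"
}

# Usage:
#   create_memory_alarm <your-instance-id> <your-sns-topic-arn> 90
```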
Consider adding swap memory as a temporary buffer, though this isn't a long-term solution for undersized instances. If your workload consistently approaches memory limits, upgrading to a larger instance type with more RAM would be more appropriate.
To prevent automated stopping of critical instances, you can tag them appropriately and configure your automation scripts to exclude instances with those tags. Additionally, set up Amazon SNS notifications with EventBridge rules to alert you immediately when resource thresholds are exceeded, giving you time to intervene before the instance becomes unresponsive.
Sources
Understand why EC2 instance stopped | AWS re:Post

Thanks for sharing, it was quite helpful