EC2 instance became unresponsive — possible RAM exhaustion or Docker resource issue

Hello,

Yesterday my AWS EC2 instance suddenly became unstable and difficult to access. I am hosting my application using Docker containers (backend services) on the EC2 machine.

What happened:

  • SSH access became unstable / slow
  • Applications stopped responding properly
  • Deployment workflows were failing
  • I suspect the server may have run out of RAM or system resources, but I am not sure

I want to understand the actual root cause and how to prevent this from happening again.

Could someone help me understand:

  1. How to check if the EC2 instance ran out of memory (RAM)
  2. How to verify if Docker containers caused high memory or CPU usage
  3. Which AWS logs/metrics should I check (CloudWatch, system logs, etc.)
  4. Whether swap memory should be added
  5. Whether upgrading the instance type would help
  6. Best practices for monitoring Docker resource usage on EC2

Environment:

  • EC2 instance hosting Docker containers
  • Applications deployed through GitHub Actions
  • Linux-based server
  • Issue happened suddenly yesterday

I would appreciate guidance on troubleshooting and preventing future downtime.

asked 12 days ago, 27 views
2 Answers
Accepted Answer

Start by confirming whether memory was exhausted, then work outward to Docker, CPU, and disk:

  • Check current memory usage: free -h
  • Check for out-of-memory (OOM) kills in the kernel log: sudo journalctl -k | grep -i oom
  • Check whether your Docker containers caused the issue: docker stats
  • Check whether a container is restarting again and again: docker ps -a
  • Inspect the logs of any suspect container with docker logs.
  • Check CPU usage and system load: htop
  • Check disk space: df -h. A full disk can also make the instance unresponsive, so rule it out alongside memory exhaustion.
  • Remove unused images: docker system prune -a. Note that this also removes stopped containers and every image not used by a container.
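As a complement to free -h, the following is a minimal sketch of a helper that reports how much memory is still available as a percentage, which is easier to log or alert on. The function name mem_available_pct is hypothetical, and the sketch assumes a Linux kernel that exposes MemAvailable in /proc/meminfo.

```shell
#!/usr/bin/env bash
# Print MemAvailable as a whole-number percentage of MemTotal.
# Argument: path to a meminfo-format file (defaults to /proc/meminfo).
# Assumption: the kernel reports a MemAvailable line.
mem_available_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%d\n", avail * 100 / total }' \
    "${1:-/proc/meminfo}"
}
```

Calling mem_available_pct with no argument reads the live system. A value that stays in the low single digits suggests the instance is close to invoking the OOM killer.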

Allocate swap space, as it helps prevent sudden crashes. Steps to create a 2 GB swap file:

  1. sudo fallocate -l 2G /swapfile
  2. sudo chmod 600 /swapfile
  3. sudo mkswap /swapfile
  4. sudo swapon /swapfile

To persist the swap file after a reboot:

  echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
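One caveat with the persistence step: running the tee -a command twice appends a duplicate line to /etc/fstab. A minimal sketch of a guard is shown below; the function name has_swapfile_entry is hypothetical, and the path /swapfile matches the steps above.

```shell
#!/usr/bin/env bash
# Succeed if the given fstab-format file already has an entry for /swapfile.
# Argument: path to the file to check (defaults to /etc/fstab).
has_swapfile_entry() {
  grep -Eq '^/swapfile[[:space:]]' "${1:-/etc/fstab}"
}

# Usage (left commented out because it modifies /etc/fstab):
# has_swapfile_entry || echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
# swapon --show   # confirm the swap file is active
```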

answered 12 days ago
EXPERT
reviewed 12 days ago
  • Thanks for sharing, it was quite helpful

To troubleshoot your EC2 instance's unresponsiveness and potential memory issues, here's what you should investigate:

Checking for Memory Exhaustion:

Start by examining your instance's console output after restarting it. Look for Out of Memory (OOM) messages or kernel panic errors, which are clear indicators of RAM exhaustion. These system-level errors can cause the operating system to initiate a shutdown or become unresponsive.
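The console output can also be fetched and scanned from the command line. The sketch below is one way to do that, assuming the AWS CLI is configured and that INSTANCE_ID holds your instance ID; the function name oom_lines is hypothetical, and the patterns cover common kernel OOM messages rather than every possible wording.

```shell
#!/usr/bin/env bash
# Filter kernel log text on stdin down to lines that suggest OOM-killer activity.
# Prints nothing (and still succeeds) when no such lines are present.
oom_lines() {
  grep -iE 'out of memory|oom-killer|oom-kill' || true
}

# Usage (requires AWS credentials, so it is left commented out):
# aws ec2 get-console-output --instance-id "$INSTANCE_ID" --output text | oom_lines
```

The same filter works on local kernel logs, for example by piping sudo journalctl -k into oom_lines.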

Identifying High Resource Usage:

Check your Amazon CloudWatch metrics for CPU utilization patterns leading up to the incident. Note that EC2 does not publish memory utilization to CloudWatch by default; memory metrics are available only if the CloudWatch agent is installed on the instance. If you have CloudWatch alarms configured, review whether any were triggered. Also examine your EventBridge rules and AWS Systems Manager automation to determine if any automated actions were taken in response to high resource usage.
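For the CPU side, the built-in CPUUtilization metric in the AWS/EC2 namespace can be queried directly. The following is a sketch only: the function name cpu_max_last_day and the AWS_CMD override (which allows a dry run with echo) are hypothetical, and the date arithmetic assumes GNU date.

```shell
#!/usr/bin/env bash
# Show the maximum CPU utilization in 5-minute buckets over the last 24 hours.
# Argument: EC2 instance ID.
# AWS_CMD is a hypothetical override so the call can be dry-run with echo.
cpu_max_last_day() {
  local instance_id="$1" start end
  end=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  start=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)   # GNU date syntax
  ${AWS_CMD:-aws} cloudwatch get-metric-statistics \
    --namespace AWS/EC2 --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=${instance_id}" \
    --start-time "$start" --end-time "$end" \
    --period 300 --statistics Maximum --output table
}
```

Running AWS_CMD=echo cpu_max_last_day followed by an instance ID prints the command that would be sent without contacting AWS.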

Reviewing Logs and Events:

Use AWS CloudTrail to review your instance's event history around the time of the incident. This will show you the exact timing of any state changes and what triggered them. The StateReason field returned by the describe-instances CLI command can provide additional context if the instance was stopped or terminated rather than merely unresponsive.
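Both lookups can be run from the AWS CLI. The sketch below assumes configured credentials; the function names and the AWS_CMD override (used for dry runs with echo) are hypothetical, and StateReason may be empty for an instance that never changed state.

```shell
#!/usr/bin/env bash
# Print the current state and the recorded reason for the last state change.
# Argument: EC2 instance ID.
instance_state_reason() {
  ${AWS_CMD:-aws} ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[].Instances[].[State.Name,StateReason.Message,StateTransitionReason]' \
    --output text
}

# List recent CloudTrail management events that reference the instance.
# Argument: EC2 instance ID.
instance_events() {
  ${AWS_CMD:-aws} cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=ResourceName,AttributeValue=$1" \
    --max-results 20
}
```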

Prevention Strategies:

For Docker-specific monitoring, you should implement CloudWatch monitoring for container-level metrics. Set up CloudWatch alarms to alert you before resources become critically low, rather than after the instance becomes unresponsive.
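Before wiring up CloudWatch, a quick local check can show which containers are using the most memory. This is a minimal sketch, not a monitoring solution: the function name flag_heavy_containers is hypothetical, and it relies on the Name and MemPerc placeholders of docker stats, where the percentage is relative to the container's memory limit (or to host memory when no limit is set).

```shell
#!/usr/bin/env bash
# Read "name mem%" pairs on stdin and print containers at or above a threshold.
# Argument: threshold as a percentage (defaults to 80).
flag_heavy_containers() {
  awk -v limit="${1:-80}" '{
    pct = $2
    sub(/%$/, "", pct)                 # strip the trailing percent sign
    if (pct + 0 >= limit + 0) print $1, $2
  }'
}

# Usage (requires a running Docker daemon, so it is left commented out):
# docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | flag_heavy_containers 80
```

Running this from cron and logging the output gives a rough history of which container was growing before an incident.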

Consider adding swap memory as a temporary buffer, though this isn't a long-term solution for undersized instances. If your workload consistently approaches memory limits, upgrading to a larger instance type with more RAM would be more appropriate.

To prevent automated stopping of critical instances, you can tag them appropriately and configure your automation scripts to exclude instances with those tags. Additionally, set up Amazon SNS notifications with EventBridge rules to alert you immediately when resource thresholds are exceeded, giving you time to intervene before the instance becomes unresponsive.
Sources
Understand why EC2 instance stopped | AWS re:Post

answered 12 days ago
EXPERT
reviewed 12 days ago
