EC2 instance becomes unreachable after some days


Hi all, one of my EC2 instances becomes unreachable every few days. Around that time CPU utilization rises to about 50-80%, but I feel that alone should not make the instance unreachable. Can someone tell me how to debug this and find out which process might be causing it? Also, is the increased CPU utilization the reason the instance becomes unreachable?

PS: I am using an On-Demand Linux t2.xlarge instance and have enough root storage. My questions:

  • How can I know which process is the culprit?
  • Will system logs help me here, and if yes, how can I access them (current logs, or logs from the previous 5-7 days)?
  • There is no clear pattern as such, but mostly this happens every 5-7 days. Is AWS doing something behind the scenes that could be the reason?
Prakhar
asked 1 year ago, 511 views
5 Answers

How can I know which process is the culprit?

To check per-process load, you can run commands such as "ps aux", or install the CloudWatch Agent to monitor processes.
Besides CPU, memory exhaustion or other resource limits could also be making the instance inaccessible.
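Since a command like "ps aux" cannot be run once the instance is already unreachable, one option is to record a snapshot periodically while the instance is still healthy and read the file after a reboot. Here is a minimal sketch, assuming the Linux procps version of `ps`; the log path `/var/log/top-procs.log` and the function names are hypothetical, not part of any AWS tooling.

```python
import subprocess
import time

# Hypothetical output file; any path that survives a reboot will do.
LOG_PATH = "/var/log/top-procs.log"


def parse_ps(output, limit=5):
    """Return the `limit` busiest processes from `ps -eo pid,pcpu,pmem,comm` output."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        pid, cpu, mem, comm = parts
        rows.append((int(pid), float(cpu), float(mem), comm.strip()))
    # Sort by CPU first, then memory, highest first.
    rows.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return rows[:limit]


def snapshot():
    """Append one timestamped snapshot of the busiest processes to LOG_PATH."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,pmem,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(LOG_PATH, "a") as fh:
        for pid, cpu, mem, comm in parse_ps(out):
            fh.write(f"{stamp} pid={pid} cpu={cpu}% mem={mem}% cmd={comm}\n")
```

Calling `snapshot()` every minute from cron or a systemd timer (as root, if the file lives under /var/log) would leave a trail showing which processes were busiest shortly before the instance stopped responding.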

Will system logs help me here, and if yes, how can I access them (current logs, or logs from the previous 5-7 days)?

The location varies by operating system, but system logs are normally stored under the "/var/log/" directory.
On Amazon Linux 2, the system log is written to "/var/log/messages".
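Once you can log in again after a reboot, a quick first pass is to search that file for signs of memory exhaustion. The sketch below is only one way to do it: the choice of patterns is my assumption about what is worth looking for (they are the usual Linux kernel OOM-killer messages), and the function names are illustrative.

```python
import re

# Kernel messages that usually accompany memory exhaustion on Linux.
PATTERNS = re.compile(r"invoked oom-killer|Out of memory|Killed process", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the log lines that mention the kernel OOM killer."""
    return [line.rstrip("\n") for line in lines if PATTERNS.search(line)]


def scan(path="/var/log/messages"):
    """Scan one log file; reading it normally requires root."""
    with open(path, errors="replace") as fh:
        return find_oom_lines(fh)
```

Running `scan()` on the current file, and on any rotated copies kept next to it, should show whether the kernel killed a process around the time the instance went dark. An empty result does not rule out other causes.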

There is no clear pattern as such, but mostly this happens every 5-7 days. Is AWS doing something behind the scenes that could be the reason?

For scheduled maintenance, AWS would notify you in advance, so I don't think that is the cause in this case.

Expert
answered 1 year ago
  • If SSH or another means of access is available after rebooting, it would be best to log in and check the system logs. Adding more custom metrics through the CloudWatch Agent costs money, but I think it is a necessary cost to track down the cause of the problem.
    Also, since you are using a t2.xlarge, please check the "CPUCreditBalance" metric for the period when the instance became unreachable. If the balance has dropped to zero, the instance is limited to its baseline CPU performance, which may be why it becomes unreachable. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-monitoring-cpu-credits.html

    In some cases, it is recommended to switch to a fixed-performance instance type such as the M6 family.
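To get a feel for how quickly CPU credits can run out on a burstable instance, here is a rough back-of-the-envelope sketch. The t2.xlarge figures (4 vCPUs, 54 credits earned per hour, maximum balance of 1296) are taken from the AWS burstable-instance documentation as I understand it, so treat them as assumptions and re-check them. It also assumes the instance runs in standard (not unlimited) credit mode and that the load is steady, which real workloads rarely are.

```python
# Figures for t2.xlarge as listed in the AWS burstable-instance documentation;
# treat them as assumptions and re-check them for your instance type.
VCPUS = 4
CREDITS_EARNED_PER_HOUR = 54
MAX_CREDIT_BALANCE = 1296


def net_credits_per_hour(avg_cpu_percent):
    """Credits gained (positive) or lost (negative) per hour at a steady load.

    One CPU credit is one vCPU running at 100% for one minute, so an
    instance-wide average of P% costs VCPUS * 60 * P / 100 credits per hour.
    """
    spent = VCPUS * 60 * avg_cpu_percent / 100
    return CREDITS_EARNED_PER_HOUR - spent


def hours_until_empty(avg_cpu_percent, start_balance=MAX_CREDIT_BALANCE):
    """Hours until the balance reaches zero, or None if it never drains."""
    net = net_credits_per_hour(avg_cpu_percent)
    if net >= 0:
        return None
    return start_balance / -net
```

Under these assumptions, a sustained 50% average drains a full balance in roughly 20 hours (`hours_until_empty(50)`), and 80% in under 10 hours. The real balance is visible in CloudWatch as the `CPUCreditBalance` metric, which is the number to trust over this estimate.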

answered 1 year ago

@Riku_Kobayashi "ps aux" will not help: once the instance becomes unreachable I can no longer connect to it over SSH, so I cannot run the command anywhere. Is there a workaround? I can try the CloudWatch agent, but there is a cost associated with it.

Prakhar
answered 1 year ago

What kind of application are you running on this instance? The best place to start debugging is your application or application server logs. The cause could be the requests you are receiving or a performance issue in your application. Start by debugging the deployed apps.

answered 1 year ago

Where are you seeing the 50-80% figure for CPU usage? Is that in the AWS Console? Does it go higher than that when the host becomes unresponsive, or does it just stop recording a figure?

It would help to set up the CloudWatch agent to collect more detailed system metrics and logs https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html - this may show that your root cause is exhaustion of some system resource.
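As a rough idea of what such a setup might collect, the sketch below builds a small agent configuration that reports memory, swap, and root-disk usage and ships "/var/log/messages" to CloudWatch Logs, so the log stays readable even when the instance is not. The key names follow the agent's configuration schema as I understand it, and the log group name `ec2-unreachable-debug` is hypothetical; please verify everything against the agent documentation before using it.

```python
import json


def build_agent_config(log_group="ec2-unreachable-debug"):
    """Build a small CloudWatch agent config as a dict.

    `log_group` is a hypothetical name; choose your own.
    """
    return {
        "metrics": {
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "swap": {"measurement": ["swap_used_percent"]},
                "disk": {"measurement": ["used_percent"], "resources": ["/"]},
            }
        },
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": "/var/log/messages",
                            "log_group_name": log_group,
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        },
    }


def write_config(path):
    """Write the config as JSON to `path` for the agent to load."""
    with open(path, "w") as fh:
        json.dump(build_agent_config(), fh, indent=2)
```

Keeping the list of metrics this short is deliberate, since each custom metric and each ingested log line is billed.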

What software are you running on the t2.xlarge?

Expert
Steve_M
answered 1 year ago
