EC2 instance becomes unreachable after some days


Hi all, one of my EC2 instances becomes unreachable every few days. Around that time, CPU utilization rises to roughly 50-80% (though I don't think that alone should make the instance unreachable). Can someone tell me how to debug this and find out which process might be causing the issue? Also, is the increased CPU utilization the reason the instance becomes unreachable?

PS: I am using an On-Demand Linux t2.xlarge instance and have enough root storage. My questions:

  • How can I find out which process is the culprit?
  • Will system logs help here and, if so, how can I access them (current logs, or logs from the previous 5-7 days)?
  • There is no clear pattern as such, but this mostly happens every 5-7 days. Is AWS doing something behind the scenes that could be the reason?
Prakhar
Asked a year ago · 511 views
5 Answers

How can I find out which process is the culprit?

To check process load, you can run commands such as "ps aux", or install the CloudWatch agent to monitor processes.
Besides CPU, memory exhaustion or other factors could also be making the instance inaccessible.
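
Because "ps aux" cannot be run once the instance is unreachable, one option is to record its output periodically while the instance is still healthy. The following is a minimal sketch under that assumption: the function names and the log path `/var/tmp/proc-snapshots.log` are hypothetical, and the parsing assumes the usual 11-column `ps aux` layout.

```python
import subprocess
import time


def top_processes(ps_output, n=5):
    """Return the n busiest processes from `ps aux` text as (cpu, mem, pid, command)."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # COMMAND (11th column) may contain spaces
        if len(fields) < 11:
            continue  # ignore lines that do not match the expected layout
        rows.append((float(fields[2]), float(fields[3]), int(fields[1]), fields[10]))
    return sorted(rows, reverse=True)[:n]  # highest %CPU first


def snapshot(log_path="/var/tmp/proc-snapshots.log"):  # hypothetical path
    """Append a timestamped list of the busiest processes to a file on disk."""
    out = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    with open(log_path, "a") as log:
        log.write(time.strftime("%Y-%m-%d %H:%M:%S") + "\n")
        for cpu, mem, pid, cmd in top_processes(out):
            log.write(f"  cpu={cpu:5.1f}% mem={mem:5.1f}% pid={pid} {cmd}\n")
```

Calling `snapshot()` from a scheduler such as cron (for example once a minute) leaves a record on disk that can be read after the next reboot to see what was busy just before the instance stopped responding.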

Will system logs help here and, if so, how can I access them (current logs, or logs from the previous 5-7 days)?

This varies by operating system, but system logs are generally stored in the "/var/log/" directory.
On Amazon Linux 2, the system log is written to "/var/log/messages".
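
As a rough illustration of what to look for in that file after a reboot, the sketch below filters for a few kernel messages that commonly accompany memory exhaustion or a stalled system. The pattern list is an assumption chosen for illustration, not an exhaustive set, and the function names are hypothetical.

```python
import re

# Assumed indicators of trouble: OOM killer activity and hung or locked-up tasks.
PATTERNS = re.compile(
    r"out of memory|oom-killer|blocked for more than|soft lockup",
    re.IGNORECASE,
)


def find_suspect_lines(lines):
    """Return the log lines that match any of the assumed trouble patterns."""
    return [line.rstrip("\n") for line in lines if PATTERNS.search(line)]


def scan(path="/var/log/messages"):
    """Scan a system log file; reading it usually requires root privileges."""
    with open(path, errors="replace") as f:
        return find_suspect_lines(f)
```

Rotated copies of the log in the same directory can be scanned the same way to cover earlier days, provided the rotation policy has kept them.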

There is no clear pattern as such, but this mostly happens every 5-7 days. Is AWS doing something behind the scenes that could be the reason?

AWS notifies you in advance of scheduled maintenance, so I don't think it is relevant in this case.

Expert
Answered a year ago
  • If SSH or another means of access is available after rebooting, connect and check the system logs. Adding more custom metrics through the CloudWatch agent does cost money, but I think it is a necessary cost to track down the cause of the problem.
    Also, since you are using a t2.xlarge, please check the CPU credit metrics ("CPUCreditUsage" and "CPUCreditBalance") for the period when the instance became unreachable. If the credit balance has dropped to zero, the instance is limited to its baseline performance, which may be why it is unreachable. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-monitoring-cpu-credits.html

    In some cases, it is recommended to change the instance type to the M6 family.
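
The credit check can also be scripted. The sketch below assumes the `boto3` library and configured AWS credentials, and that `CPUCreditBalance` (the remaining credits) is the metric of interest; the function names and the threshold of 1.0 credit are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def depleted_periods(datapoints, threshold=1.0):
    """Return sorted timestamps where the minimum credit balance fell below threshold."""
    low = [dp["Timestamp"] for dp in datapoints if dp["Minimum"] < threshold]
    return sorted(low)


def fetch_credit_balance(instance_id, days=7):
    """Fetch hourly minimum CPUCreditBalance datapoints for one instance."""
    import boto3  # third-party; imported lazily so depleted_periods works without it

    end = datetime.now(timezone.utc)
    resp = boto3.client("cloudwatch").get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUCreditBalance",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,  # hourly, to stay under the per-call datapoint limit
        Statistics=["Minimum"],
    )
    return resp["Datapoints"]
```

If the timestamps returned by `depleted_periods(fetch_credit_balance(...))` line up with the times the instance became unreachable, CPU credit exhaustion is a plausible cause.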

Answered a year ago

@Riku_Kobayashi "ps aux" will not help: once the instance becomes unreachable I can no longer connect or SSH to it, so I cannot run the command at all. Is there a way around this? I can try the CloudWatch agent, but there is a cost associated with it.

Prakhar
Answered a year ago

What kind of application are you running on this instance? The best place to debug is your application or application server logs. The cause could be the requests you are receiving or a performance issue in your application. Start by debugging the deployed apps.

Answered a year ago

Where are you seeing the 50-80% figure for CPU usage? Is that in the AWS Console? Does it go higher than that when the host becomes unresponsive, or does it just stop recording a figure?

It would help to set up the CloudWatch agent to collect more detailed system logs (https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html). This may show that your root cause is exhaustion of some system resource.

What software are you running on the t2.xlarge?
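
As a sketch of what such a setup might collect, the snippet below builds a small agent configuration covering memory, swap, and root-disk usage plus the system log. The key names follow the agent's configuration file format as I understand it and should be checked against the documentation linked above; the log group name is a hypothetical placeholder.

```python
import json


def build_agent_config(log_group="ec2-unreachable-debug"):  # hypothetical log group name
    """Build a minimal CloudWatch agent configuration as a Python dict."""
    return {
        "metrics": {
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "swap": {"measurement": ["swap_used_percent"]},
                "disk": {"measurement": ["used_percent"], "resources": ["/"]},
            },
        },
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            # Ships the system log off the instance so it stays
                            # readable even while the host is unreachable.
                            "file_path": "/var/log/messages",
                            "log_group_name": log_group,
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        },
    }


# Serialize to JSON, the format the agent reads its configuration in.
config_json = json.dumps(build_agent_config(), indent=2)
```

Keeping the metric list this short limits the number of custom metrics, which is what drives the extra cost.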

Expert
Steve_M
Answered a year ago
