Need help to start g4dn.8xlarge EC2

Dear all, I cannot start my Ubuntu 20.04 g4dn.8xlarge EC2 instance because its status check has failed. The AMI I used is ami-088da9557aae42f39. The message is "Instance reachability check failed". When I check the system log, I see that the VM cannot start the nvidia-powerd service. Does anyone know how to fix this? Thanks

TrungPN
Asked 2 years ago · 287 views
1 Answer
Hi there

A failed instance reachability check means that something has gone wrong at the OS layer of your instance and is stopping it from responding to the ARP probes used to monitor reachability. It sounds like your instance is failing to boot because it cannot start a particular service. I suspect this could be caused by missing NVIDIA kernel modules. You could try to fix this by launching a rescue instance[1] in the same AZ as your instance, stopping your instance, detaching its root volume, and attaching it to the rescue instance as a secondary volume (for example /dev/sdf, which commonly shows up inside the instance as /dev/xvdf; check the actual name with lsblk). Then connect to the rescue instance and mount the volume:

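$ mount /dev/xvdf1 /mnt

(Add -o nouuid only if the root file system is XFS. Ubuntu AMIs normally use ext4, which does not accept that option.)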
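The volume move that precedes this mount can also be done from the AWS CLI. The following is only a minimal sketch: the function name move_root_volume, the run helper, and the DRY_RUN variable are illustrative, the IDs you pass in are your own, and /dev/sdf is assumed to be a free device name on the rescue instance.

```shell
# Sketch only: all IDs are supplied by the caller; /dev/sdf is an assumed free device name.
set -eu

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: move_root_volume BROKEN_INSTANCE_ID RESCUE_INSTANCE_ID ROOT_VOLUME_ID
move_root_volume() {
  broken_id="$1"
  rescue_id="$2"
  volume_id="$3"

  # Stop the broken instance and wait until it has fully stopped.
  run aws ec2 stop-instances --instance-ids "$broken_id"
  run aws ec2 wait instance-stopped --instance-ids "$broken_id"

  # Detach the root volume and wait until it is free.
  run aws ec2 detach-volume --volume-id "$volume_id"
  run aws ec2 wait volume-available --volume-ids "$volume_id"

  # Attach it to the rescue instance as a secondary device.
  run aws ec2 attach-volume --volume-id "$volume_id" --instance-id "$rescue_id" --device /dev/sdf
  run aws ec2 wait volume-in-use --volume-ids "$volume_id"
}
```

Calling it with DRY_RUN=1 prints the six aws commands without running them, which is a cheap way to check the IDs before touching anything.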

Bind-mount /dev, /proc, /run, and /sys of the rescue instance onto the same paths under the newly mounted volume:

$ for m in dev proc run sys; do mount -o bind {,/mnt}/$m; done

Use chroot to change root into the mounted volume:

$ chroot /mnt

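Then update your kernel headers and NVIDIA drivers[2] from inside the chroot. Once you are done, exit the chroot, unmount the volume, detach it from the rescue instance, and reattach it to the original instance under its original root device name. Hopefully this will have repaired the issue and the instance will boot successfully.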
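The repair and release steps can be sketched as a script run on the rescue instance from outside the chroot. This is a sketch under stated assumptions, not a confirmed fix for this AMI: it assumes the volume is mounted at /mnt, that the system uses apt, and that the NVIDIA driver was installed through DKMS so that dkms can rebuild it. The names run, installed_kernels, repair_and_release, MNT, and DRY_RUN are illustrative.

```shell
# Sketch only: assumes an Ubuntu (apt) volume mounted at $MNT and a DKMS-managed NVIDIA driver.
set -eu

MNT="${MNT:-/mnt}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List the kernel versions installed on the mounted volume.
# `uname -r` inside the chroot would report the rescue instance's kernel instead.
installed_kernels() {
  for dir in "$1"/lib/modules/*/; do
    if [ -d "$dir" ]; then
      basename "$dir"
    fi
  done
}

repair_and_release() {
  run chroot "$MNT" apt-get update
  for kver in $(installed_kernels "$MNT"); do
    # Install matching headers, then rebuild DKMS modules for that kernel.
    run chroot "$MNT" apt-get install -y "linux-headers-$kver"
    run chroot "$MNT" dkms autoinstall --kernelver "$kver"
  done

  # Release the bind mounts first, then the volume itself.
  for m in sys run proc dev; do
    run umount "$MNT/$m"
  done
  run umount "$MNT"
}
```

Running DRY_RUN=1 repair_and_release first shows which kernel versions were found on the volume and which commands would be executed. If the driver was installed some other way (for example from NVIDIA's .run installer), the dkms line will not rebuild it and the driver has to be reinstalled with that installer instead.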

Resources:

How to build a rescue instance:

[1] https://aws.amazon.com/premiumsupport/knowledge-center/ec2-instance-boot-issues/

[2] https://forums.developer.nvidia.com/t/black-login-screen-when-nvidia-v470-103-01-drivers-are-installed-on-fedora-36-fresh-install/215643

Answered 2 years ago
