Need help to start g4dn.8xlarge EC2

Dear all, I cannot start my Ubuntu 20.04 g4dn.8xlarge EC2 instance because its status check has failed. The AMI I used is ami-088da9557aae42f39. The message is "Instance reachability check failed". When I check the system log, I see that the VM cannot start the nvidia-powerd service. Does anyone know how to fix this? Thanks

TrungPN
Asked 2 years ago · Viewed 287 times
1 Answer
Hi there

A failed instance reachability check indicates that something has gone wrong at the OS layer of your instance and is stopping it from responding to the ARP probes used to monitor reachability. It sounds like your instance is failing to boot because it cannot start a particular service. I suspect this could be caused by missing NVIDIA kernel modules.

You could try to fix this by launching a rescue instance[1] in the same AZ as your instance, stopping your instance, detaching its root volume, and attaching it to the rescue instance as a secondary volume (for example /dev/sdf, which commonly shows up in the OS as /dev/xvdf; on Nitro-based rescue instances it appears as an NVMe device instead, so check with lsblk). Then connect to the rescue instance and mount the volume (the nouuid option applies only to XFS file systems; omit it if the volume is ext4, the Ubuntu default):

$ mount -o nouuid /dev/xvdf1 /mnt
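The stop, detach, and attach steps that precede the mount can also be scripted with the AWS CLI. The following is only a sketch under stated assumptions: the instance and volume IDs are placeholders you would supply, `/dev/sdf` is an assumed device name, and `run` and `move_root_volume` are hypothetical helper names. With `DRY_RUN=1` (the default here) the commands are printed rather than executed, so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: move the root volume of the broken instance to a rescue instance.
# All IDs are placeholders; /dev/sdf is an assumed device name.

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: move_root_volume <broken-instance-id> <volume-id> <rescue-instance-id> [device]
move_root_volume() {
    local broken_id="$1" volume_id="$2" rescue_id="$3" device="${4:-/dev/sdf}"

    run aws ec2 stop-instances --instance-ids "$broken_id"
    run aws ec2 wait instance-stopped --instance-ids "$broken_id"
    run aws ec2 detach-volume --volume-id "$volume_id"
    run aws ec2 wait volume-available --volume-ids "$volume_id"
    run aws ec2 attach-volume --volume-id "$volume_id" \
        --instance-id "$rescue_id" --device "$device"
}
```

After the attach completes, running `lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT` on the rescue instance shows which device and partition to pass to mount.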

Bind-mount /dev, /proc, /run, and /sys of the rescue instance onto the same paths under the newly mounted volume:

$ for m in dev proc run sys; do mount -o bind {,/mnt}/$m; done

Use chroot to change the root directory to the mounted volume:

$ chroot /mnt

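Then update your kernel headers and NVIDIA drivers[2] from inside the chroot. Once you are done, exit the chroot, unmount the bind mounts and the volume, detach the volume from the rescue instance, and reattach it to the original instance as its root device. Hopefully this will have repaired the issue and the instance will boot successfully.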
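One detail worth illustrating for the header and driver update: inside the chroot, `uname -r` reports the kernel of the rescue instance, not the kernel installed on the repaired volume, so the version is better read from /lib/modules. The sketch below assumes the NVIDIA driver is registered with DKMS and that the newest kernel under /lib/modules is the one the instance boots; neither is confirmed for this AMI. `newest_kernel` and `rebuild_for_kernel` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: intended to be run inside the chroot on the repaired volume.

# Print the newest kernel version that has a directory under /lib/modules.
# (uname -r would report the rescue instance's running kernel instead.)
newest_kernel() {
    local modules_dir="${1:-/lib/modules}"
    ls -1 "$modules_dir" | sort -V | tail -n 1
}

# Install matching headers and rebuild modules for the given kernel version.
rebuild_for_kernel() {
    local kver="$1"
    apt-get update
    apt-get install -y "linux-headers-${kver}"
    # Assumption: the NVIDIA driver was installed through DKMS.
    dkms autoinstall -k "$kver"
    # Regenerate the initramfs so the rebuilt modules are picked up at boot.
    update-initramfs -u -k "$kver"
}

# Example (inside the chroot, as root):
#   rebuild_for_kernel "$(newest_kernel)"
```

If the driver was installed some other way, for example from the NVIDIA .run installer, the DKMS step does not apply and the driver would need to be reinstalled against the kernel version found above.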

Resources:

[1] How to build a rescue instance: https://aws.amazon.com/premiumsupport/knowledge-center/ec2-instance-boot-issues/

[2] https://forums.developer.nvidia.com/t/black-login-screen-when-nvidia-v470-103-01-drivers-are-installed-on-fedora-36-fresh-install/215643

Answered 2 years ago
