Need help to start g4dn.8xlarge EC2

Dear all, I cannot start my Ubuntu 20.04 g4dn.8xlarge EC2 instance because its status check has failed. The AMI I used is ami-088da9557aae42f39. The message is "Instance reachability check failed". When I check the system log, I see that my VM cannot start the nvidia-powerd service. Does anyone know how to fix this? Thanks

TrungPN
asked 2 years ago, 287 views
1 Answer
Hi there

A failed instance reachability check means that something has gone wrong at the OS layer of your instance and is stopping it from responding to the ARP probes used to monitor reachability. It sounds like your instance is failing to boot because it cannot start a particular service. I suspect this could be caused by missing NVIDIA kernel modules. You could try to fix this by launching a rescue instance[1] in the same AZ as your instance, stopping your instance, detaching its root volume, and attaching it to the rescue instance as a secondary volume (for example /dev/sdf, which typically shows up inside the instance as /dev/xvdf). Then connect to the rescue instance and mount the volume:

$ mount /dev/xvdf1 /mnt    # on an XFS root volume, add -o nouuid
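The stop, detach, and attach steps that come before this mount can also be scripted with the AWS CLI. The following is only a sketch under assumptions: the instance and volume IDs are placeholders that do not come from this thread, the `run` and `move_root_volume` helpers are hypothetical names, and the device name /dev/sdf is an assumption that may surface as /dev/xvdf or as an NVMe device depending on the rescue instance type. The script defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: move the root volume of a broken instance to a rescue instance.
# Helper names and all IDs are illustrative assumptions.

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Args: broken instance ID, root volume ID, rescue instance ID, [device name]
move_root_volume() {
  local broken_id="$1" volume_id="$2" rescue_id="$3" device="${4:-/dev/sdf}"
  run aws ec2 stop-instances --instance-ids "$broken_id"
  run aws ec2 wait instance-stopped --instance-ids "$broken_id"
  run aws ec2 detach-volume --volume-id "$volume_id"
  run aws ec2 wait volume-available --volume-ids "$volume_id"
  run aws ec2 attach-volume --volume-id "$volume_id" \
    --instance-id "$rescue_id" --device "$device"
}

# Example (placeholders, prints the five commands without running them):
#   move_root_volume i-0123456789abcdef0 vol-0123456789abcdef0 i-0fedcba9876543210
# To actually execute them:
#   DRY_RUN=0 move_root_volume <broken-instance> <root-volume> <rescue-instance>
```

Reviewing the dry-run output before setting DRY_RUN=0 is a cheap way to confirm the right volume is being moved.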

Bind-mount /dev, /run, /proc, and /sys of the rescue instance onto the same paths under the newly mounted volume:

$ for m in dev proc run sys; do mount -o bind {,/mnt}/$m; done

Use chroot to change root into the mounted volume:

$ chroot /mnt

Then update your kernel headers and NVIDIA drivers[2]. Once you are done, exit the chroot, unmount the volume, detach it from the rescue instance, and reattach it to the original instance as its root device. Hopefully this will have repaired the issue and the instance will boot successfully.
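One pitfall when updating headers from inside the chroot is that `uname -r` reports the rescue instance's running kernel, not the kernel installed on the repaired volume. A minimal sketch of one way around this, assuming an Ubuntu volume where header packages follow the `linux-headers-<version>` naming, is to read the installed kernel versions from /lib/modules instead. The function names `list_kernels` and `header_packages` are hypothetical helpers, and the DKMS step in the comments only applies if the NVIDIA driver was installed through DKMS, which this thread does not confirm.

```shell
#!/usr/bin/env bash
# Sketch: derive header package names from the kernels installed on a volume.
# Function names are illustrative assumptions.

# List kernel versions found under <root>/lib/modules (default root: /).
list_kernels() {
  local root="${1:-/}" dir
  for dir in "${root%/}"/lib/modules/*/; do
    if [ -d "$dir" ]; then
      basename "$dir"
    fi
  done
}

# Print the matching Ubuntu header package name for each installed kernel.
header_packages() {
  local ver
  list_kernels "$1" | while read -r ver; do
    echo "linux-headers-$ver"
  done
}

# Possible usage inside the chroot (run as root):
#   apt-get update
#   apt-get install -y $(header_packages /)
# If the driver is managed by DKMS (an assumption), rebuild it per kernel:
#   for k in $(list_kernels /); do dkms autoinstall -k "$k"; done
```

If the driver was installed another way, for example with NVIDIA's installer script, the rebuild step differs and the linked thread[2] is the better guide.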

Resources:

How to build a rescue instance:

[1] https://aws.amazon.com/premiumsupport/knowledge-center/ec2-instance-boot-issues/

[2] https://forums.developer.nvidia.com/t/black-login-screen-when-nvidia-v470-103-01-drivers-are-installed-on-fedora-36-fresh-install/215643

answered 2 years ago