Need help starting a g4dn.8xlarge EC2 instance


Dear all, I cannot start my Ubuntu 20.04 g4dn.8xlarge EC2 instance because its status check fails. The AMI I used is ami-088da9557aae42f39. The status message is "Instance reachability check failed". When I check the system log, I see that the VM cannot start the nvidia-powerd service. Does anyone know how to fix this? Thanks
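The status check and the system log mentioned above can also be fetched with the AWS CLI. This is a minimal sketch, assuming AWS CLI v2 is configured with credentials for the account; `show_status_checks` and `show_nvidia_console_lines` are hypothetical helper names and the instance ID you pass is a placeholder:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; pass your own instance ID (e.g. i-0123456789abcdef0).

# Print the system and instance status-check results.
# A reachability failure shows up in the second column as "impaired".
show_status_checks() {
  aws ec2 describe-instance-status \
    --instance-ids "$1" \
    --query 'InstanceStatuses[0].[SystemStatus.Status,InstanceStatus.Status]' \
    --output text
}

# Fetch the most recent console (system log) output and keep NVIDIA-related lines.
show_nvidia_console_lines() {
  aws ec2 get-console-output \
    --instance-id "$1" \
    --latest \
    --query Output \
    --output text | grep -i nvidia
}
```

For example, `show_nvidia_console_lines i-0123456789abcdef0` should surface the failing nvidia-powerd unit if systemd logged it to the console.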

TrungPN
asked 2 years ago
1 Answer

Hi there

A failed instance reachability check indicates that something has gone wrong at the operating-system level of your instance that stops it from responding to the ARP probes used to monitor reachability. It sounds like your instance fails to boot because it cannot start a particular service, and I suspect this could be caused by missing NVIDIA kernel modules. You could try to fix this with a rescue instance[1]: launch a rescue instance in the same Availability Zone as your instance, stop your instance, detach its root volume, and attach it to the rescue instance as a secondary device such as /dev/sdf. Then connect to the rescue instance and mount the volume; use lsblk to find its device name (on Nitro-based instance types it appears as an NVMe device such as /dev/nvme1n1p1 instead of /dev/xvdf1). The -o nouuid option is only needed for XFS file systems, and Ubuntu's ext4 root file system rejects it, so a plain mount is enough:

$ sudo mount /dev/xvdf1 /mnt
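The stop, detach, and attach steps described above can be scripted with the AWS CLI. This is a minimal sketch, assuming both instances are in the same Availability Zone and that you already know the root volume ID; `move_root_to_rescue` is a hypothetical helper and /dev/sdf is an assumed free device name on the rescue instance:

```shell
#!/usr/bin/env bash
# Hypothetical helper: move the broken instance's root volume to the rescue instance.
# Usage: move_root_to_rescue VOLUME_ID BROKEN_INSTANCE_ID RESCUE_INSTANCE_ID [DEVICE]
move_root_to_rescue() {
  local vol=$1 broken_id=$2 rescue_id=$3 dev=${4:-/dev/sdf}

  # Stop the broken instance and wait until it is fully stopped.
  aws ec2 stop-instances --instance-ids "$broken_id"
  aws ec2 wait instance-stopped --instance-ids "$broken_id"

  # Detach its root volume and wait until the volume is free.
  aws ec2 detach-volume --volume-id "$vol"
  aws ec2 wait volume-available --volume-ids "$vol"

  # Attach it to the rescue instance as a secondary (non-root) device.
  aws ec2 attach-volume --volume-id "$vol" --instance-id "$rescue_id" --device "$dev"
}
```

The root volume ID is listed on the instance's Storage tab in the console, or in the output of `aws ec2 describe-instances --instance-ids <id> --query 'Reservations[0].Instances[0].BlockDeviceMappings'`.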

Bind-mount /dev, /proc, /run, and /sys of the rescue instance onto the same paths under /mnt:

$ for m in dev proc run sys; do sudo mount -o bind {,/mnt}/$m; done
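With the volume mounted, the missing-module theory can be checked before changing anything. This is a minimal sketch, assuming the volume is mounted at /mnt; `nvidia_module_report` and `powerd_log` are hypothetical helpers, and `powerd_log` only works if the volume kept a persistent journal under /var/log/journal:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; ROOT defaults to /mnt (where the broken volume is mounted).

# For every kernel installed on the volume, report whether an nvidia.ko
# module (possibly compressed, e.g. nvidia.ko.zst) exists for it.
nvidia_module_report() {
  local root=${1:-/mnt} dir kver
  for dir in "${root%/}"/lib/modules/*/; do
    [ -d "$dir" ] || continue        # the glob matched nothing
    kver=$(basename "$dir")
    if find "$dir" -name 'nvidia.ko*' -print -quit | grep -q .; then
      echo "$kver: nvidia module present"
    else
      echo "$kver: nvidia module MISSING"
    fi
  done
}

# Show the last 50 journal lines for nvidia-powerd from the volume's own journal.
powerd_log() {
  local root=${1:-/mnt}
  journalctl --directory="${root%/}/var/log/journal" -u nvidia-powerd --no-pager | tail -n 50
}
```

A kernel reported as MISSING would be consistent with the driver not having been built for the kernel the instance boots.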

Then run chroot to switch the root directory to the mounted volume:

$ sudo chroot /mnt
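Inside the chroot, the header and driver update could look like the sketch below. It assumes the NVIDIA driver on this AMI is managed by DKMS and that Ubuntu's linux-headers-<version> package naming applies; `rebuild_nvidia_modules` is a hypothetical helper. Note that `uname -r` inside the chroot reports the rescue instance's running kernel, not the one the broken instance boots, so the sketch reads the installed kernel versions from /lib/modules instead:

```shell
#!/usr/bin/env bash
# Hypothetical helper; run as root inside the chroot (ROOT defaults to /).
rebuild_nvidia_modules() {
  local root=${1:-/} dir kver
  apt-get update
  for dir in "${root%/}"/lib/modules/*/; do
    [ -d "$dir" ] || continue        # the glob matched nothing
    kver=$(basename "$dir")
    # Headers must match the *target* kernel, not the rescue kernel from uname -r.
    apt-get install -y "linux-headers-${kver}"
    # Ask DKMS to build and install all registered modules for that kernel.
    dkms autoinstall -k "$kver"
  done
}
```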

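Then update your kernel headers and NVIDIA drivers[2]. When you are done, type exit to leave the chroot, unmount /mnt/dev, /mnt/proc, /mnt/run, /mnt/sys, and /mnt, detach the volume from the rescue instance, and reattach it to the original instance under its original root device name (typically /dev/sda1 on Ubuntu AMIs). Then start the instance. Hopefully this will have repaired the issue and the instance will boot successfully.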
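The clean-up and swap-back steps described above can be sketched the same way. This is a minimal sketch, assuming the volume was mounted at /mnt with the bind mounts shown earlier and that the original root device name is /dev/sda1 (check the instance's RootDeviceName if unsure); `unmount_target` and `return_root_volume` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Hypothetical helpers.

# Run as root on the rescue instance, after typing `exit` to leave the chroot:
# unmount the bind mounts first, then the volume itself.
unmount_target() {
  local root=${1:-/mnt} m
  for m in sys run proc dev; do
    umount "${root%/}/$m"
  done
  umount "$root"
}

# Run from a machine with the AWS CLI: move the volume back and boot the instance.
# Usage: return_root_volume VOLUME_ID BROKEN_INSTANCE_ID [ROOT_DEVICE]
return_root_volume() {
  local vol=$1 broken_id=$2 root_dev=${3:-/dev/sda1}
  aws ec2 detach-volume --volume-id "$vol"
  aws ec2 wait volume-available --volume-ids "$vol"
  aws ec2 attach-volume --volume-id "$vol" --instance-id "$broken_id" --device "$root_dev"
  aws ec2 wait volume-in-use --volume-ids "$vol"
  aws ec2 start-instances --instance-ids "$broken_id"
}
```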

Resources:

[1] How to build a rescue instance: https://aws.amazon.com/premiumsupport/knowledge-center/ec2-instance-boot-issues/

[2] https://forums.developer.nvidia.com/t/black-login-screen-when-nvidia-v470-103-01-drivers-are-installed-on-fedora-36-fresh-install/215643

answered 2 years ago
