New Ubuntu Instance failing with 1/2 checks passed: instance status check failed


We have created a spot instance (Spot2) from a snapshot image of a current spot EC2 instance (Spot1) to upgrade the system, but it is failing with 1/2 checks passed.

Type: r5n.xlarge

OS: Ubuntu 18.04

The system logs do not indicate any error or failure point. We created the new instance with the specifications of the current system and launched it in the same VPC/subnet with the same security group.

The only thing we observed is that the hostname of Spot2 has not changed and is still set to that of Spot1 (according to the system logs). We tried to change it with user data, but that does not appear to have been applied either.

We have been unable to resolve this issue for about two months, and we need to upgrade to Ubuntu 22.04 as soon as possible. Can you please help us resolve it?

3 Answers
1
Accepted Answer

Hi there,

In most cases, a 1/2 status check result means there's either a boot issue or a networking issue. In both cases, looking at the system logs [1] and the instance screenshot [2] is the best way to find the potential culprit. You've stated there's no particular information in the system logs, so the screenshot is your next best bet.

If the screenshot shows the login prompt, the boot succeeded, because the instance can only reach the login prompt after a successful boot. Being unable to connect to the instance while the login prompt is showing then points to a network issue.

The opposite is also true: if there's no login prompt on the screenshot, it's most likely a boot/OS issue.

You can use this information to further troubleshoot with the help of the following document: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/TroubleshootingInstances.html
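If you prefer the command line to the console, the system log and the screenshot can also be pulled with the AWS CLI. This is a minimal sketch, assuming the AWS CLI is installed and configured for the right account and region. The function names and the default output file name are made up for illustration, and `--latest` only applies to Nitro-based instance types.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the instance ID is supplied by the caller.

# Print the most recent console output (system log) for an instance.
fetch_console_log() {
  local instance_id="$1"
  aws ec2 get-console-output --instance-id "$instance_id" --latest --output text
}

# Save a console screenshot to a file. The API returns the image
# base64-encoded in the ImageData field, so it is decoded before saving.
fetch_console_screenshot() {
  local instance_id="$1"
  local out_file="${2:-screenshot.jpg}"
  aws ec2 get-console-screenshot --instance-id "$instance_id" --wake-up \
    --query ImageData --output text | base64 --decode > "$out_file"
}
```

Usage would look like `fetch_console_log i-0123456789abcdef0`, where the instance ID is a placeholder.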

References:
[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-console.html#instance-console-console-output
[2] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-console.html#instance-console-screenshot

AWS
answered a year ago
EXPERT
reviewed 10 months ago
  • Thanks so much for the response. I have confirmed the login prompt and will now focus on the network issue. If you have any further hints on the network side, please share them.

  • Anytime at all.

    Now that we know it's a network issue, you can have a look at the following files:

    Ubuntu 18 uses "netplan", so the networking configuration for an Ubuntu 18 instance lives in a netplan YAML file, and that file needs to be populated with the correct information.

    You can look at "/etc/netplan/50-cloud-init.yaml". When looking at this file, ensure the populated "macaddress" matches that of your instance ENI. You can find the ENI within the "Network" tab of your instance information. You can then open that ENI and find the MAC address. Should the information not match, this will most likely be the cause of the network issue.
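The MAC comparison described above can be scripted once you know the ENI's MAC address. This is only a sketch: the function names are hypothetical, the ENI MAC is passed in by hand (copied from the console), and it assumes the netplan file keeps each "macaddress" key and its value on the same line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing a netplan file against an ENI MAC.

# Print every MAC address found on a "macaddress" line, lower-cased.
netplan_macs() {
  local file="$1"
  grep -i 'macaddress' "$file" \
    | grep -Eio '([0-9a-f]{2}:){5}[0-9a-f]{2}' \
    | tr 'A-F' 'a-f'
}

# Succeed only if the given ENI MAC appears in the netplan file.
mac_matches_eni() {
  local file="$1"
  local eni_mac
  eni_mac="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
  netplan_macs "$file" | grep -qxF "$eni_mac"
}
```

For example, `mac_matches_eni /etc/netplan/50-cloud-init.yaml 0a:1b:2c:3d:4e:5f && echo match || echo mismatch`, where the MAC shown is a placeholder for the value from your ENI.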

    Next, you can look at "/etc/udev/rules.d/*". Check for a persistent rules file (something like 70-persistent-net.rules). The goal is to confirm that no default rules have been created. If you do see a rule file with a name like that, rename it; it can stay in the same directory. If you cat that file, you will most likely see that its "ATTR{address}" holds the same incorrect MAC address as the 50-cloud-init.yaml file. Once you've renamed or deleted this "persistent" file, you can try rebooting.
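A quick way to spot such a file is to search the rules directory for anything that pins an interface to a MAC address. The sketch below uses hypothetical function names and relies on udev only reading files that end in ".rules", so renaming with an extra suffix takes the file out of play while keeping it in the same directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for finding and disabling MAC-pinning udev rules.

# List the *.rules files in a directory that contain an ATTR{address} match.
rules_pinning_mac() {
  local rules_dir="$1"
  grep -lF 'ATTR{address}' "$rules_dir"/*.rules 2>/dev/null
}

# Rename a rules file in place so it no longer ends in ".rules".
disable_rule_file() {
  local rule_file="$1"
  mv -- "$rule_file" "$rule_file.disabled"
}
```

On a live system you would run these against /etc/udev/rules.d, with sudo for the rename.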

    This is about as much as I can do without seeing the OS myself. Hope it helps!

    Note - It's always useful to spin up a working instance (Same instance type and OS flavor/version) and compare the network files.

0

Again, thanks for the information.

You can look at "/etc/netplan/50-cloud-init.yaml"

How can we reach these files on the failed system?

answered a year ago
  • It's a pleasure.

    The best approach is to stop the instance, detach its volume, and attach it to a working instance. There you can mount the newly attached volume and review the files:

    1. Once attached to a working instance connect via SSH
    2. Run "lsblk" to see the disk name. Usually something like xvdf1 or nvme1n1p1.
    3. You can mount it to /mnt with "sudo mount /dev/<partitionname> /mnt". Example: sudo mount /dev/xvdf1 /mnt

    Once mounted you will find the file in the mounted path like "/mnt/etc/netplan/50-cloud-init.yaml"

    4. Once you've checked what you need to, unmount the volume: a) cd / b) sudo umount /mnt

    5. You can then detach the volume and attach it to the original instance as "/dev/sda1" or "/dev/xvda", depending on which device name it uses.
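The stop, detach, and attach part of these steps can also be done with the AWS CLI rather than the console. This is a rough sketch under a few assumptions: the AWS CLI is configured, the failed instance can actually be stopped, and the working instance is in the same Availability Zone as the volume. The function name, the IDs, and the default device name /dev/sdf are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move a volume from a failed instance to a working one.
# Arguments: failed instance ID, volume ID, working instance ID, device name.
move_volume_to_rescue() {
  local failed_id="$1"
  local volume_id="$2"
  local rescue_id="$3"
  local device="${4:-/dev/sdf}"

  # Stop the failed instance and wait until it is fully stopped.
  aws ec2 stop-instances --instance-ids "$failed_id" &&
  aws ec2 wait instance-stopped --instance-ids "$failed_id" &&
  # Detach the volume and wait until it is free to attach elsewhere.
  aws ec2 detach-volume --volume-id "$volume_id" &&
  aws ec2 wait volume-available --volume-ids "$volume_id" &&
  # Attach the volume to the working instance as a secondary disk.
  aws ec2 attach-volume --volume-id "$volume_id" \
    --instance-id "$rescue_id" --device "$device"
}
```

Each step only runs if the previous one succeeded, so a failed stop or detach will not leave you attaching a volume that is still in use.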

  • Thanks, the problem was due to a lack of cloud-init on the system. For some reason, it had been removed!
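For anyone who suspects the same cause, a missing cloud-init can be checked on the mounted volume before reattaching it. This is a rough sketch that assumes the package installs its executable at /usr/bin/cloud-init; that path and the function name are assumptions, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a cloud-init executable exists under the
# given root directory (for example /mnt for a mounted rescue volume).
has_cloud_init() {
  local root="${1:-/}"
  [ -x "${root%/}/usr/bin/cloud-init" ]
}
```

For example, `has_cloud_init /mnt || echo "cloud-init not found on the mounted volume"`.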

0

Thank you so much!

My solution was to fix the macaddress in /etc/netplan/50-cloud-init.yaml.

I had to log in via serial connection to update it, because the network connection failed. Bummer that this is the default when creating an Ubuntu 22.04 image. :\

answered 9 months ago
