
How do I recover my RHEL 8 or CentOS 8 Amazon EC2 instance that fails to boot because of issues with the GRUB2 BLS configuration file?


I have a Red Hat Enterprise Linux (RHEL) 8 or CentOS 8 Amazon Elastic Compute Cloud (Amazon EC2) instance. I want to recover a corrupted or deleted BLS configuration (blscfg) file that's in "/boot/loader/entries/".

Short description

GRUB2 in RHEL 8 and CentOS 8 uses blscfg files and entries in /boot/loader for the boot configuration instead of the previous grub.cfg format. It's a best practice to use the grubby tool to manage the blscfg files and retrieve information from /boot/loader/entries/.

If the blscfg files are corrupted or missing from /boot/loader/entries/, then grubby doesn't show any results. You must regenerate the files to recover functionality. To regenerate the blscfg, create a temporary rescue instance, and then remount your Amazon Elastic Block Store (Amazon EBS) volume on the rescue instance. From the rescue instance, regenerate the blscfg for any installed kernels.
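
For example, you can use grubby to list the boot entries that the blscfg files define. On a healthy instance, a command like the following prints one entry for each installed kernel; on an affected instance, it doesn't show any results:

    [root@ip-10-10-1-111 ~]# grubby --info=ALL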

Important: Don't perform the procedure on an instance that's backed by an instance store. The recovery process requires you to stop and start your instance. When you stop and start your instance, you lose all data on the instance. For more information, see Root volumes for your Amazon EC2 instances.

Resolution

Attach the root volume to a rescue EC2 instance

Complete the following steps:

  1. Create an Amazon EBS snapshot of the root volume.
  2. Open the Amazon EC2 console.
    Note: Verify that you're in the correct AWS Region.
  3. In the navigation pane, choose Instances, and then choose the affected instance.
  4. Choose Actions, and then choose Instance State.
  5. Choose Stop.
  6. On the Description tab, under Root device, choose /dev/sda1, and then choose the EBS ID.
  7. Choose Actions, and then choose Detach Volume.
  8. Choose Yes, and then choose Detach. Note the Availability Zone value.
  9. Launch an EC2 instance in the same Availability Zone to use as the rescue instance.
  10. In the navigation pane, choose Volumes, and then select the detached root volume of the affected instance.
  11. Choose Actions, and then choose Attach volume.
  12. Choose the rescue instance ID, and then for Device name, select an unused device, for example /dev/sdf.
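
Note: If you prefer the AWS CLI, the following commands are an approximate equivalent of the preceding console steps. The IDs and device name are placeholders: replace vol-1234567890abcdef0 with the root volume ID, i-1234567890abcdef0 with the affected instance ID, and i-0987654321abcdef0 with the rescue instance ID. Wait for each operation to complete before you run the next one.

    aws ec2 create-snapshot --volume-id vol-1234567890abcdef0 --description "Backup of affected root volume"
    aws ec2 stop-instances --instance-ids i-1234567890abcdef0
    aws ec2 detach-volume --volume-id vol-1234567890abcdef0
    aws ec2 attach-volume --volume-id vol-1234567890abcdef0 --instance-id i-0987654321abcdef0 --device /dev/sdf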

Mount the volume of the affected instance

Complete the following steps:

  1. Use SSH to connect to the rescue instance.

  2. Run the lsblk command to view your available disk devices:

    [ec2-user@ip-10-10-1-111 ~]$ lsblk

    Example output:

    NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
    xvda    202:0    0  10G  0 disk
    ├─xvda1 202:1    0   1M  0 part
    └─xvda2 202:2    0  10G  0 part /
    xvdf    202:80   0  10G  0 disk
    ├─xvdf1 202:81   0   1M  0 part
    └─xvdf2 202:82   0  10G  0 part 

    Note: Nitro-based instances show EBS volumes as NVMe block devices. The output that the lsblk command generates on Nitro-based instances shows the disk names as nvme[0-26]n1. For more information, see Amazon EBS volumes and NVMe. A command that matches an NVMe device to its EBS volume ID is shown after this procedure.

  3. Run the following commands to create a mount directory and mount the root partition of the mounted volume to the new directory:

    [ec2-user@ip-10-10-1-111 ~]$ sudo -i
    [root@ip-10-10-1-111 ~]# mkdir /mount
    [root@ip-10-10-1-111 ~]#  mount /dev/xvdf2 /mount

    Note: Replace /dev/xvdf2 with the root partition of your mounted volume. For more information, see the Linux instances section of Make an Amazon EBS volume available for use.

  4. Run the following command to mount /dev, /run, /proc, and /sys of the rescue instance to the same paths as the newly mounted volume:

    [root@ip-10-10-1-111 ~]# for i in dev proc sys run; do mount -o bind /$i /mount/$i; done
  5. Run the following command to start the chroot environment:

    [root@ip-10-10-1-111 ~]# chroot /mount
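
Note: On a Nitro-based rescue instance, the attached volume appears as an NVMe device rather than /dev/xvdf. One way to match an NVMe device to its EBS volume is to include the serial number in the lsblk output, because EBS exposes the volume ID (without the hyphen) as the NVMe serial number. For example:

    [root@ip-10-10-1-111 ~]# lsblk -o NAME,SIZE,SERIAL,MOUNTPOINT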

Regenerate the blscfg files

Complete the following steps:

  1. Run the rpm command to find information about available kernels in your instance:

    [root@ip-10-10-1-111 ~]# rpm -q --last kernel

    Example output:

    kernel-4.18.0-147.3.1.el8_1.x86_64 Tue 21 Jan 2020 05:11:16 PM UTC
    kernel-4.18.0-80.4.2.el8_0.x86_64 Tue 18 Jun 2019 05:06:11 PM UTC

    Note the available kernels in the output.

  2. Run the kernel-install command for each installed kernel in the instance to recreate the blscfg file:

    [root@ip-10-10-1-111 ~]#  kernel-install add 4.18.0-147.3.1.el8_1.x86_64 /lib/modules/4.18.0-147.3.1.el8_1.x86_64/vmlinuz

    Note: Replace 4.18.0-147.3.1.el8_1.x86_64 with your kernel version number. The systemd-udev rpm installation package provides the kernel-install binary. A loop that covers all installed kernels is shown after this procedure.
    The blscfg for the designated kernel regenerates under /boot/loader/entries/.
    Example:

    [root@ip-10-10-1-111 ~]# ls /boot/loader/entries/
    2bb67fbca2394ed494dc348993fb9b94-4.18.0-147.3.1.el8_1.x86_64.conf
  3. The last kernel that you add with kernel-install becomes the default kernel. To see the current default kernel, run the grubby --default-kernel command:

    [root@ip-10-10-1-111 ~]# grubby --default-kernel
  4. Run the following commands to exit from chroot and unmount the /dev, /run, /proc, and /sys mounts:

    [root@ip-10-10-1-111 ~]# exit
    [root@ip-10-10-1-111 ~]# umount /mount/{dev,proc,run,sys}
    [root@ip-10-10-1-111 ~]# umount /mount
  5. Detach the volume from the rescue instance, and then attach it to the original instance with the correct block device mapping (for example, /dev/sda1). Start the original instance. The instance now boots with the default kernel.
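
If several kernels are installed, you can regenerate the blscfg file for each one with a short loop instead of running kernel-install manually for each version. This is a sketch that assumes the standard /lib/modules/<version>/vmlinuz layout that the kernel rpm installs; run it inside the chroot environment, before you exit chroot in step 4:

    [root@ip-10-10-1-111 ~]# for kver in $(rpm -q kernel --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n'); do kernel-install add "$kver" "/lib/modules/$kver/vmlinuz"; done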

Related information

How do I revert to a known stable kernel after an update prevents my Amazon EC2 instance from rebooting successfully?
