An update prevented my Amazon Elastic Compute Cloud (Amazon EC2) instance from rebooting. I want to revert to a stable kernel.
Short description
If you made a kernel update to your EC2 Linux instance but the kernel is now corrupt, then the instance can't reboot. You also can't use SSH to connect to the affected instance.
To troubleshoot this issue, use the EC2 Serial Console to access your root volume. Or, create a temporary rescue instance, and then remount your Amazon Elastic Block Store (Amazon EBS) volume on the rescue instance. Then, configure GNU GRUB to use the previous kernel, and reboot the instance.
Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
Access the instance's root volume
To access the root volume, use the EC2 Serial Console or a rescue instance.
Use the EC2 Serial Console
Prerequisites: You must have already configured access to the EC2 Serial Console. If your instance is unreachable and you didn't already configure access, then you must use a rescue instance to access the root volume. Also, make sure that you adhere to the serial console prerequisites.
If you activated the EC2 Serial Console for Linux, then you can use it on Nitro-based instance types to troubleshoot boot, network configuration, and SSH configuration issues.
You can use the serial console to connect to your instance without a working network connection.
Before you use the serial console, grant it access at the AWS account level. Then, create AWS Identity and Access Management (IAM) policies that grant access to your IAM users. Every instance that uses the serial console must include at least one password-based user.
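The account-level setting can be checked and turned on from the AWS CLI. The following is a minimal sketch, assuming that the AWS CLI is configured with credentials that are allowed to call these Amazon EC2 actions in the Region of the instance. The function names are illustrative wrappers and aren't part of any AWS tooling. The sketch doesn't create the IAM policies or the password-based user.

```shell
# Sketch only: hypothetical helper names that wrap real AWS CLI commands.

# Print the account-level serial console setting for the current Region.
serial_console_status() {
  aws ec2 get-serial-console-access-status \
    --query 'SerialConsoleAccessEnabled' --output text
}

# Turn on serial console access for the account in the current Region.
enable_serial_console() {
  aws ec2 enable-serial-console-access
}

# Turn on access only when it isn't already on.
ensure_serial_console() {
  case "$(serial_console_status)" in
    [Tt]rue) echo "Serial console access is already enabled." ;;
    *)       enable_serial_console ;;
  esac
}
```

To use the sketch, load the functions into your shell, and then run ensure_serial_console.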
Use a rescue instance
Important: Don't perform this procedure on an instance store-backed instance. The recovery procedure requires a stop and start of your instance, so you will lose the instance's data.
To use a rescue instance to access the root volume, complete the following steps:
-
Create an Amazon EBS snapshot of the root volume.
-
Stop the affected instance.
-
Detach the Amazon EBS root volume (/dev/xvda or /dev/sda1) from the affected instance. Note the device name of your root volume.
Note: To help identify the EBS volume in later steps, tag the volume before you detach it. The root device differs by Amazon Machine Image (AMI). For example, Amazon Linux 2 (AL2) and Amazon Linux 2023 (AL2023) use /dev/xvda. However, Ubuntu 14, 16, and 18, CentOS 7, and Red Hat Enterprise Linux (RHEL) 7.5 use /dev/sda1.
-
Launch a rescue EC2 instance in the same Availability Zone as the root volume.
Note: Check your instance product code. Some product codes require you to launch an EC2 instance with the same operating system (OS) type. For example, if the affected instance is a paid RHEL AMI, then you must launch an AMI with the same product code. If you have an AL2 instance, then you must create an AL2 rescue instance to avoid errors.
-
Attach the volume as a secondary device (/dev/sdf) to the rescue instance.
-
Use SSH to connect to the rescue instance.
-
To view your available disk devices, run the following command:
lsblk
Example output:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 15G 0 disk
└─xvda1 202:1 0 15G 0 part /
xvdf 202:80 0 15G 0 disk
└─xvdf1 202:81 0 15G 0 part
Note: Nitro-based instances show EBS volumes as NVMe block devices with the nvme[0-26]n1 disk name. Example output on a Nitro-based instance:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 8G 0 disk
├─nvme0n1p1 259:1 0 8G 0 part /
└─nvme0n1p128 259:2 0 1M 0 part
nvme1n1 259:3 0 100G 0 disk
└─nvme1n1p1 259:4 0 100G 0 part /
-
To become the root user, run the following command:
sudo -i
-
To mount the root partition of the attached volume to /mnt, run the following command:
mount -o nouuid /dev/xvdf1 /mnt
Note: Replace /dev/xvdf1 with the root partition of your volume. The nouuid mount option is for XFS file systems only. If you experience issues when you run the preceding command, then run the following command to identify your file system type:
lsblk -f
Then, run the following command based on your file system to mount the partition.
VFAT/FAT32 file systems:
mount /dev/xvdf1 /mnt
ext4 file systems:
mount /dev/xvdf1 /mnt
Note: Replace /dev/xvdf1 with the root partition of your volume.
If /mnt doesn't exist on your configuration, then run the following commands to create a mount directory, and then mount the root partition to the new directory:
mkdir /mnt
mount -o nouuid /dev/xvdf1 /mnt
Note: If you receive an error when you run the preceding mount command, then run the following command instead:
mount /dev/xvdf1 /mnt
Next, use the mount directory to access the affected instance's data.
-
To bind mount /dev, /run, /proc, and /sys of the rescue instance to the same paths under the mounted volume, run the following command:
for m in dev proc run sys; do mount -o bind {,/mnt}/$m; done
-
If you have a separate /boot partition, then mount it to /mnt/boot.
-
To change the root directory to the mount directory, run the following command:
chroot /mnt
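The mount and bind mount steps above can be collected into a small helper that you run as the root user on the rescue instance before chroot. This is a minimal sketch, not a definitive procedure. The function names are hypothetical, and the sketch assumes that lsblk reports the file system type of the partition and that only XFS needs the nouuid option.

```shell
# Hypothetical helper: choose mount options from a file system type
# as reported by "lsblk -no FSTYPE <partition>".
mount_options_for() {
  case "$1" in
    xfs) echo "-o nouuid" ;;  # nouuid applies to XFS file systems only
    *)   echo "" ;;           # ext4, vfat, and others mount without extra options
  esac
}

# Hypothetical helper: mount the root partition, and then bind mount the
# pseudo file systems of the rescue instance under it. Run as root.
mount_rescue_root() {
  part="$1"
  target="${2:-/mnt}"
  fstype="$(lsblk -no FSTYPE "$part")"
  mkdir -p "$target"
  # The options are intentionally unquoted so that "-o nouuid" splits into two words.
  mount $(mount_options_for "$fstype") "$part" "$target" || return 1
  for m in dev proc run sys; do
    mount -o bind "/$m" "$target/$m" || return 1
  done
}
```

For example, mount_rescue_root /dev/xvdf1 /mnt mounts the partition, and you can then run chroot /mnt. Replace /dev/xvdf1 with the root partition of your volume.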
Update the default kernel in the GRUB bootloader
You can find the corrupt kernel in position 0 in the list, and the last stable kernel in position 1. To replace the corrupt kernel with the stable kernel, complete the following steps based on your distribution.
GRUB1 (Legacy GRUB) for Red Hat 6
To replace the corrupt kernel with the stable kernel in the /boot/grub/grub.conf file, run the following command:
sed -i '/^default/ s/0/1/' /boot/grub/grub.conf
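Before you change the default, it can help to confirm which kernel each position refers to. The following is a minimal sketch with a hypothetical function name. It assumes that each menu entry in the file starts with a title line, and it only reads the file.

```shell
# Hypothetical helper: print the default line and the numbered title
# entries of a legacy GRUB configuration file.
list_grub1_entries() {
  conf="${1:-/boot/grub/grub.conf}"
  grep '^default' "$conf"
  # Entries are numbered from 0 in the order that their title lines appear.
  awk '/^title/ { sub(/^title[ \t]+/, ""); printf "%d: %s\n", n++, $0 }' "$conf"
}
```

Run list_grub1_entries inside the chroot environment, and check that position 1 is the kernel that you expect.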
GRUB2 for Ubuntu 14.04 LTS, 16.04, and 18.04
Complete the following steps:
-
To replace the corrupt GRUB_DEFAULT=0 default menu entry with the stable GRUB_DEFAULT=saved value in the /etc/default/grub file, run the following command:
sed -i 's/GRUB_DEFAULT=0/GRUB_DEFAULT=saved/g' /etc/default/grub
-
To make sure that GRUB recognizes the change, run the following command:
update-grub
Note: You might receive the "device-mapper: reload ioctl on osprober-linux-xvdaX failed: Device or resource busy Command failed" error when you rebuild the grub configuration file. To resolve this issue, add the GRUB_DISABLE_OS_PROBER=true parameter to the /etc/default/grub file, and then rerun the preceding command.
-
To make sure that Amazon EC2 loads the stable kernel at the next reboot, run the following command:
grub-set-default 1
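If you receive the os-prober error described in the preceding note, then you can add the parameter with a small helper instead of editing the file by hand. This is a sketch under the assumption that the setting, when present, starts at the beginning of a line. The function name is hypothetical.

```shell
# Hypothetical helper: make sure that GRUB_DISABLE_OS_PROBER=true
# appears exactly once in the GRUB defaults file.
disable_os_prober() {
  conf="${1:-/etc/default/grub}"
  if grep -q '^GRUB_DISABLE_OS_PROBER=' "$conf"; then
    # Rewrite an existing setting in place.
    sed -i 's/^GRUB_DISABLE_OS_PROBER=.*/GRUB_DISABLE_OS_PROBER=true/' "$conf"
  else
    # Append the setting when it's missing.
    echo 'GRUB_DISABLE_OS_PROBER=true' >> "$conf"
  fi
}
```

After you run disable_os_prober, rerun update-grub.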
GRUB2 for RHEL 7 and AL2
Complete the following steps:
-
To replace the corrupt GRUB_DEFAULT=0 default menu entry with the stable GRUB_DEFAULT=saved value in the /etc/default/grub file, run the following command:
sed -i 's/GRUB_DEFAULT=0/GRUB_DEFAULT=saved/g' /etc/default/grub
-
To regenerate the /boot/grub2/grub.cfg file so that GRUB recognizes the change, run the following command:
grub2-mkconfig -o /boot/grub2/grub.cfg
Note: You might receive the "device-mapper: reload ioctl on osprober-linux-xvdaX failed: Device or resource busy Command failed" error when you rebuild the grub configuration file. To resolve this issue, add the GRUB_DISABLE_OS_PROBER=true parameter to the /etc/default/grub file, and then rerun the preceding command.
-
To make sure that Amazon EC2 loads the stable kernel at the next reboot, run the following command:
grub2-set-default 1
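To confirm which kernel index 1 refers to, you can list the top-level menu entries of the generated configuration file. The following is a minimal sketch with a hypothetical function name. It assumes that entry titles are enclosed in single quotes and that top-level entries start at the beginning of a line. Entries nested inside a submenu aren't listed.

```shell
# Hypothetical helper: print the top-level GRUB2 menu entries with the
# index that grub2-set-default expects.
list_grub2_entries() {
  cfg="${1:-/boot/grub2/grub.cfg}"
  # Use the single quote as the field separator so that $2 is the entry title.
  awk -F"'" '/^(menuentry|submenu) / { printf "%d: %s\n", n++, $2 }' "$cfg"
}
```

Run list_grub2_entries inside the chroot environment after you regenerate the configuration file.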
GRUB2 for RHEL 8, CentOS 8, and AL2023
On these distributions, GRUB2 uses blscfg files and entries in /boot/loader for the boot configuration, instead of the previous grub.cfg format. It's a best practice to use the grubby tool to manage the blscfg files and retrieve information from the /boot/loader/entries/ directory. If the blscfg files are missing or corrupted, then grubby doesn't show any results. You must regenerate the files to recover functionality.
To update the default kernel in GRUB2, complete the following steps:
-
To see the current default kernel, run the following command:
grubby --default-kernel
-
To see all available kernels and their indexes, run the following command:
grubby --info=ALL
Example output:
[root@ip-172-31-29-221 /]# grubby --info=ALL
index=0
kernel="/boot/vmlinuz-4.18.0-305.el8.x86_64"
args="ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 rd.blacklist=nouveau nvme_core.io_timeout=4294967295 crashkernel=auto $tuned_params"
root="UUID=d35fe619-1d06-4ace-9fe3-169baad3e421"
initrd="/boot/initramfs-4.18.0-305.el8.x86_64.img $tuned_initrd"
title="Red Hat Enterprise Linux (4.18.0-305.el8.x86_64) 8.4 (Ootpa)"
id="0c75beb2b6ca4d78b335e92f0002b619-4.18.0-305.el8.x86_64"
index=1
kernel="/boot/vmlinuz-0-rescue-0c75beb2b6ca4d78b335e92f0002b619"
args="ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 rd.blacklist=nouveau nvme_core.io_timeout=4294967295 crashkernel=auto"
root="UUID=d35fe619-1d06-4ace-9fe3-169baad3e421"
initrd="/boot/initramfs-0-rescue-0c75beb2b6ca4d78b335e92f0002b619.img"
title="Red Hat Enterprise Linux (0-rescue-0c75beb2b6ca4d78b335e92f0002b619) 8.4 (Ootpa)"
id="0c75beb2b6ca4d78b335e92f0002b619-0-rescue"
index=2
kernel="/boot/vmlinuz-4.18.0-305.3.1.el8_4.x86_64"
args="ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 rd.blacklist=nouveau nvme_core.io_timeout=4294967295 crashkernel=auto $tuned_params"
root="UUID=d35fe619-1d06-4ace-9fe3-169baad3e421"
initrd="/boot/initramfs-4.18.0-305.3.1.el8_4.x86_64.img $tuned_initrd"
title="Red Hat Enterprise Linux (4.18.0-305.3.1.el8_4.x86_64) 8.4 (Ootpa)"
id="ec2fa869f66b627b3c98f33dfa6bc44d-4.18.0-305.3.1.el8_4.x86_64"
Note the kernel path that you want to set as the default for your instance. In the preceding example, the kernel path at index 2 is /boot/vmlinuz-4.18.0-305.3.1.el8_4.x86_64.
-
To change the default kernel of the instance, run the following command:
grubby --set-default=/boot/vmlinuz-4.18.0-305.3.1.el8_4.x86_64
Note: Replace 4.18.0-305.3.1.el8_4.x86_64 with your kernel's version number.
-
To verify that you correctly configured the default kernel, run the following command:
grubby --default-kernel
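The grubby --info=ALL output is long, so a condensed view of the index and kernel path pairs can make it easier to pick the right path. The following sketch uses a hypothetical function name and assumes the index= and kernel= line format shown in the preceding example output.

```shell
# Hypothetical helper: condense "grubby --info=ALL" output that is read
# from standard input into "index kernel-path" pairs.
summarize_grubby_info() {
  awk -F= '
    /^index=/  { idx = $2 }
    /^kernel=/ { gsub(/"/, "", $2); print idx, $2 }
  '
}
```

To use the sketch, pipe the command output into the function: grubby --info=ALL | summarize_grubby_info.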
Reboot the instance
If you used the EC2 Serial Console, then Amazon EC2 now loads the stable kernel. You can reboot the instance.
If you used a rescue instance to access the root volume, then complete the following steps:
-
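To exit from chroot and unmount /dev, /run, /proc, /sys, and the root partition from /mnt, run the following commands:
exit
umount /mnt/{dev,proc,run,sys,}
Note: If you mounted a separate /boot partition, then run umount /mnt/boot before you run the preceding umount command.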
-
Stop the rescue instance.
-
Detach the root volume from the rescue instance.
-
Attach the root volume to the original instance with the root device name that you noted earlier (/dev/xvda or /dev/sda1).
-
Start the original instance.
-
Amazon EC2 now loads the stable kernel. You can reboot the instance.
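The stop, detach, attach, and start steps above can also be run from the AWS CLI. The following is a minimal sketch, assuming that the AWS CLI is configured for the Region of the instances and that the destination instance is already stopped. The function name is hypothetical, and the IDs in the usage example are placeholders.

```shell
# Hypothetical helper: move an EBS volume from one instance to another.
# Arguments: volume ID, source instance ID, destination instance ID, device name.
move_volume() {
  volume_id="$1"
  from_instance="$2"
  to_instance="$3"
  device="$4"
  # Each step runs only if the previous step succeeded.
  aws ec2 stop-instances --instance-ids "$from_instance" &&
  aws ec2 wait instance-stopped --instance-ids "$from_instance" &&
  aws ec2 detach-volume --volume-id "$volume_id" &&
  aws ec2 wait volume-available --volume-ids "$volume_id" &&
  aws ec2 attach-volume --volume-id "$volume_id" \
    --instance-id "$to_instance" --device "$device" &&
  aws ec2 wait volume-in-use --volume-ids "$volume_id" &&
  aws ec2 start-instances --instance-ids "$to_instance"
}
```

For example, move_volume vol-EXAMPLE i-RESCUE i-ORIGINAL /dev/xvda moves the root volume from the rescue instance back to the original instance. Replace the placeholder IDs and the device name with your own values.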