My Amazon Elastic Compute Cloud (Amazon EC2) Linux instance failed the instance status check because of operating system (OS) issues. Now, it doesn't boot successfully.
Short description
Your Linux instance might fail the instance status check for the following reasons:
- You updated the kernel and the new kernel didn't boot.
- You received a "Kernel panic" error.
- The file system entries in /etc/fstab are incorrect or couldn't mount.
- The file system is corrupted.
- There are incorrect network configurations on the instance.
- There's no available CPU or memory on the instance.
To troubleshoot these issues, use the EC2 Serial Console or EC2Rescue to diagnose boot issues. Or, create a rescue instance, and then manually correct the errors.
Note: The following resolution steps are based on Amazon Linux 2023 (AL2023). However, you can use the resolution for Linux distributions in general. If you use a Linux distribution that isn't AL2023, then the commands, paths, and outputs might vary.
Resolution
Important: When you stop and start an instance, the instance's public IP address changes. It's a best practice to use an Elastic IP address to route external traffic to your instance instead of a public IP address. For more information, see What happens when you stop an instance.
Use the EC2 Serial Console
Important: You don't need to stop and start the instance when you use EC2 Serial Console.
Use the EC2 Serial Console to troubleshoot supported Nitro-based instance types and supported bare metal instances. You don't need a working network connection to your instance to use the EC2 Serial Console. Connect to the EC2 Serial Console, and then check for boot issues and for network and SSH configuration issues.
If you haven't used the EC2 Serial Console before, then make sure that you adhere to the prerequisites. If your instance is unreachable and you haven't already configured access to the serial console, then use EC2Rescue or a rescue instance to troubleshoot.
Run EC2Rescue
Run the EC2Rescue tool to diagnose and troubleshoot the OS on unreachable instances.
Use a rescue instance to manually correct errors
To launch a rescue instance, complete the following steps:
-
Launch a new instance in your virtual private cloud (VPC). Use the same Amazon Machine Image (AMI) and the same Availability Zone as the instance that failed its status check.
Note: You can also use an existing instance with the same AMI and Availability Zone as the instance that failed its status check.
-
Stop the original instance.
-
Detach the root Amazon Elastic Block Store (Amazon EBS) volume, such as /dev/xvda or /dev/sda1, from the original instance. Note the device name of your root volume.
-
Attach the volume as a secondary device (/dev/sdf) to the rescue instance.
-
Use SSH to connect to your rescue instance.
-
To create a mount point directory for the volume that you attached to the rescue instance, run the following command:
sudo mkdir /rescue
Note: Replace /rescue with your mount point directory name.
-
To become the root user, run the following command:
sudo -i
-
As the root user, run the following command to identify the correct device name:
lsblk
Note: The device that's attached to the rescue instance might have a different device name.
-
To mount the volume to the new directory, run the following command:
sudo mount /dev/xvdf1 /rescue
Note: Replace /dev/xvdf1 with your root volume's device name and /rescue with your mount point directory name. If you receive an error when you run the preceding command, then see Why can't I mount my Amazon EBS volume?
-
Retrieve the instance's system log to identify the root cause of the issue.
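The device-identification step in the preceding list (lsblk) can be sketched as a small filter. The following helper and the sample listing are hypothetical, and the device names on your rescue instance might differ:

```shell
# Sketch (not an official AWS tool): filter a "NAME TYPE" listing, as
# printed by `lsblk -ln -o NAME,TYPE`, down to partition device paths.
find_partitions() {
  awk '$2 == "part" { print "/dev/" $1 }'
}

# Hypothetical lsblk output from a rescue instance where the volume
# that you attached as /dev/sdf appears as xvdf:
printf 'xvda disk\nxvda1 part\nxvdf disk\nxvdf1 part\n' | find_partitions
# prints /dev/xvda1 and /dev/xvdf1
```

In this example, /dev/xvdf1 is the root partition of the attached volume and is the device to pass to the mount command.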
To troubleshoot OS errors that cause instance status check failures, follow the steps for the error that you receive. After you correct the errors, restart your original instance.
Troubleshoot "Kernel panic"
If you receive a "Kernel panic" error message, then the vmlinuz or initramfs files that the kernel needs to boot are missing or corrupted. To troubleshoot this issue, complete the following steps:
- To check for the vmlinuz and initramfs files, run the following command:
cd /rescue/boot
ls -l
In the output, verify that the vmlinuz and initramfs files correspond to the kernel version that you want to boot.
Example output:
uname -r
4.14.165-131.185.amzn2.x86_64
cd /boot; ls -l
total 39960
-rw-r--r-- 1 root root 119960 Jan 15 14:34 config-4.14.165-131.185.amzn2.x86_64
drwxr-xr-x 3 root root 17 Feb 12 04:06 efi
drwx------ 5 root root 79 Feb 12 04:08 grub2
-rw------- 1 root root 31336757 Feb 12 04:08 initramfs-4.14.165-131.185.amzn2.x86_64.img
-rw-r--r-- 1 root root 669087 Feb 12 04:08 initrd-plymouth.img
-rw-r--r-- 1 root root 235041 Jan 15 14:34 symvers-4.14.165-131.185.amzn2.x86_64.gz
-rw------- 1 root root 2823838 Jan 15 14:34 System.map-4.14.165-131.185.amzn2.x86_64
-rwxr-xr-x 1 root root 5718992 Jan 15 14:34 vmlinuz-4.14.165-131.185.amzn2.x86_64
Note: The preceding example shows an Amazon Linux 2 (AL2) instance with kernel version 4.14.165-131.185.amzn2.x86_64. The /boot directory has the required initramfs-4.14.165-131.185.amzn2.x86_64.img and vmlinuz-4.14.165-131.185.amzn2.x86_64 files.
- If the initramfs and the vmlinuz files aren't present, then boot the instance with a previous kernel that has both of the files.
For more information about how to resolve kernel panic errors, see How do I resolve the "Kernel panic - not syncing" error in my EC2 instance?
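The /boot check above can be sketched as a small helper that reports whether a given kernel version has both its vmlinuz and initramfs files. This is an illustrative sketch, demonstrated on a mock directory rather than a real mounted volume; the function name and paths are assumptions:

```shell
# Hypothetical helper: verify that a boot directory contains both files
# that the given kernel version needs to boot.
check_boot_files() {
  boot_dir="$1"; kver="$2"
  for f in "vmlinuz-$kver" "initramfs-$kver.img"; do
    if [ ! -e "$boot_dir/$f" ]; then
      echo "missing: $f"
      return 1
    fi
  done
  echo "ok: kernel $kver is bootable from $boot_dir"
}

# Demonstration on a mock boot directory (on a rescue instance, you
# would point this at /rescue/boot instead):
mock=$(mktemp -d)
touch "$mock/vmlinuz-4.14.165-131.185.amzn2.x86_64" \
      "$mock/initramfs-4.14.165-131.185.amzn2.x86_64.img"
check_boot_files "$mock" "4.14.165-131.185.amzn2.x86_64"
```

If the helper reports a missing file, boot the instance with a previous kernel version for which both files are present.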
Troubleshoot "Failed to mount" or "Dependency failed"
If the /etc/fstab file has incorrect mount point entries, then you receive a "Failed to mount" or "Dependency failed" error message. To troubleshoot this issue, complete the following steps:
- Verify that the mount point entries in the /etc/fstab are correct.
- To correct file system inconsistencies, run the fsck or xfs_repair tool. Before you run the tool, it's a best practice to create a snapshot of the Amazon EBS volume as a backup. Then, run the following command to unmount the volume:
sudo umount /rescue
Note: Replace /rescue with your mount point directory name.
- Run the fsck or xfs_repair tool, based on your file system.
For ext4 file systems, run the following command:
sudo fsck /dev/sdf
Example output:
fsck from util-linux 2.30.2
e2fsck 1.42.9 (28-Dec-2013)
/dev/sdf: clean, 11/6553600 files, 459544/26214400 blocks
For XFS file systems, run the following command:
sudo xfs_repair /dev/xvdf
Example output:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
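The fstab verification in step 1 can be sketched as a quick sanity filter that flags malformed entries. This is a minimal illustrative check (the helper name is an assumption, and it doesn't validate every fstab field); on a rescue instance, you would run it against the mounted volume's copy, such as /rescue/etc/fstab:

```shell
# Hypothetical helper: print each fstab entry's device, mount point,
# and file system type, and flag entries with too few fields.
check_fstab() {
  awk '
    /^[[:space:]]*(#|$)/ { next }                       # skip comments and blanks
    NF < 4 { print "malformed entry: " $0; next }       # fstab entries need 4+ fields
    { print "entry: dev=" $1 " mount=" $2 " fstype=" $3 }
  ' "$1"
}

# Demonstration on a mock fstab with one valid and one truncated entry:
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
# /etc/fstab
UUID=1234-abcd /     xfs  defaults,noatime 1 1
/dev/xvdg1
EOF
check_fstab "$tmp"
```

Comment out or correct any flagged entries, and confirm that each device and mount point exists before you reboot the original instance.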
Troubleshoot "interface eth0: failed"
If you receive the "interface eth0: failed" error message, then verify that the network configuration file for your primary interface has the correct entries. For the eth0 primary interface, the file is /etc/sysconfig/network-scripts/ifcfg-eth0. If the device name of your primary interface isn't eth0, then the file name in that directory is ifcfg followed by your device name.
To check the file, complete the following steps:
- Run the following command to view the network configuration file for the eth0 primary interface:
sudo cat /etc/sysconfig/network-scripts/ifcfg-eth0
Note: If needed, replace eth0 in the preceding command with the name of your primary interface.
The following example output contains the correct entries for the network configuration file located in /etc/sysconfig/network-scripts/ifcfg-eth0:
$ sudo cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
TYPE=Ethernet
USERCTL=yes
PEERDNS=yes
DHCPV6C=yes
DHCPV6C_OPTIONS=-nw
PERSISTENT_DHCLIENT=yes
RES_OPTIONS="timeout:2 attempts:5"
DHCP_ARP_CHECK=no
If ONBOOT isn't set to yes, then the primary network interface isn't configured to come up at boot.
- To change the ONBOOT value, run the following command to open the file:
sudo vi /etc/sysconfig/network-scripts/ifcfg-eth0
Note: The preceding command uses the vi editor to edit the file.
- To enter insert mode, press i.
- Find the ONBOOT entry, and then change the value to yes.
- Press Esc, then type :wq and press Enter to save the file and exit.
For more errors and resolution steps, see Troubleshoot system log errors for Linux instances.
Restart the original instance
Complete the following steps:
- To unmount the secondary device from your rescue instance, run the following command:
sudo umount /rescue
Note: Replace /rescue with your mount point directory name.
If the unmount operation fails, then stop or reboot the rescue instance. Then, rerun the preceding command.
- Detach the secondary volume from the rescue instance.
- Attach the volume to the original instance as the /dev/xvda or /dev/sda1 root volume.
- Start the instance, and then verify that the instance is responsive.
Related information
How do I troubleshoot status check failures for my EC2 Linux instance?
Troubleshoot Amazon EC2 Linux instances with failed status checks
Why doesn't my Linux instance boot after I changed it to a Nitro-based instance?