How do I use EC2Rescue for Linux to troubleshoot OS-level issues?

I can't connect to my Amazon Elastic Compute Cloud (Amazon EC2) Linux instance, or I have boot issues. I want to use EC2Rescue to diagnose and troubleshoot operating system (OS) issues.

Short description

You can use EC2Rescue for Linux to take the following actions:

  • Collect system usage reports, such as vmstat, iostat, and mpstat.
  • Collect logs and details, such as syslog, dmesg, application error logs, and AWS Systems Manager logs.
  • Detect system problems, such as asymmetric routing or duplicate root device labels.
  • Automatically fix system issues, such as OpenSSH file permission or kernel parameters activation issues.

Note: If you have access to the EC2 Serial Console, then you can use the console to troubleshoot supported Nitro-based instance types. For more information, see Connect to the EC2 Serial Console. You can also use the AWSSupport-ExecuteEC2Rescue runbook to automatically identify and fix issues that cause connection issues. For more information, see Run the EC2Rescue tool on unreachable instances.
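As a rough illustration of the runbook option in the preceding note, the following shell sketch prints an AWS CLI command without running it. It assumes that the AWS CLI is installed and configured, and that the runbook accepts the instance ID in a parameter named UnreachableInstanceId. The helper name build_rescue_automation_cmd and the example instance ID are hypothetical. Check the runbook documentation for the full parameter list and the required permissions before you run the printed command.

```shell
# Hypothetical helper: prints the AWS CLI call that would start the
# AWSSupport-ExecuteEC2Rescue runbook for one instance. Nothing is executed.
build_rescue_automation_cmd() {
    # Reject arguments that don't look like an instance ID.
    case $1 in
        i-?*) ;;
        *) echo "expected an instance ID such as i-0123456789abcdef0" >&2
           return 1 ;;
    esac
    # Assumption: the runbook takes the instance ID as UnreachableInstanceId.
    printf 'aws ssm start-automation-execution --document-name "AWSSupport-ExecuteEC2Rescue" --parameters "UnreachableInstanceId=%s"\n' "$1"
}
```

For example, build_rescue_automation_cmd i-0123456789abcdef0 prints the command so that you can review it before you run it.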

Resolution

Prerequisites: Make sure that your system adheres to the OS and software requirements for EC2Rescue.
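One way to check the software side of the prerequisites is sketched below. It assumes that the installed (non-bundled) edition of EC2Rescue for Linux needs Python 2.7.9 or later, or Python 3.2 or later. Treat those version numbers as an assumption and confirm them against the linked requirements. The function name python_version_ok is hypothetical.

```shell
# Hypothetical helper: succeeds when a Python version string meets the
# ASSUMED minimum (2.7.9+ for Python 2, 3.2+ for Python 3).
# The body runs in a subshell so that its variables stay local.
python_version_ok() (
    version=$1
    # Require at least "major.minor".
    case $version in
        *.*) ;;
        *) exit 1 ;;
    esac
    major=${version%%.*}
    rest=${version#*.}
    minor=${rest%%.*}
    case $rest in
        *.*) patch=${rest#*.} ;;
        *)   patch=0 ;;
    esac
    # Keep only the leading digits of the patch level ("12+" becomes "12").
    patch=${patch%%[!0-9]*}
    patch=${patch:-0}
    case $major in
        3) [ "$minor" -ge 2 ] 2>/dev/null ;;
        2) [ "$minor" -eq 7 ] 2>/dev/null && [ "$patch" -ge 9 ] ;;
        *) exit 1 ;;
    esac
)
```

You can pass it the version that the interpreter reports, for example the output of python3 -c 'import platform; print(platform.python_version())'.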

Use EC2Rescue to troubleshoot connection issues

Complete the following steps:

  1. Use the Amazon Machine Image (AMI) of the instance with issues to launch a rescue instance in your virtual private cloud (VPC).
    Note: Make sure that the new instance is in the same Availability Zone as the instance with issues. You can also use an existing instance that's in the same Availability Zone as the instance with issues.

  2. Detach the Amazon Elastic Block Store (Amazon EBS) root volume from the instance with issues. Note the device name, such as /dev/xvda or /dev/sda1.

  3. Attach the Amazon EBS volume to the rescue instance as a secondary device, such as /dev/sdf.
    Note: If your instance's root device is a volume backed by Amazon EBS, then stop and restart the instance.

  4. Use SSH to connect to your rescue instance.

  5. As the root user, run the following commands to identify the correct device name:

    $ sudo -i
    # lsblk
    # rescuedev=/dev/xvdf1

    Note: When you run lsblk, note the device name in the output. Replace xvdf1 with the name of the device that's attached to your rescue instance.

  6. To select a temporary mount point that's not already in use, and to create it if it doesn't exist, run the following commands:

    # rescuemnt=/mnt
    # mkdir -p $rescuemnt

    Note: It's a best practice to use /mnt as a mount point.

  7. To mount the root file system from the attached volume, run the following command:

    # mount $rescuedev $rescuemnt

    If the volume mount fails, then run the following command:

    # dmesg | tail

    If the logs show a universally unique identifier (UUID) that conflicts, then rerun the mount command with the -o nouuid option. Example:

    # mount -o nouuid $rescuedev $rescuemnt

  8. To mount special file systems, and change the root directory to the new file system, run the following commands:

    # for i in proc sys dev run; do mount --bind /$i $rescuemnt/$i ; done
    # chroot $rescuemnt

  9. Download and install the EC2Rescue tool for Linux on an offline Linux root volume.

  10. Run EC2Rescue for Linux with no options (./ec2rl run) to run all modules.

  11. Based on the results, run the following command to activate remediation for the supported modules:

    # ./ec2rl run --remediate

  12. To exit chroot and unmount the secondary device, run the following commands:

    # exit
    # umount $rescuemnt/{proc,sys,dev,run,}

    Note: If the unmount operation fails, then stop or reboot the rescue instance before you unmount the secondary device.

  13. Detach the secondary volume from the rescue EC2 instance.

  14. Attach the secondary volume (/dev/sdf) to the original instance as the root volume. Use the device name that you noted in step 2, such as /dev/xvda or /dev/sda1.

  15. Start the instance, and then verify that the instance works as expected.
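If chroot in step 8 fails with an error such as "failed to run command '/bin/bash': No such file or directory", one possible cause is that the mounted partition isn't the root file system, for example when the wrong partition from the lsblk output was mounted. The following sketch is a pre-check that you could run on the rescue instance before chroot. The function name check_chroot_target is hypothetical, and the default shell path /bin/bash is an assumption. The check is only a heuristic and doesn't cover every cause of that error.

```shell
# Hypothetical helper: checks that a mounted directory looks like a Linux
# root file system that chroot can enter. Prints what is missing.
# Usage: check_chroot_target <mount point> [shell path inside the volume]
# The body runs in a subshell so that its variables stay local.
check_chroot_target() (
    target=$1
    shell_path=${2:-/bin/bash}   # assumption: chroot starts this shell
    missing=0
    # chroot needs an executable shell inside the new root.
    if [ ! -x "$target$shell_path" ]; then
        echo "missing shell: $target$shell_path"
        missing=1
    fi
    # The bind mounts in step 8 need these directories to exist.
    for dir in proc sys dev run; do
        if [ ! -d "$target/$dir" ]; then
            echo "missing mount point: $target/$dir"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
)
```

For example, run check_chroot_target "$rescuemnt" after step 7. If it reports a missing shell, unmount the volume and compare the partition that you mounted with the lsblk output from step 5.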

Use another troubleshooting method

To further troubleshoot OS-level issues, take the following actions:

Related information

Recover your impaired instances using EC2Rescue and Amazon EC2 Systems Manager Automation

Troubleshoot issues with Amazon EC2 Windows instances

3 Comments

Good article.

replied 3 years ago

I get stuck at step 8:

"chroot $rescuemnt chroot: failed to run command ‘/bin/bash’: No such file or directory"

Little help?

replied 2 years ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 2 years ago