How do I use EC2Rescue for Linux to troubleshoot operating system-level issues?


I can't connect to my Amazon Elastic Compute Cloud (Amazon EC2) Linux instance, or I'm experiencing boot issues. I want to use EC2Rescue to diagnose and troubleshoot common issues.

Short description

You can use EC2Rescue for Linux in the following ways:

  • Collect system utilization reports, such as vmstat, iostat, and mpstat.
  • Collect logs and details, such as syslog, dmesg, application error logs, and SSM logs.
  • Detect system problems, such as asymmetric routing or duplicate root device labels.
  • Automatically remediate system issues, such as problems with OpenSSH file permissions or kernel parameter activation.

System requirements

To use EC2Rescue for Linux, your Amazon EC2 Linux instance must meet the prerequisites that are listed on Install EC2Rescue on an Amazon EC2 Linux instance.

Note: If you have access to the EC2 serial console, then you can use the console to troubleshoot supported Nitro-based instance types. To access the serial console, use the Amazon EC2 console or SSH.

Resolution

Use EC2Rescue to troubleshoot connection issues

Complete the following steps:

  1. Use the Amazon Machine Image (AMI) of the nonfunctioning instance to launch a new EC2 instance in your virtual private cloud (VPC). Make sure that your new instance is in the same Availability Zone as the nonfunctioning instance. The new instance becomes your rescue instance. You can also use an existing instance that's in the same Availability Zone as your nonfunctioning instance.

  2. Detach the Amazon Elastic Block Store (Amazon EBS) root volume from your nonfunctioning instance. Examples of root volume device names include /dev/xvda and /dev/sda1. Note the device name to use later.

  3. Attach the Amazon EBS volume to the rescue instance as a secondary device, for example /dev/sdf.
    Note: If your instance's root device is an Amazon EBS backed volume, then stop and restart the instance.

  4. Use SSH to connect to your rescue instance.

  5. As the root user, use lsblk to identify the correct device name:

    $ sudo -i
    # lsblk
    # rescuedev=/dev/xvdf1

    Note: The device that's attached to a rescue instance might have a different device name. To determine the correct device name, run the lsblk command to view your available disk devices, including their mount points. Note the device name to use in the rest of the procedure.

  6. Select an appropriate temporary mount point to use, and make sure that it exists. If /mnt is available, then use this mount point:

    # rescuemnt=/mnt
    # mkdir -p $rescuemnt

  7. Mount the root file system from the attached volume:

    # mount $rescuedev $rescuemnt

    Note: If the mount operation fails, then run dmesg | tail to check the kernel log. If the log shows a conflicting universally unique identifier (UUID), then rerun the mount command with the -o nouuid option.

  8. Mount special file systems, and change the root directory to the newly mounted file system:

    # for i in proc sys dev run; do mount --bind /$i $rescuemnt/$i ; done
    # chroot $rescuemnt

    Note: After you run the chroot command, the root directory (/) of your shell is the mounted file system of the nonfunctioning instance.

  9. Download and install the EC2Rescue tool for Linux on an offline Linux root volume:

    # curl -O https://s3.amazonaws.com/ec2rescuelinux/ec2rl.tgz
    # tar -xf ec2rl.tgz

  10. List the help file to verify the installation:

    # cd ec2rl-<version_number>
    # ./ec2rl help

  11. Run EC2Rescue for Linux with no options to run all modules:

    # ./ec2rl run

  12. To view the results in /var/tmp/ec2rl, run the following command:

    # cat /var/tmp/ec2rl/*/Main.log | more

  13. Based on the results, activate remediation for the supported modules:

    # ./ec2rl run --remediate

  14. Exit from chroot, and unmount the secondary device:

    # exit
    # umount $rescuemnt/{proc,sys,dev,run,}

    Note: If the unmount operation fails, then you might need to stop or reboot the rescue instance to successfully unmount the secondary device.

  15. Detach the secondary volume from the rescue EC2 instance. Then, attach that volume (the one that you attached to the rescue instance as /dev/sdf) to the original instance as the root volume. Use the device name that you noted in step 2, for example /dev/xvda or /dev/sda1.

  16. Start the instance, and then verify that the instance is working.
    Note: You can also use an AWS Systems Manager Automation runbook to troubleshoot connection issues. For more information, see Run the EC2Rescue tool on unreachable instances. The AWSSupport-ExecuteEC2Rescue runbook is designed to automate the steps that are normally required to use EC2Rescue for Linux. These steps are a combination of Systems Manager actions, AWS CloudFormation actions, and AWS Lambda functions.
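
Two points in the preceding steps are easy to get wrong: mounting the wrong partition before you run chroot (step 8), and finding the log of the most recent run (step 12). The following is a minimal sketch of two helper functions for those checks. The function names check_rescue_root and latest_ec2rl_log are hypothetical and aren't part of EC2Rescue. The sketch also assumes that each run writes its Main.log file to its own subdirectory of /var/tmp/ec2rl and that the subdirectory names sort in chronological order.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the rescue procedure. They aren't part of EC2Rescue.

# Return 0 if DIR looks like a root file system that chroot can enter.
# With no command given, chroot runs "$SHELL -i" (default /bin/sh) inside DIR,
# so that shell must exist under DIR. Pass the shell path as the second argument
# to check a specific one.
# Limitation: the check runs from outside the chroot, so an absolute symlink
# inside DIR resolves against the rescue instance's own file system.
check_rescue_root() {
    local dir=$1
    local shell=${2:-${SHELL:-/bin/sh}}
    if [ ! -d "$dir/etc" ]; then
        echo "missing $dir/etc: is the correct partition mounted?" >&2
        return 1
    fi
    if [ ! -x "$dir$shell" ]; then
        echo "missing $dir$shell: chroot can't start a shell" >&2
        return 1
    fi
    return 0
}

# Print the path of the newest Main.log under BASE (default /var/tmp/ec2rl).
# Assumption: run directory names sort chronologically as plain text.
latest_ec2rl_log() {
    local base=${1:-/var/tmp/ec2rl}
    local newest
    newest=$(find "$base" -mindepth 2 -maxdepth 2 -name Main.log 2>/dev/null | sort | tail -n 1)
    [ -n "$newest" ] || return 1
    printf '%s\n' "$newest"
}
```

For example, after step 7 you might run check_rescue_root "$rescuemnt" /bin/bash and continue to step 8 only if the check succeeds. If the check fails, then the mounted device is probably not the root partition, and you can run lsblk again to choose a different one. In step 12, more "$(latest_ec2rl_log)" shows only the most recent run instead of every log.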

Use other troubleshooting methods

Related information

Recover your impaired instances using EC2Rescue and Systems Manager Automation

AWS OFFICIAL
Updated 7 months ago
3 Comments

Good article...

replied 2 years ago

I get stuck at step 8:

"chroot $rescuemnt chroot: failed to run command ‘/bin/bash’: No such file or directory"

Little help?

replied 9 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 9 months ago