EDRS | Windows Server 2012 R2 DR drill fails on r6a.2xlarge – Server stuck in boot loop

We have a Windows Server 2012 R2 in our on-premises environment with an EDRS agent installed. This server is replicated to AWS for DR purposes.

During a DR drill, when we launch the instance in AWS using r6a.2xlarge, only 2 out of 3 status checks pass, and the instance never reaches the ready state (according to the system log, it is stuck in a boot loop). We are also unable to connect to the server. The system log shows boot-related errors (screenshots attached).

Has anyone faced a similar issue? Could you please suggest a possible cause or recommended solution for this problem?

EDRS

1 Answer

The boot loop issue you're experiencing with your Windows Server 2012 R2 instance on r6a.2xlarge during DR drill is likely related to missing drivers or instance type compatibility. Here are the key areas to investigate and potential solutions:

Primary Causes:

  1. Missing NVMe Drivers: The r6a instance family is built on the Nitro System, which presents EBS volumes as NVMe devices. If the AWS NVMe driver is not present on the Windows Server 2012 R2 boot volume, Windows cannot find its boot disk on this instance type and will fail to start.

  2. Instance Type Compatibility: Nitro-based types such as r6a.2xlarge also require the ENA network driver, and the drivers available on an older operating system such as Windows Server 2012 R2 may not cover the hardware this instance type exposes.

  3. Boot Configuration Issues: The Boot Configuration Data (BCD) store may contain incorrect disk or partition references that prevent a successful boot on AWS infrastructure.
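
One way to check the first two causes before launching a drill is to ask EC2 what a given instance type expects from the guest operating system. The following is a minimal sketch, assuming boto3 is installed and credentials are configured; the helper names (`summarize_driver_requirements`, `needs_nitro_drivers`, `fetch_summaries`) are hypothetical and chosen for illustration only. It reads the `Hypervisor`, `EbsInfo.NvmeSupport` and `NetworkInfo.EnaSupport` fields returned by `describe_instance_types`.

```python
def summarize_driver_requirements(instance_type_info):
    """Reduce one entry of the InstanceTypes list to the driver-related fields."""
    ebs = instance_type_info.get("EbsInfo", {})
    net = instance_type_info.get("NetworkInfo", {})
    return {
        "instance_type": instance_type_info.get("InstanceType"),
        # Some types (for example bare metal) report no hypervisor at all.
        "hypervisor": instance_type_info.get("Hypervisor", "not reported"),
        # Both fields take one of: "unsupported", "supported", "required".
        "nvme": ebs.get("NvmeSupport", "unknown"),
        "ena": net.get("EnaSupport", "unknown"),
    }


def needs_nitro_drivers(summary):
    """True if the guest must ship an NVMe or ENA driver to run on this type."""
    return summary["nvme"] == "required" or summary["ena"] == "required"


def fetch_summaries(instance_types, region_name):
    """Query EC2 for the given instance types (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    response = ec2.describe_instance_types(InstanceTypes=instance_types)
    return [summarize_driver_requirements(i) for i in response["InstanceTypes"]]
```

Calling `fetch_summaries(["r6a.2xlarge", "m4.2xlarge"], "us-east-1")` (the region is an assumption; use your recovery region) and passing each result to `needs_nitro_drivers` shows which candidate types can boot only if the NVMe and ENA drivers are already on the replicated volume.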

Recommended Solutions:

  1. Try a Different Instance Family: Launch your DR drill using an instance type that doesn't require NVMe or ENA drivers. Xen-based previous-generation types (for example m4 or t2) fall into this category and can confirm whether the failure is driver-related.

  2. Verify Driver Requirements: Ensure your source server has all necessary drivers (NVMe, ENA) installed before replication. The conversion process should inject these, but verification is important.

  3. Check Boot Volume Configuration: Windows sometimes reconfigures drive letters when launching on new infrastructure. After the instance launches (if it does), you may need to remap drive letters correctly.

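  4. Use EC2Rescue for Troubleshooting: EC2Rescue for Windows Server can diagnose boot issues, including stop errors, boot loops, and corrupted registry problems. If you can access the instance, run it there; if you cannot, it can also be run in offline mode against the root volume attached to a helper instance.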

  5. Offline Recovery: If the instance remains inaccessible, detach the root volume, attach it to a rescue instance, and examine/repair the boot configuration and driver issues directly.
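
The offline recovery step can be scripted. This is a minimal sketch, not a definitive procedure: it assumes a boto3 EC2 client is passed in as `ec2`, that the rescue instance already exists in the same Availability Zone as the impaired drill instance, and that `xvdf` is a free device name on the rescue instance. The function name `move_root_volume_to_rescue` is hypothetical.

```python
def move_root_volume_to_rescue(ec2, impaired_id, rescue_id, device="xvdf"):
    """Stop the impaired instance and attach its root volume to a rescue instance.

    `ec2` is a boto3 EC2 client (or any object exposing the same methods).
    Returns the ID of the volume that was moved.
    """
    # The root volume can only be detached while the instance is stopped.
    ec2.stop_instances(InstanceIds=[impaired_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[impaired_id])

    # Find the EBS volume mapped to the instance's root device name.
    reservations = ec2.describe_instances(InstanceIds=[impaired_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    root_name = instance["RootDeviceName"]
    volume_id = next(
        mapping["Ebs"]["VolumeId"]
        for mapping in instance["BlockDeviceMappings"]
        if mapping["DeviceName"] == root_name
    )

    # Detach from the impaired instance and wait until the volume is free.
    ec2.detach_volume(VolumeId=volume_id, InstanceId=impaired_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach as a secondary disk on the rescue instance.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=rescue_id, Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
    return volume_id
```

With a real client this would be called as `move_root_volume_to_rescue(boto3.client("ec2"), "<impaired-instance-id>", "<rescue-instance-id>")`. Once the volume is online in the rescue instance, the boot configuration and drivers can be inspected from there.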

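The most immediate action would be to launch the drill with a different instance type to determine whether the problem is specific to the instance family. Note that m5 and t3 are also Nitro-based and need the same NVMe and ENA drivers as r6a, so a Xen-based type (such as m4 or t2) is the test that isolates a driver problem. Then work with AWS Support to resolve the underlying driver or compatibility issue for r6a instances.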
Sources
Troubleshoot impaired Amazon EC2 Windows instance using EC2Rescue - Amazon Elastic Compute Cloud
Troubleshooting Elastic Disaster Recovery - AWS Elastic Disaster Recovery
Why is my target instance launched by MGN or DRS failing with a 'dracut-initqueue timeout' error? | AWS re:Post

answered 3 months ago
EXPERT
reviewed 3 months ago