How do I troubleshoot an AWS Replication Agent installation failure on my EC2 Linux instance?

I want to install the AWS Replication Agent for AWS Application Migration Service or AWS Elastic Disaster Recovery. The installation failed on my Amazon Elastic Compute Cloud (Amazon EC2) Linux instance.

Resolution

The following resolution covers the most common AWS Replication Agent installation errors on Linux operating systems (OSs).

Identify the error

The AWS Replication Agent installer log shows errors at the end of the log.

To determine the error, run the following command to view the last page of the installer log:

less +G aws_replication_installer.log

To resolve the error that you find, follow the procedure in the section that relates to the error.
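The lookup described above can be scripted. The following is an added example, not AWS code: it matches the end of the installer log against the error strings quoted in this article. The function name, the printed labels and the 200-line window are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Added example (not AWS code): map the end of the installer log to the matching
# section of this article. Signatures are the error strings quoted on this page;
# the labels printed are names chosen for this example.

classify_installer_error() {
  local log tail_text
  log="${1:-aws_replication_installer.log}"
  [ -r "$log" ] || { echo "log-not-readable"; return 2; }
  # Errors are at the end of the log; the 200-line window is an assumption.
  tail_text="$(tail -n 200 "$log")"
  # Order matters: the rmmod "is not currently loaded" line accompanies both the
  # secure boot and the low-memory failures, so match on the insmod reason first.
  case "$tail_text" in
    *"Required key not available"*)                   echo "secure-boot" ;;
    *"Cannot allocate memory"*)                       echo "insufficient-memory" ;;
    *"failed to map segment from shared object"*)     echo "tmp-noexec" ;;
    *"ExpiredTokenException"*)                        echo "expired-token" ;;
    *"SSLCertVerificationError"*)                     echo "ssl-cert-verification" ;;
    *"CredentialRetrievalError"*)                     echo "agent-role-modified" ;;
    *"A dependency job for aws-replication.target failed"*) echo "var-perms-or-group" ;;
    *"InternalServerException"*)                      echo "sts-endpoint-off" ;;
    *"Are kernel linux headers installed correctly"*) echo "kernel-devel-missing" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# Constructed sample log: the low-memory failure, which also carries the rmmod line.
sample="$(mktemp)"
printf '%s\n' \
  'insmod: ERROR: could not insert module ./aws-replication-driver.ko: Cannot allocate memory' \
  'rmmod: ERROR: Module aws_replication_driver is not currently loaded' > "$sample"
classify_installer_error "$sample"
rm -f "$sample"
```
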

Error: failed to map segment from shared object: Operation not permitted

Error example:

"./aws-replication-installer-64bit: error while loading shared libraries: libz.so.1: failed to map segment from shared object: Operation not permitted"

The installation script uses the /tmp directory. If noexec is set on /tmp, then libz.so can't map segments. When this occurs, you receive this operation not permitted error.

To resolve this error, run the following command to mount the volume with execute permission:

sudo mount -o remount,exec /tmp

Error: security token included in the request is expired

Error example:

"botocore.exceptions.ClientError: An error occurred (ExpiredTokenException) when calling the GetAgentInstallationAssetsForDrs operation: The security token included in the request is expired [installation_id: 1a9af9d3-9485-4e02-965e-611929428c61, agent_version: 3.7.0, mac_addresses: 206915885515739,206915885515740, _origin_client_type: installer]"

This error is often caused by an expired AWS Identity and Access Management (IAM) role. When the IAM role expires, API calls to the Application Migration Service or Elastic Disaster Recovery endpoint fail.

To resolve this issue, refresh the IAM role, or install the role with an access key or secret access key. For more information, see the following AWS Documentation:

Error: Module aws_replication_driver is not currently loaded

Error example:

"rmmod: ERROR: Module aws_replication_driver is not currently loaded insmod: ERROR: could not insert module ./aws-replication-driver.ko: Required key not available"

This error occurs when secure boot is turned on in the source instance. Application Migration Service and Elastic Disaster Recovery don't support secure boot.

To resolve this error, turn off secure boot in the source instance.

Error: ssl.SSLCertVerificationError

Error example:

"ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997) - urllib.error.URLError: <urlopen error unknown url type: https>"

This error can occur when the client uses an earlier OS version with Python 3.10 or later. Python 3.10 adopted PEP 644 – Require OpenSSL 1.1.1 or newer, which is published on the Python Enhancement Proposals website.

Earlier OS versions don't have an OpenSSL library that's new enough for Python 3.10. So, the AWS Replication Agent installation fails to verify the SSL certificate of the Application Migration Service or Elastic Disaster Recovery endpoint.

To avoid this error, use an earlier version of Python, such as version 2.7 or 3.8.

Note: To resolve most urllib/SSL errors, use an earlier version of Python.

Error: botocore.exceptions.CredentialRetrievalError

Error example:

"botocore.exceptions.CredentialRetrievalError: Error when retrieving credentials from cert: Oct 17, 2022 9:38:54 AM com.amazonaws.cloudendure.credentials_provider.SharedMain createAndSaveJks"

This error might occur when you modify the AWS Replication Agent role: AWSElasticDisasterRecoveryAgentRole for Elastic Disaster Recovery, or AWSApplicationMigrationAgentRole for Application Migration Service.

To resolve this error, make sure that the AWS Replication Agent role is as follows:

Application Migration Service

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "PrincipalGroup": {
        "AWS": "svc:mgn.amazonaws.com"
      },
      "Action": [
        "sts:AssumeRole",
        "sts:SetSourceIdentity"
      ],
      "Condition": {
        "StringLike": {
          "sts:SourceIdentity": "s-*",
          "aws:SourceAccount": "AWSACCOUNTIDHERE"
        }
      }
    }
  ]
}

Elastic Disaster Recovery

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "PrincipalGroup": {
        "AWS": "svc:drs.amazonaws.com"
      },
      "Action": [
        "sts:AssumeRole",
        "sts:SetSourceIdentity"
      ],
      "Condition": {
        "StringLike": {
          "aws:SourceAccount": "AWSACCOUNTIDHERE",
          "sts:SourceIdentity": "s-*"
        }
      }
    }
  ]
}

Error: A dependency job for aws-replication.target failed.

Error example:

"stderr: A dependency job for aws-replication.target failed. See 'journalctl -xe' for details"

There are two possible causes for this error:

  • The /var directory has permissions of 754.
  • There was an issue during the creation of a Linux group for the aws-replication user.

To resolve the /var issue, run chmod 755 for the /var directory.

To resolve the Linux group issue, complete the following steps:

  1. Uninstall AWS Replication Agent.

  2. Run the following commands to delete the aws-replication user and aws-replication group:

    sudo userdel aws-replication
    sudo groupdel aws-replication

  3. Reinstall AWS Replication Agent.

For more information and installation prerequisites, see the following AWS Documentation:

Error: Exception in thread "main" com.amazonaws.services.drs.model.InternalServerException

Error example:

"Exception in thread "main" com.amazonaws.services.drs.model.InternalServerException: An unexpected error has occurred (Service: Drs; Status Code: 500; Error Code: InternalServerException; Request ID: 4f4a76cb-aaec-44cc-a07a-c3579454ca55; Proxy: null"

This error occurs when the client turns off the AWS Security Token Service (AWS STS) endpoint. When the STS endpoint is turned off, Application Migration Service or Elastic Disaster Recovery can't call AWS STS to assume the role in the client account.

To resolve this error, turn on the STS endpoint in the client.

Error: could not insert module ./aws-replication-driver.ko: Required key not available

This error occurs when the OS has secure boot turned on. Application Migration Service and Elastic Disaster Recovery don't support Linux OSs with secure boot turned on.

To resolve this error, turn off secure boot for the Linux OS. On most OSs, you turn off secure boot in the hypervisor.

Error: could not insert module ./aws-replication-driver.ko: Cannot allocate memory

Error example:

"insmod: ERROR: could not insert module ./aws-replication-driver.ko: Cannot allocate memory rmmod: ERROR: Module aws_replication_driver is not currently loaded ] 2023-03-16 10:27:08,416 ERROR Exception during agent installation Traceback (most recent call last): File "cirrus/installer_shared/installer_main.py", line 308, in run_agent_installer_command_linux File "shared/installer_utils/command_utils.py", line 161, in run shared.installer_utils.command_utils.RunException: command: /tmp/tmp_t"

This error occurs when the Linux OS doesn't have sufficient memory for the agent installation.

To resolve this error, make sure that your OS has at least 300 MB of free memory.

Error: Unexpected error while making agent driver! Are kernel linux headers installed correctly?

Error example:

"Unexpected error while making agent driver! Are kernel linux headers installed correctly? Installation returned with code 1Installation failed due to unspecified error:"

When you install the agent, the installation downloads a kernel-devel package that matches your current package. You can find the current package in the package repository that's configured in your Linux OS.

This error occurs when the agent installation workflow can't install the kernel-devel package for the Linux OS's running kernel.

To resolve this error, review the installation log to verify that the issue is because of repository access. Then, manually download the kernel-devel package from the internet. After you download the package, run the installation again.

You can download the kernel-devel package from the following websites:

The AWS Replication Agent also installs dependencies that are required for the installation, such as make, gcc, perl, tar, gawk, and rpm. For more information, see Linux installation requirements.
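Several of the causes above are conditions of the host itself: noexec on /tmp, the mode of /var, available memory, and secure boot. The following preflight script is an added example, not AWS code, that checks them before the installer runs. The function names are hypothetical, and it assumes GNU stat, the mokutil utility for the secure boot state, and MemAvailable as the measure of free memory.

```bash
#!/usr/bin/env bash
# Added example (not AWS code): preflight checks for the host-side causes named
# in this article. File parameters default to the live /proc files and exist so
# the checks can also run against captured copies.

# Mount options of the filesystem holding path $1 (longest matching mount point).
mount_opts_for() {
  awk -v p="$1" '
    $2 == p || $2 == "/" || index(p, $2 "/") == 1 {
      if (length($2) >= best) { best = length($2); opts = $4 }
    }
    END { print opts }' "${2:-/proc/mounts}"
}

# Succeeds when /tmp, or the filesystem it lives on, is mounted noexec.
tmp_is_noexec() {
  case ",$(mount_opts_for /tmp "${1:-/proc/mounts}")," in
    *,noexec,*) return 0 ;;
  esac
  return 1
}

# MemAvailable in MB; using it as the measure of "free memory" is an assumption.
mem_available_mb() {
  awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Prints enabled, disabled or unknown (mokutil absent or no EFI variables).
secure_boot_state() {
  command -v mokutil >/dev/null 2>&1 || { echo unknown; return 0; }
  case "$(mokutil --sb-state 2>/dev/null)" in
    *"SecureBoot enabled"*) echo enabled ;;
    *"SecureBoot disabled"*) echo disabled ;;
    *) echo unknown ;;
  esac
}

preflight() {
  local mem mode
  mem="$(mem_available_mb)"; mode="$(stat -c '%a' /var)"   # stat -c is GNU stat
  if tmp_is_noexec; then echo "WARN: /tmp is mounted noexec"; fi
  [ "${mem:-0}" -ge 300 ] || echo "WARN: ${mem:-0} MB available, need 300 MB"
  [ "$mode" = 755 ] || echo "WARN: /var mode is $mode, expected 755"
  [ "$(secure_boot_state)" != enabled ] || echo "WARN: secure boot is on"
  return 0
}

preflight
```

A remount with exec changes only the running system: the configured mount options for /tmp, including noexec, apply again at the next boot.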

AWS OFFICIAL
Updated 4 months ago
4 Comments

Thanks for sharing all the tips. I am still facing the issue "Kernel version is not supported". I am running an Ubuntu 22.04 VM on Azure and trying to migrate it to AWS using MGN.

replied 6 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 6 months ago

Remounting /tmp can break my already running process in the Prod ENV. Can you please share another workaround or solution for this?

replied 5 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 5 months ago