DRS - Installing aws-replication-agent on Amazon Linux 2: botocore.exceptions.CredentialRetrievalError: Error when retrieving credentials from cert


Hello there. I have had a disaster of an experience converting our infrastructure from CloudEndure to DRS, but I'm approaching the finish line. Of course it is my most important instances that are still having trouble.

This particular instance would not convert, so I ended up redeploying it. I'm now trying to install the AWS Replication Agent directly, so it will be a full sync rather than a CloudEndure conversion. Even with a fresh deployment, I still encounter the same error that I did with the conversion tool.

Command run = ./aws-replication-installer-init --region us-east-2 --aws-access-key-id abc --aws-secret-access-key abc --aws-session-token abc

Command output =

The installation of the AWS Replication Agent has started.
Identifying volumes for replication.
Choose the disks you want to replicate. Your disks are: (lists disks)
To replicate some of the disks, type the path of the disks, separated with a comma (for example, /dev/sda, /dev/sdb). To replicate all disks, press Enter:
Identified volume for replication: (lists all volumes and sizes)
All volumes for replication were successfully identified.
Downloading the AWS Replication Agent onto the source server... Finished.
Installing the AWS Replication Agent onto the source server... Finished.
Syncing the source server with the Elastic Disaster Recovery Console... Finished.

Unexpected Error Installation failed. Learn more about installation issues in our documentation at https://docs.aws.amazon.com/drs/latest/userguide/Troubleshooting-Agent-Issues.html

The relevant ending of aws_replication_agent_installer.log =

2023-07-21 12:14:16,816 INFO Finished Syncing the source server with the Elastic Disaster Recovery Console [installation_id: abc, agent_version: 5.0.0.2023.187.1134, aws_instance_id: i-abc, mac_addresses: abc, _origin_client_type: installer, account_id: abc, _origin_instance_id: i-abc, _origin_agent_id: s-abc]
2023-07-21 12:14:17,376 INFO Unexpected Error Installation failed. Learn more about installation issues in our documentation at https://docs.aws.amazon.com/drs/latest/userguide/Troubleshooting-Agent-Issues.html [installation_id: abc, agent_version: 5.0.0.2023.187.1134, aws_instance_id: i-abc, mac_addresses: abc, _origin_client_type: installer, account_id: abc, _origin_instance_id: i-abc, _origin_agent_id: s-abc]
2023-07-21 12:14:17,376 ERROR Installation was not finished successfully
Traceback (most recent call last):
  File "dirrus/installer_shared/installer_main.py", line 1384, in _main
  File "dirrus/installer_shared/installer_main.py", line 1159, in install_agent
  File "dirrus/installer_shared/installer_main.py", line 950, in install
  File "dirrus/installer_shared/installer_main.py", line 139, in issue_agent_certificate
  File "botocore/credentials.py", line 1033, in load
  File "botocore/credentials.py", line 1057, in _retrieve_credentials_using
botocore.exceptions.CredentialRetrievalError: Error when retrieving credentials from cert: [installation_id: abc, agent_version: 5.0.0.2023.187.1134, aws_instance_id: i-abc, mac_addresses: abc, _origin_client_type: installer, account_id: abc, _origin_instance_id: i-abc, _origin_agent_id: s-abc]
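For anyone comparing logs between a working and a failing server, pulling out only the final exception and its call chain can make the difference easier to see. The following is a rough sketch, not part of the installer: the helper names (split_entries, summarize_failure) are made up for illustration, and the entry format (timestamp, level, message) is assumed from the excerpt above rather than from any documented log specification.

```python
import re

# Assumed entry prefix, inferred from the excerpt: "YYYY-MM-DD HH:MM:SS,mmm LEVEL "
TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} "
    r"(?:DEBUG|INFO|WARNING|ERROR|CRITICAL) "
)
# Traceback frames as they appear in the excerpt: File "path", line N, in func
FRAME = re.compile(r'File "([^"]+)", line (\d+), in (\w+)')
# A dotted exception name ending in Error or Exception, followed by ": "
EXCEPTION = re.compile(r"\b((?:\w+\.)+\w*(?:Error|Exception)): ")


def split_entries(log_text):
    """Split log text into one string per timestamped entry."""
    starts = [m.start() for m in TIMESTAMP.finditer(log_text)]
    if not starts:
        return []
    bounds = starts + [len(log_text)]
    return [log_text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def summarize_failure(log_text):
    """Return (exception_name, frames) for the last ERROR entry that
    contains a traceback, or None if no such entry exists.

    frames is a list of (file, line_number, function) tuples.
    """
    for entry in reversed(split_entries(log_text)):
        if " ERROR " in entry and "Traceback" in entry:
            frames = [(f, int(n), fn) for f, n, fn in FRAME.findall(entry)]
            match = EXCEPTION.search(entry)
            return (match.group(1) if match else None, frames)
    return None
```

Reading the log file into a string and passing it to summarize_failure should give the exception name plus the list of frames, which here would end in issue_agent_certificate and the botocore credential loader. That only narrows down where the failure surfaces; it does not explain why the credential retrieval fails.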

I have been working on this issue with AWS Support for over a month. It has been going back and forth with "internal support" for weeks, but so far this is as far as I've been able to get with this instance.

Several other Amazon Linux 2 instances in the same VPC, subnet, and security groups were able to install the agent and replicate without encountering this error. The really bizarre part is that the error still occurred even after we retired the old instance and built a new one from script, starting from the same base script that the rest of our servers are built with.

I urgently need to get this machine working with DRS. Since it was rebuilt, it is no longer protected by CloudEndure either.

Does anyone have any ideas, or has anyone encountered the same problem?

Jesper
asked 9 months ago
1 Answer

Hello,

Firstly, I thank you for your patience while the internal team is still investigating the issue and I do understand the urgency to get this working.

Here are a few things you might want to check. I do understand you might have already checked some/all of these suggestions but I am including these here for the sake of completeness:

  • Ensure the instance meets requirements outlined in the article[1].
  • Kindly ensure that there is no conflicting operating-system-level configuration that might interfere with agent installation.
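As one concrete way to rule out a basic network problem while going through those checks, a small reachability test can be run from the affected instance and from a working one. This is only a sketch under assumptions: the helper names (drs_endpoint, can_connect) are hypothetical, and the hostname pattern drs.<region>.amazonaws.com with TCP port 443 is assumed to be the regional service endpoint the agent talks to. It does not exercise the certificate-based credential step that fails in the traceback.

```python
import socket


def drs_endpoint(region):
    # Assumption: the regional endpoint follows the drs.<region>.amazonaws.com pattern.
    return f"drs.{region}.amazonaws.com"


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling can_connect(drs_endpoint("us-east-2")) on both servers and comparing the results would show whether the failing instance differs at the network level. A True result only means the TCP handshake succeeded, so proxies or TLS inspection could still interfere further along.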

If you are able to reproduce the issue, please also share the exact steps on the support case so that the team can test the same on their end.

Lastly, we need non-public details to investigate further. Please continue to update the existing AWS support case so that our teams can continue the investigation and provide suggestions.

References: [1] https://docs.aws.amazon.com/drs/latest/userguide/Troubleshooting-Agent-Issues.html

AWS
SUPPORT ENGINEER
answered 9 months ago
  • Was this issue ever resolved? I'm getting the same error trying to use the AWS Replication Agent and don't want to unnecessarily create a support case.
