In AWS Elastic Disaster Recovery - how to fix "Start reversed replication" failure "Verify the agent is installed and running"?

We have performed a failover from DC1 to DC2 - everything ok.

While establishing replication back from DC2 to DC1 using "Start reversed replication", one server gives this error: "i-03a15f3c315f0f18c: AWS Replication agent is not connected to DRS. Verify the agent is installed and running, and that it has connectivity to the service".

We have followed the same procedure on other instances - reinstall the agent using download from the target region - and running "Start reversed replication" has worked.

For this one instance, the AWS EDR console in the target environment showed syncing, then snapshotting, and now says "Target Environment: Source Servers",

while the source environment says "Source Environment: Recovery Instance".

I have tried reinstalling the agent, which results in the same set of steps (sync, snapshot).

I am confused because the error indicates "not connected", yet a connection has clearly been made for the instance to be registered on the target side and synced.

DaveH
asked 2 months ago, 126 views
3 Answers
Accepted Answer

I made a mistake in this step from my question:

We have followed the same procedure on other instances - reinstall the agent using download from the target region - and running "Start reversed replication" has worked.

We performed a cleanup in AWS EDR (disconnected and removed all recovery and source instances) and started from scratch, followed by a full sync of the source servers and a "recovery".

This time I did not reinstall the agent using the download from the target region (that was me not understanding the technology). Simply running "Start reversed replication" successfully established "Source servers" back in the primary region.

DaveH
answered 2 months ago

In certain cases, following an attempt to perform a reverse replication action, you will receive an error message indicating that the AWS Replication agent is not connected to AWS Elastic Disaster Recovery. In this case, verify that:

  1. The agent is installed and running
  2. The server is connected to the internet or the NAT gateway

Further, the recovery instance must allow an inbound connection on port 1500 from the source environment, either over VPN/DX or via a public IP.
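The port 1500 requirement can be spot-checked from a machine in the source environment. The following is a minimal sketch, not an official AWS tool: the function name `check_port` and the default 5-second timeout are illustrative assumptions, and it relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command being available.

```shell
#!/usr/bin/env bash
# check_port HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper (not part of any AWS tooling).
# Exit status: 0 = TCP connect succeeded,
#              1 = connect failed or timed out,
#              2 = missing arguments.
check_port() {
  if [ "$#" -lt 2 ]; then
    echo "usage: check_port HOST PORT [TIMEOUT_SECONDS]" >&2
    return 2
  fi
  local host="$1" port="$2" timeout_s="${3:-5}"
  # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
  # host and port are passed as arguments to avoid quoting problems.
  if timeout "$timeout_s" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    return 0
  fi
  return 1
}
```

For example, `check_port X.X.X.X 1500 && echo "port 1500 reachable"`, where X.X.X.X stands for the recovery instance address. A successful connect only shows that the network path and security group allow the traffic; it does not prove the agent itself is healthy.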

Please also check the following:

  • Check the routes to metadata connectivity on the Recovery Instance.
  • Restart the AWS Replication service.
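One plausible reading of "routes to metadata connectivity" is reachability of the EC2 instance metadata service (IMDS) at 169.254.169.254 from the recovery instance. That interpretation is an assumption, not something stated in this answer. Under that assumption, a minimal sketch of the check using IMDSv2 and `curl` might look like this; the function name `imds_reachable` and the 3-second timeouts are illustrative.

```shell
#!/usr/bin/env bash
# imds_reachable [BASE_URL]
# Hypothetical helper; BASE_URL defaults to the standard IMDS address.
# Returns 0 if an IMDSv2 token can be obtained and used, non-zero otherwise.
imds_reachable() {
  local base="${1:-http://169.254.169.254}"
  local token
  # Step 1: request a short-lived IMDSv2 session token.
  token=$(curl -s -f -m 3 -X PUT "$base/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
  # Step 2: use the token to read a metadata path, discarding the body.
  curl -s -f -m 3 -o /dev/null "$base/latest/meta-data/instance-id" \
    -H "X-aws-ec2-metadata-token: $token"
}
```

On the recovery instance, `imds_reachable && echo "metadata service reachable"` reports the result, and `ip route get 169.254.169.254` shows which route the kernel would use for that address.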

If the issue still persists, please open a support case and include the latest agent log from the recovery instance, along with the following:

  1. DRS Source ID (starting with s-xx) associated with the recovery instance.

  2. DRS agent logs, located at /var/lib/aws-replication-agent/agent.log.0

  3. DR AWS Region

  4. The type of DRS setup in use: AWS to AWS (cross-Availability Zone or cross-Region), or on-premises to AWS

  5. Confirmation that all the agent processes are running on the recovery instance: $ ps aux | grep aws-replication
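A small caveat with the command in item 5: `ps aux | grep aws-replication` can list the `grep` process itself, which may make a stopped agent look like it is running. A minimal sketch of a filter that drops such lines is below. The function name `agent_procs` is hypothetical, and it matches only on the `aws-replication` string already used in that command.

```shell
#!/usr/bin/env bash
# agent_procs: reads `ps` output on stdin and prints only the lines that
# mention aws-replication, dropping any line that contains the word grep.
# Returns non-zero when no matching lines remain.
agent_procs() {
  grep 'aws-replication' | grep -v 'grep'
}
```

For example, `ps aux | agent_procs || echo "no agent processes found"`. If nothing is listed, the tail of /var/lib/aws-replication-agent/agent.log.0 is the next place to look.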

I can confirm that the recovery instance has "AWSElasticDisasterRecoveryRecoveryInstancePolicy" attached to the role "AWSElasticDisasterRecoveryRecoveryInstanceRole".

The additional details above are required to troubleshoot the current issue further.

Please refer to:

[1] https://docs.aws.amazon.com/drs/latest/userguide/Troubleshooting-Failback-Errors.html
[2] https://docs.aws.amazon.com/drs/latest/userguide/Network-Requirements.html

AWS
Support Engineer
Jeff_B
answered 2 months ago
Expert
reviewed 2 months ago

Hi, thanks for the info/suggestions

When I checked this morning, the agent was not running (according to "ps aux | grep aws-replication"), but I believe it must have run originally, as the instance appears in the "Source Servers" of the original primary region.

I have restarted the agent ("sudo /var/lib/aws-replication-agent/runAgent.sh") and now the instance in "Source Servers" shows "Rescanning".

Could you explain how to check these?

  • Check the routes to metadata connectivity on the Recovery Instance
    - I have successfully run nc -zv X.X.X.X 1500, but do you mean something else?

  • Restart the AWS Replication service
    - Does this mean the agent on the instance, or some other region-wide service?

Edit:

I am wondering if I made a misstep by reinstalling the agents once they had "recovered" over to the second region. Should I have just done a "Start reversed replication"?

DaveH
answered 2 months ago
