How do I transfer data between Amazon MSK clusters in different accounts with MirrorMaker 2 that's running on MSK Connect?


I want to use MirrorMaker 2.0 (MM2) that runs on MSK Connect to transfer data between Amazon Managed Streaming for Apache Kafka (Amazon MSK) clusters in different accounts.

Resolution

Set up VPC peering

Because the Amazon MSK clusters are in different virtual private clouds (VPCs), you must create a VPC peering connection between them. The peer VPC can be in another AWS account, in either the same AWS Region or a different Region. For more information, see Create a VPC peering connection.

The security groups that are associated with the source Amazon MSK cluster must allow all traffic from the security groups of the target cluster. The target cluster's security groups must also allow all traffic from the source cluster's security groups. For more information, see Update your security groups to reference peer security groups.

Note: To reference a security group in another account, include the account number in the Source or Destination field. Example: 123456789012/sg-1a2b3c4d
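The security group rule described above can also be added from the AWS CLI. The following is a minimal sketch, not a definitive procedure: the target security group ID is a hypothetical placeholder, the source security group ID and account number are the example values from the note, and the APPLY switch is an illustrative convention of this script. It assumes that both VPCs are in the same Region, because security group references aren't expected to work across Regions. By default the script only prints the command so that you can review it first.

```shell
#!/bin/sh
# Sketch: allow all traffic into the target cluster's security group from the
# source cluster's security group in the peer account.
# All IDs are placeholders (assumptions); replace them with your own values.
TARGET_SG="${TARGET_SG:-sg-0aaaabbbbccccdddd}"   # hypothetical target cluster security group
SOURCE_SG="${SOURCE_SG:-sg-1a2b3c4d}"            # example peer security group from the note
SOURCE_ACCOUNT="${SOURCE_ACCOUNT:-123456789012}" # example peer account number from the note

# Build the --ip-permissions shorthand. IpProtocol=-1 means all traffic.
build_ip_permissions() {
  printf 'IpProtocol=-1,UserIdGroupPairs=[{GroupId=%s,UserId=%s}]' "$1" "$2"
}

PERMS="$(build_ip_permissions "$SOURCE_SG" "$SOURCE_ACCOUNT")"

# Print the command by default. Set APPLY=1 to call the AWS CLI for real.
if [ "${APPLY:-0}" = "1" ]; then
  aws ec2 authorize-security-group-ingress \
    --group-id "$TARGET_SG" \
    --ip-permissions "$PERMS"
else
  echo "aws ec2 authorize-security-group-ingress --group-id $TARGET_SG --ip-permissions '$PERMS'"
fi
```

Run the same script in the source account with the roles swapped to add the matching rule on the source cluster's security group.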

Create a plugin with MM2 plugin information

MSK Connect custom plugins accept a file or folder with a .jar or .zip ending.

Complete the following steps:

1.    Create a dummy folder or file, and then compress it:

mkdir mm2   
zip mm2.zip mm2 

2.    Upload the .zip object to your Amazon Simple Storage Service (Amazon S3) bucket in the target account:

aws s3 cp mm2.zip s3://mytestbucket/

Because Apache Kafka and MSK Connect have MirrorMaker libraries built in, you don’t need to add .jar files for this functionality. However, MSK Connect requires a custom plugin to be present when you create a connector, so you must create an empty one to reference.

3.    In the target account, use the .zip file to create a custom plugin. Use mm2-connect-plugin as the name for the custom plugin.
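If you prefer the AWS CLI for step 3, the following sketch shows one way to register the plugin. It assumes the bucket name mytestbucket and the object key mm2.zip from the upload step, and credentials for the target account. The APPLY switch is an illustrative convention of this script, not part of the AWS CLI. By default the script only prints the command.

```shell
#!/bin/sh
# Sketch: register the empty mm2.zip object as an MSK Connect custom plugin.
# The bucket and key are the example values from the upload step (assumptions).
BUCKET="${BUCKET:-mytestbucket}"
PLUGIN_NAME="${PLUGIN_NAME:-mm2-connect-plugin}"

# Shorthand for --location: the S3 bucket ARN plus the key of the .zip object.
LOCATION="s3Location={bucketArn=arn:aws:s3:::${BUCKET},fileKey=mm2.zip}"

# Print the command by default. Set APPLY=1 to call the AWS CLI for real.
if [ "${APPLY:-0}" = "1" ]; then
  aws kafkaconnect create-custom-plugin \
    --name "$PLUGIN_NAME" \
    --content-type ZIP \
    --location "$LOCATION"
else
  echo "aws kafkaconnect create-custom-plugin --name $PLUGIN_NAME --content-type ZIP --location '$LOCATION'"
fi
```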

Create an MSK Connect Connector

Complete the following steps to create a connector in the target account:

1.    Open the Amazon MSK console.

2.    In the navigation pane, under MSK Connect, choose Connectors.

3.    Choose Create connector.

4.    In the list of custom plugins, select the option next to the custom plugin that you created, and then choose Next.

5.    Enter a name for the connector, and optionally, a description.

6.    From the list of clusters, choose the target cluster.

7.    Copy the following configuration, and then paste it into the connector configuration field. Modify the example for your use case.

connector.class=org.apache.kafka.connect.mirror.MirrorSourceConnector  
tasks.max=1  
  
clusters=primary,replica  
source.cluster.alias=primary  
target.cluster.alias=replica  
  
topics=example.*  
replication.factor=2  
topic.creation.default.replication.factor=2  
topic.creation.default.partitions=2  
consumer.group.id=mm2-connector  
  
refresh.groups.interval.seconds=20  
refresh.topics.interval.seconds=20  
  
sync.topic.configs.interval.seconds=20  
sync.topic.acls.interval.seconds=20  
  
producer.enable.idempotence=true  
  
transforms=renameTopic  
transforms.renameTopic.type=org.apache.kafka.connect.transforms.RegexRouter  
transforms.renameTopic.regex=primary.(.*)  
transforms.renameTopic.replacement=$1  
  
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter  
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter  
  
# Source cluster options  
source.cluster.bootstrap.servers=<Source_MSK_Bootstrap_Server_PLAINTEXT>  
source.cluster.security.protocol=PLAINTEXT  
  
# Destination cluster options  
target.cluster.bootstrap.servers=<Target_MSK_Bootstrap_Server_PLAINTEXT>  
target.cluster.security.protocol=PLAINTEXT

8.    Set the capacity for your connector.

9.    Under Worker configuration, choose Use the MSK default configuration.

10.    Under Access permissions, choose the AWS Identity and Access Management (IAM) role that provides the required permissions to MSK Connect. Then, choose Next.

11.    On the Security page, under Encryption - in transit, choose Plaintext traffic. Then, choose Next.

12.    Optionally, on the Logs page, set the log delivery. Then, choose Next.

13.    Under Review and create, choose Create connector.

Note: With this configuration, MM2 creates two topics in the target cluster for each topic that it replicates from the source cluster. For example, if you have the topic exampletopic1 on the source cluster, MM2 creates the topics primary.exampletopic1 and exampletopic1 on the target cluster. The renameTopic transform strips the primary. prefix, so messages are routed to the exampletopic1 topic.
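To see how the renameTopic transform maps topic names, you can approximate it locally with sed. This is only an emulation under stated assumptions: route_topic is a hypothetical helper, and sed's extended regular expressions stand in for the Java regular expressions that RegexRouter uses. The ^ and $ anchors mirror the fact that RegexRouter renames a topic only when the whole name matches the pattern.

```shell
#!/bin/sh
# Emulate transforms.renameTopic.regex=primary.(.*) with replacement=$1.
# route_topic is a hypothetical helper for illustration only.
route_topic() {
  # \1 in sed plays the role of $1 in the connector configuration.
  printf '%s\n' "$1" | sed -E 's/^primary.(.*)$/\1/'
}

route_topic "primary.exampletopic1"   # prefix is stripped
route_topic "exampletopic1"           # no match, name is left unchanged
```

Because the dot in the pattern is unescaped, it matches any single character, not only a literal period. For topic names that follow the primary.<topic> convention, the result is the same.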

Create a client instance

You must create a client instance to create topics and produce or consume data from topics.

1.    Launch an Amazon Elastic Compute Cloud (Amazon EC2) instance according to your requirements in either the source or target account. Then, connect to the instance.

2.    Run the following command to install Java on the client machine:

sudo yum -y install java-11

3.    Run the following command to download Apache Kafka:

wget https://archive.apache.org/dist/kafka/2.8.1/kafka_2.12-2.8.1.tgz  
  
tar -xzf kafka_2.12-2.8.1.tgz

4.    Create the topic exampletopic1 in the Amazon MSK cluster in the source account:

<path-to-your-kafka-installation>/bin/kafka-topics.sh --create --bootstrap-server <Source MSK cluster BootstrapServerString> --replication-factor 3 --partitions 1 --topic exampletopic1

5.    Produce data in the cluster on the source account:

<path-to-your-kafka-installation>/bin/kafka-console-producer.sh --bootstrap-server <Source MSK cluster BootstrapServerString> --topic exampletopic1
>message 1  
>message 2

6.    List the topics in the cluster on the target account:

<path-to-your-kafka-installation>/bin/kafka-topics.sh --bootstrap-server <Target MSK cluster BootstrapServerString> --list

The output looks similar to the following example:

__amazon_msk_canary  
__amazon_msk_connect_configs_mm2-*****  
__amazon_msk_connect_offsets_mm2-*****  
__amazon_msk_connect_status_mm2-*****  
__consumer_offsets  
exampletopic1
primary.exampletopic1

7.    Consume data from the target cluster:

<path-to-your-kafka-installation>/bin/kafka-console-consumer.sh --bootstrap-server <Target MSK cluster BootstrapServerString> --topic exampletopic1 --from-beginning
message 1  
message 2
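The check in step 6 can also be scripted. The following sketch reads a topic listing from standard input and confirms that each expected topic appears as an exact line. The has_topics function and the sample listing are hypothetical and for illustration only. In practice, you would pipe the output of the kafka-topics.sh --list command from step 6 into the function.

```shell
#!/bin/sh
# has_topics: hypothetical helper that reads a topic listing on stdin and
# fails if any topic named in its arguments is missing from the listing.
has_topics() {
  listing="$(cat)"
  for t in "$@"; do
    # -F fixed string, -x whole-line match, -q quiet
    printf '%s\n' "$listing" | grep -Fxq -- "$t" || {
      echo "missing topic: $t" >&2
      return 1
    }
  done
}

# Sample listing that stands in for real kafka-topics.sh --list output.
sample_listing='__consumer_offsets
exampletopic1
primary.exampletopic1'

printf '%s\n' "$sample_listing" | has_topics exampletopic1 primary.exampletopic1 \
  && echo "replicated topics are present"
```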

Related information

Migrating clusters using Apache Kafka's MirrorMaker

AWS OFFICIAL
Updated a year ago