Datasync HDFS to S3 on Azure VM


I have my HDFS set up on an Azure VM and the DataSync agent set up on a different Azure VM, with communication enabled between the two VMs. My ultimate goal is to transfer data from HDFS to Amazon S3. I have configured and activated the DataSync agent, and it is connected using the AWS public endpoints. I have tested network connectivity to the public endpoints and to the self-managed storage (HDFS, in this case), and the connectivity test showed PASSED for both. But when I create a task using the activated agent, with HDFS as the source and S3 as the destination, it throws the error "input/output error: cannot read source file". Can you please let me know how I can fix this issue?

asked 2 years ago · 357 views
1 Answer

Input/output errors almost always indicate a network connectivity problem between the DataSync agent and the HDFS NameNode or DataNodes. You can check the following:

You need to validate the NameNode metadata service port (IPC port) set in core-site.xml under the fs.defaultFS property (or fs.default.name, depending on your Hadoop distribution):

$ grep -A 2 -B 1 -i fs.default core-site.xml 
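On a typical setup the property looks like the fragment below; the hostname and port shown (namenode.example.com, 8020) are placeholders, not values from the question:

```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>
```

The port in this URI is the one the DataSync agent must be able to reach on the NameNode.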

You must also validate the DataNode port, set under the dfs.datanode.address property in hdfs-site.xml:

$ grep -A 2 -B 1 -i dfs.datanode.address hdfs-site.xml
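A typical entry looks like the fragment below; the port 9866 is the Hadoop 3.x default (older 2.x distributions commonly use 50010), so check what your cluster actually has configured:

```xml
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:9866</value>
</property>
```

The agent needs to reach this port on every DataNode, not just the NameNode.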

Are you using Simple Authentication or Kerberos Authentication?
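As a quick sanity check on the values from the greps above, a minimal sketch like the following pulls the IPC port out of a core-site.xml. The sample file contents and hostname here are placeholders, not taken from the question; once you have the real port, you can test reachability from the agent VM with a tool such as nc:

```shell
# Write a sample core-site.xml (placeholder values, for illustration only)
cat > /tmp/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
EOF

# Extract the NameNode IPC port from the fs.defaultFS URI
port=$(sed -n 's/.*hdfs:\/\/[^:]*:\([0-9]*\).*/\1/p' /tmp/core-site.xml)
echo "NameNode IPC port: $port"

# From the DataSync agent VM you would then verify reachability, e.g.:
#   nc -zv namenode.example.com "$port"
```

Repeat the same reachability check against each DataNode on the dfs.datanode.address port.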

answered 2 years ago
