Hadoop across multiple regions

I want to set up a Hadoop cluster across multiple regions. However, as others may have noticed, an instance cannot bind to its Elastic IP. As a result, start-dfs.sh cannot establish communication between the namenode in zone1 and the datanode in zone2, and it reports a socket binding error.

In my deployment, I set "fs.defaultFS" in core-site.xml to "hdfs://ec2xxxx.amazonaws.com(public DNS):8020" and allowed all inbound and outbound traffic, but the cluster still fails to start. Could anyone please give me some advice? Thanks a lot!
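
For reference, the kind of binding failure described above can be reproduced outside Hadoop. The sketch below is only an illustration, assuming the error comes from the Elastic IP not being assigned to any local network interface (on EC2 the public address is mapped to the private one by NAT, so the operating system only sees the private IP). The helper name `can_bind` and the address `192.0.2.1` (a documentation-only address standing in for an Elastic IP) are hypothetical and not part of the original setup.

```python
import socket


def can_bind(host: str, port: int = 0) -> bool:
    """Return True if a TCP socket can bind to host:port on this machine.

    Port 0 asks the OS for any free port, so only the address is tested.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Cannot assign requested address" when the address
            # is not configured on any local interface.
            return False
    return True


if __name__ == "__main__":
    # Loopback is a local address, so binding is expected to succeed.
    print("127.0.0.1:", can_bind("127.0.0.1"))
    # 192.0.2.1 is a placeholder for an address the host does not own,
    # as an Elastic IP would be from the instance's point of view.
    print("192.0.2.1:", can_bind("192.0.2.1"))
```

If the same check fails for the public address on the namenode host, the daemon would have to bind to the private address (or a wildcard address) instead, while remote nodes connect through the public name.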

sym0990
Asked 5 years ago · 408 views
1 Answer
Hi, Hadoop across multiple regions is NOT supported by AWS. In fact, AWS recommends that all resources in a cluster be in the same Availability Zone to optimize performance.
Link: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-region.html

Also, here are some additional statements from Hortonworks and Cloudera on running a Hadoop cluster across multiple regions.
Link: https://community.hortonworks.com/questions/224369/do-you-support-spanning-a-hadoop-cluster-over-mult.html
Hortonworks: While it is possible to span a cluster across multiple Availability Zones, we don’t generally recommend spanning availability zones for the following reasons:

  1. In AWS, there is a natural latency involved with data moving across multiple AZs, which will lead to performance problems or other issues in the cluster. The performance issues become more pronounced as the cluster size and workload increase.

  2. Double billing: As a natural part of cluster functioning, there will be data transfer. According to the AWS FAQs: "Each instance is charged for its data in and data out at corresponding Data Transfer rates. Therefore, if data is transferred between these two instances, it is charged at "Data Transfer Out from EC2 to Another AWS Region" for the first instance and at "Data Transfer In from Another AWS Region" for the second instance."
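
To make the double-billing point concrete, here is a rough Python sketch of the arithmetic. The function name and the per-GB rates are hypothetical placeholders chosen for illustration, not actual AWS prices; check the current AWS pricing pages for real numbers.

```python
from decimal import Decimal


def transfer_charges(gigabytes, out_rate_per_gb, in_rate_per_gb):
    """Return (out_charge, in_charge, total) for one transfer between two instances.

    The sending instance is billed at the "data out" rate and the receiving
    instance at the "data in" rate, so the same bytes are charged twice.
    """
    gb = Decimal(gigabytes)
    out_charge = gb * Decimal(out_rate_per_gb)
    in_charge = gb * Decimal(in_rate_per_gb)
    return out_charge, in_charge, out_charge + in_charge


if __name__ == "__main__":
    # Hypothetical rates (assumptions, not AWS prices): 0.02 per GB out, 0.01 per GB in.
    out_charge, in_charge, total = transfer_charges(500, "0.02", "0.01")
    print(f"out={out_charge} in={in_charge} total={total}")
```

Because HDFS replication and shuffle traffic move data between nodes continuously, both charges accrue on every block that crosses the boundary, not just on a one-time copy.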

Link: https://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_aws.pdf
Cloudera EDH deployments are restricted to single regions. Single clusters spanning regions are not supported.

Hope this helps!
-randy

Answered 5 years ago
