Choose the right tool to run a Spark application


Hello. Could you advise which tools are most suitable for migrating a Spark application to AWS? The Spark application only transforms data and has no UI. We run it from time to time, perhaps several times a week. Each run touches a different amount of data from Hadoop. The application must be startable by various people on our team.

  • Inputs: Hadoop, Kafka, and tables in HDFS.
  • Outputs: Hadoop, Kafka, and a small, light load to ClickHouse.

We are now migrating from Hadoop to S3. Kafka and ClickHouse will stay on-premises without any changes.

I have heard about several AWS services for working with Spark, e.g. Amazon Athena, Amazon Athena for Apache Spark, Amazon EMR, EMR Serverless, and EC2 (anything else?). Which of them would be most suitable for our case?

Thank you.

Asked 1 year ago · 283 views
1 answer
Accepted answer

I would recommend using Amazon EMR to run your Spark applications. Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS. It is designed for batch data processing, which fits a job with no UI that runs a few times a week.
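For an occasional batch job, one common EMR pattern is a transient cluster: it starts, runs a single `spark-submit` step, and terminates itself. The sketch below only builds the keyword arguments for boto3's EMR `run_job_flow()` call, so it runs without AWS credentials. The cluster name, release label, instance types, S3 paths, and job arguments are hypothetical placeholders, and it assumes the default EMR roles already exist.

```python
from typing import Any, Dict, List


def spark_step(name: str, script_uri: str, args: List[str]) -> Dict[str, Any]:
    """Build one EMR step that runs spark-submit via command-runner.jar."""
    return {
        "Name": name,
        # Cancel remaining steps and shut the cluster down if this step fails.
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri, *args],
        },
    }


def transient_cluster_request(script_uri: str, args: List[str], log_uri: str) -> Dict[str, Any]:
    """Keyword arguments for boto3's EMR client run_job_flow().

    The cluster runs the single Spark step and then terminates itself,
    because KeepJobFlowAliveWhenNoSteps is False.
    """
    return {
        "Name": "spark-change-data",      # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",     # assumption: pick a current EMR release
        "Applications": [{"Name": "Spark"}],
        "LogUri": log_uri,
        "Instances": {
            "InstanceGroups": [
                {"Name": "primary", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [spark_step("change-data", script_uri, args)],
        # Assumes the default roles exist (e.g. from `aws emr create-default-roles`).
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    request = transient_cluster_request(
        "s3://my-bucket/jobs/change_data.py",  # hypothetical script location
        ["--run-date", "2024-01-01"],          # hypothetical job arguments
        "s3://my-bucket/emr-logs/",            # hypothetical log location
    )
    print(request["Steps"][0]["HadoopJarStep"]["Args"])
```

A teammate would then launch a run with `boto3.client("emr").run_job_flow(**request)`. Because each run gets its own short-lived cluster, the cluster size can be chosen per run to match that run's data volume.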

EMR advantages

  • EMR can scale your cluster up or down depending on your data processing needs. It also integrates well with Amazon S3, which can be used as a data lake to store your input and output data.
  • EMR supports running Spark applications written in various programming languages such as Scala, Python, and Java. It also provides integration with Apache Kafka and other AWS services.
  • You can use EC2 Spot Instances to save on costs when running your EMR clusters. Additionally, EMR has an auto-termination feature that automatically terminates idle clusters to save costs.
  • EMR integrates with AWS Identity and Access Management (IAM), allowing you to control access to your Spark applications and data.
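If the team prefers one shared cluster that several people submit to, the Spot and auto-termination points above can be combined roughly as follows. This is a sketch under assumptions: the names, release label, instance types, and counts are hypothetical. Core nodes stay On-Demand because they hold HDFS data, and only the extra task nodes use Spot. The cluster stays alive between runs and is shut down by `AutoTerminationPolicy` after it has been idle. The functions only build request dictionaries for boto3's `run_job_flow()` and `add_job_flow_steps()`.

```python
from typing import Any, Dict, List


def shared_cluster_request(idle_timeout_seconds: int = 3600) -> Dict[str, Any]:
    """run_job_flow() kwargs for a cluster that stays up between runs."""
    # EMR accepts an idle timeout between 1 minute and 7 days.
    if not 60 <= idle_timeout_seconds <= 604800:
        raise ValueError("IdleTimeout must be between 60 and 604800 seconds")
    return {
        "Name": "spark-shared",           # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",     # assumption: pick a current EMR release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "primary", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                # Core nodes store HDFS data, so keep them On-Demand.
                {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
                # Task nodes only compute, so Spot interruptions are tolerable.
                {"Name": "task-spot", "InstanceRole": "TASK", "Market": "SPOT",
                 "InstanceType": "m5.xlarge", "InstanceCount": 4},
            ],
            # Keep the cluster alive so teammates can add steps later...
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # ...and let auto-termination shut it down once it sits idle.
        "AutoTerminationPolicy": {"IdleTimeout": idle_timeout_seconds},
        # Assumes the default EMR roles already exist.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def add_steps_request(cluster_id: str, script_uri: str, args: List[str]) -> Dict[str, Any]:
    """add_job_flow_steps() kwargs for a teammate submitting one Spark run."""
    return {
        "JobFlowId": cluster_id,
        "Steps": [{
            "Name": "change-data",
            # On a shared cluster, one failed run should not terminate the cluster.
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri, *args],
            },
        }],
    }
```

Each teammate would call `boto3.client("emr").add_job_flow_steps(**add_steps_request(...))`. Access can be limited through IAM, for example by granting the `elasticmapreduce:AddJobFlowSteps` action only to the people who should start runs.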
Expert
Answered 1 year ago
