Choosing the right tool to run a Spark application

Hello. Could you advise which tool is most suitable for migrating a Spark app to AWS? The Spark application only transforms data and has no UI. We run it occasionally, perhaps several times a week. It processes a varying amount of data from Hadoop, different on each run. Various people on our team must be able to start it.

  • Inputs: Hadoop/Kafka/tables in HDFS.
  • Outputs: Hadoop/Kafka and some to ClickHouse, though not a heavy load.

We are now migrating from Hadoop to S3. Kafka and ClickHouse stay on-premises without any changes.

I have heard about various AWS tools for working with Spark, e.g. Amazon Athena, Amazon Athena for Apache Spark, EMR, EMR Serverless, EC2 (anything else?). Could you advise which of them is most suitable for our case?

Thank you.

Asked a year ago, 283 views
1 Answer
Accepted Answer

I would recommend using Amazon EMR to run your Spark applications. Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS. It's designed for data processing tasks and is a good fit for your use case.

EMR Advantages

  • EMR can scale your cluster up or down depending on your data processing needs. It also integrates well with Amazon S3, which can be used as a data lake to store your input and output data.
  • EMR supports running Spark applications written in various programming languages such as Scala, Python, and Java. It also provides integration with Apache Kafka and other AWS services.
  • You can use EC2 Spot Instances to save on costs when running your EMR clusters. Additionally, EMR has an auto-termination feature that automatically terminates idle clusters to save costs.
  • EMR integrates with AWS Identity and Access Management (IAM), allowing you to control access to your Spark applications and data.
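
As a rough illustration of the points above, here is a minimal sketch of how a team member might start the job on a short-lived EMR cluster with boto3. It only builds the request for `run_job_flow`; the helper names (`build_emr_request`, `launch`), the bucket paths, the release label `emr-6.15.0`, the `m5.xlarge` instance type, the instance counts, and the default role names are all assumptions for illustration and would need to match your own account and job.

```python
from typing import Any, Dict, List


def build_emr_request(
    script_s3_uri: str,
    script_args: List[str],
    log_s3_uri: str,
    idle_timeout_seconds: int = 3600,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All names, sizes and versions below are illustrative assumptions.
    """
    # The Spark job is submitted as an EMR step via command-runner.jar.
    spark_submit = [
        "spark-submit", "--deploy-mode", "cluster", script_s3_uri, *script_args,
    ]
    return {
        "Name": "spark-batch-job",        # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",     # assumed EMR release
        "LogUri": log_s3_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "primary", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
                # Task nodes hold no HDFS data, so Spot interruptions are cheaper to absorb.
                {"Name": "task-spot", "InstanceRole": "TASK", "Market": "SPOT",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # Keep the cluster up after the step; the idle policy below shuts it down.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Terminate the cluster after it has been idle for this many seconds.
        "AutoTerminationPolicy": {"IdleTimeout": idle_timeout_seconds},
        "Steps": [
            {
                "Name": "run-spark-app",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {"Jar": "command-runner.jar", "Args": spark_submit},
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default instance profile
        "ServiceRole": "EMR_DefaultRole",      # assumed default service role
    }


def launch(request: Dict[str, Any]) -> str:
    """Send the request to EMR and return the new cluster ID (needs AWS credentials)."""
    import boto3  # imported lazily so the request can be built without boto3 installed

    emr = boto3.client("emr")
    return emr.run_job_flow(**request)["JobFlowId"]
```

Calling `launch(build_emr_request("s3://my-bucket/jobs/app.py", ["--date", "2024-01-01"], "s3://my-bucket/emr-logs/"))` would start the cluster; the `s3://my-bucket/...` paths are placeholders. Who may call it can then be restricted through IAM permissions on the EMR actions.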
EXPERT
answered a year ago
