Difference between AWS Glue and Amazon EMR



Please explain the difference between AWS Glue and Amazon EMR. Which one should we use, and when?


asked 6 months ago
Accepted Answer

Hi, AWS Glue is a serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources. Amazon EMR (previously called Amazon Elastic MapReduce) is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data.

The two services overlap because AWS Glue runs Apache Spark jobs and Amazon EMR is also available as a serverless option (EMR Serverless). Always keep in mind that your recommendation should depend on the user persona and the use case.
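To make the overlap concrete, here is a minimal sketch of submitting the same PySpark script to either service with boto3. It assumes that a Glue job already exists and points at the script (in Glue, the script location is part of the job definition, not the run request), that an EMR Serverless Spark application and an IAM execution role already exist, and that every job name, application ID, ARN, and S3 path is a hypothetical placeholder. The helper functions are illustrative, not part of any AWS SDK; only `boto3.client(...).start_job_run` is a real API call, and boto3 is imported lazily so the request builders work without it.

```python
"""Sketch: submitting one PySpark script to AWS Glue or EMR Serverless.

All names, IDs, ARNs, and S3 paths used with these helpers are hypothetical.
"""
from typing import Any, Dict, List, Optional

# Key under which each service's start_job_run response returns the run ID.
RUN_ID_KEYS = {"glue": "JobRunId", "emr-serverless": "jobRunId"}


def glue_job_run_request(job_name: str,
                         args: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Build kwargs for the Glue client's start_job_run.

    Glue passes job arguments as a dict whose keys carry a '--' prefix.
    """
    request: Dict[str, Any] = {"JobName": job_name}
    if args:
        request["Arguments"] = {f"--{key}": value for key, value in args.items()}
    return request


def emr_serverless_job_run_request(application_id: str,
                                   role_arn: str,
                                   script_uri: str,
                                   script_args: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build kwargs for the EMR Serverless client's start_job_run."""
    spark_submit: Dict[str, Any] = {"entryPoint": script_uri}
    if script_args:
        spark_submit["entryPointArguments"] = list(script_args)
    return {
        "applicationId": application_id,
        "executionRoleArn": role_arn,
        "jobDriver": {"sparkSubmit": spark_submit},
    }


def submit(service: str, request: Dict[str, Any]) -> str:
    """Submit a prepared request and return the run ID.

    Requires boto3 and valid AWS credentials at call time.
    """
    if service not in RUN_ID_KEYS:
        raise ValueError(f"unknown service: {service}")
    import boto3  # lazy import: only needed when actually submitting

    response = boto3.client(service).start_job_run(**request)
    return response[RUN_ID_KEYS[service]]
```

The Spark code itself is the same in both cases; what differs is how much the service manages for you (Glue job definitions, connectors, and bookmarks versus an EMR application you configure yourself), which is why the persona and use case decide the choice.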

From a recommendation point of view:

  • AWS Glue is our recommended service for data integration workloads and for migrating ETL from legacy platforms such as Informatica and Talend.
  • Amazon EMR is our recommended service for big data workloads that traditionally run on Hadoop.

Use Amazon EMR:

  • Hadoop migration from on-premises or other cloud providers, including Databricks migrations
  • The customer has expertise beyond Spark, for example Hive, Presto, or Trino
  • The customer is comfortable loading their own data source connector libraries into their jobs

Use AWS Glue:

  • The customer prefers built-in capabilities such as connectors, transformations, incremental loads, job monitoring, and orchestration
  • The customer wants both visual and code-based ETL development tools
  • Migration from ETL providers such as Informatica, Talend, or Matillion
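The rules of thumb above can be written down as a simple checklist. The following is a hypothetical helper, not an AWS tool: the `Workload` field names are illustrative labels for the criteria in this answer, and the scoring (one point per matching criterion, with ties sent back to a persona and use-case discussion) is an assumption made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload description: one flag per criterion above."""
    migrating_from_hadoop: bool = False        # Hadoop or Databricks migration
    uses_non_spark_engines: bool = False       # e.g. Hive, Presto, Trino
    manages_own_connectors: bool = False       # loads own connector libraries
    wants_built_in_capabilities: bool = False  # connectors, transforms, monitoring
    wants_visual_authoring: bool = False       # visual and code ETL tools
    migrating_from_etl_tool: bool = False      # e.g. Informatica, Talend, Matillion


def suggest_service(w: Workload) -> str:
    """Count matching criteria per service and return the better fit."""
    emr_score = sum([w.migrating_from_hadoop,
                     w.uses_non_spark_engines,
                     w.manages_own_connectors])
    glue_score = sum([w.wants_built_in_capabilities,
                      w.wants_visual_authoring,
                      w.migrating_from_etl_tool])
    if emr_score > glue_score:
        return "Amazon EMR"
    if glue_score > emr_score:
        return "AWS Glue"
    return "Either; decide based on user persona and use case"
```

For example, `suggest_service(Workload(migrating_from_etl_tool=True))` returns `"AWS Glue"`. In practice these factors carry different weights, so treat the count as a conversation starter rather than a verdict.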
AWS
answered 6 months ago
  • Thank you!!
