Questions tagged with Amazon EMR

Amazon EMR is a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning (ML) applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto.


317 results
I am trying to launch a cluster using a JSON script, and we are able to launch it successfully. However, when I attempt to add an 'AutoScalingPolicy' as part of the same JSON file in the Step Function...
2
answers
0
votes
79
views
asked 11 days ago
Since upgrading from EMR 6.X to EMR 7.X, the Spark History Server produces three distinct Error 500s, the first being a basic JSON exception: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected...
Accepted Answer
Amazon EMR
1
answers
0
votes
70
views
asked 17 days ago
Hello, I am currently working with Spark and HDFS (on-premise) and have recently started learning AWS (covering various services such as S3, DynamoDB, RDS, Redshift, EMR, etc.). I have two main quest...
4
answers
0
votes
65
views
asked 22 days ago
I have a Trino on EMR setup and need help accessing Glue tables from EMR. Athena can access those tables. Below is the error message after running the Trino CLI `show tables;` command ``` dev-dsk...
1
answers
0
votes
22
views
asked a month ago
We use AWS EMR 7.2.0 on EC2 with instance fleets (only Primary and Core, no Spot Instances) and managed scaling for long-term use (weeks). On each of the 3 clusters we have started so far, we observed the foll...
2
answers
0
votes
134
views
asked 2 months ago
I am using EMR 711395599931.dkr.ecr.us-east-2.amazonaws.com/spark/emr-6.14.0:latest from SparkSubmitOperator and passing this to a jar where I am executing a user-defined function (UDF) in Spark. I am ge...
2
answers
0
votes
297
views
asked 2 months ago
Hi. I have set up an EMR Serverless application and I am using my custom image. I've configured everything properly. Next, I've created a custom image according to the official docs: https://docs.a...
2
answers
0
votes
148
views
asked 3 months ago
I launched an EMR cluster from a CloudFormation template stored as a Service Catalog template **from SageMaker**. In the template, KeepJobFlowAliveWhenNoSteps was not specified in JobFlowInstancesConf...
0
answers
0
votes
87
views
asked 3 months ago
Hi everyone, I'm having trouble connecting to my MySQL RDS instance from an EMR cluster, even though both are in the same VPC and port 3306 is open in the security group. Here’s the setup: RDS Datab...
0
answers
0
votes
44
views
asked 3 months ago
Hello Community, I’m trying to run Apache Superset on an EMR cluster and I’m facing an issue with accessing the Superset web interface through SSH tunneling. Here’s a summary of my setup and the issu...
0
answers
0
votes
36
views
asked 4 months ago
Hello, as part of a cloud migration and modernization approach using AWS, the requirement is to migrate HBase data directly to S3 and then read the data from S3 using Java microservices. (EMR would not...
1
answers
0
votes
471
views
asked 4 months ago
I have a use case where I need to run a batch EMR job on a daily schedule. I can make folders by date for my data coming from IoT, or I can make folders for each device sending IoT data and put da...
1
answers
0
votes
437
views
asked 5 months ago