Questions tagged with Amazon EMR

Amazon EMR is a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning (ML) applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto.

316 results
We use AWS EMR 7.2.0 on EC2 with instance fleets (only Primary, Core, no spot instances) and managed scaling for long-term use (weeks). On each of the 3 clusters we have started so far, we observed the...
2
answers
0
votes
47
views
asked 7 days ago
I am using EMR 711395599931.dkr.ecr.us-east-2.amazonaws.com/spark/emr-6.14.0:latest from SparkSubmitOperator and passing this to jar where I am executing a user-defined function (UDF) in Spark. I am...
2
answers
0
votes
55
views
asked 8 days ago
Hi. I have set up an EMR Serverless application and I am using my custom image. I've configured everything properly. Next, I've created a custom image according to the official...
2
answers
0
votes
100
views
asked a month ago
I launched an EMR cluster from a CloudFormation template stored as a Service Catalog template **from SageMaker**. In the template, KeepJobFlowAliveWhenNoSteps was not specified in...
0
answers
0
votes
79
views
asked a month ago
Hi everyone, I'm having trouble connecting to my MySQL RDS instance from an EMR cluster, even though both are in the same VPC and port 3306 is open in the security group. Here’s the setup: RDS...
0
answers
0
votes
33
views
Sachin
asked 2 months ago
Hello Community, I’m trying to run Apache Superset on an EMR cluster and I’m facing an issue with accessing the Superset web interface through SSH tunneling. Here’s a summary of my setup and the...
0
answers
0
votes
30
views
Sachin
asked 2 months ago
Hello, as part of a cloud migration and modernization approach using AWS, the requirement is to migrate HBase data directly to S3 and then read the data from S3 using Java microservices. (EMR would not...
1
answers
0
votes
429
views
bnaha
asked 2 months ago
I have a use case where I need to run a batch EMR job on a daily schedule. I can make folders by date for my data coming from IoT, or I can make folders for each device sending IoT data and put...
1
answers
0
votes
411
views
Kumar77
asked 3 months ago
I'm trying to load 200 GB of data into DynamoDB using Spark on EMR but am facing performance issues. """ Copy paste the following code in your Lambda function. Make sure to change the following key parameters for...
4
answers
0
votes
757
views
asked 3 months ago
I'm trying to create an EMR 7.1.0 cluster with HBase enabled for full S3 backup (including WAL) via the web console. However, no AWSServiceRoleForEMRWAL role is automatically being created and thus my...
2
answers
0
votes
392
views
asked 3 months ago
I'm trying to find out if Trino on EMR supports access controls maintained in Lake Formation. My catalog is AWS Glue. I couldn't find any documentation on the Lake Formation or EMR side that would talk...
1
answers
0
votes
496
views
Saawgr
asked 3 months ago
Hello, can we get a solution for this error `Service: EmrServerlessResourceManager; Status Code: 403; Error Code: AccessDeniedException` while running Spark submit jobs on EMR Serverless? Below is...
1
answers
0
votes
700
views
Ashwath
asked 4 months ago