Questions tagged with Amazon EMR
Amazon EMR is a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning (ML) applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto.
316 results
We use AWS EMR 7.2.0 on EC2 with instance fleets (only Primary, Core, no spot instances) and managed scaling for long-term use (weeks). On each of the 3 clusters we started so far, we observed the...
I am using EMR 711395599931.dkr.ecr.us-east-2.amazonaws.com/spark/emr-6.14.0:latest from SparkSubmitOperator and passing this to jar
where I am executing a user-defined function (UDF) in Spark.
I am...
Hi.
I have set up an EMR Serverless application and I am using my custom image. I've configured everything properly.
Next, I've created a custom image according to the official...
I launched an EMR cluster from a CloudFormation template stored as a Service Catalog template **from SageMaker**. In the template, KeepJobFlowAliveWhenNoSteps was not specified in...
Hi everyone,
I'm having trouble connecting to my MySQL RDS instance from an EMR cluster, even though both are in the same VPC and port 3306 is open in the security group. Here’s the setup:
RDS...
Hello Community,
I’m trying to run Apache Superset on an EMR cluster and I’m facing an issue with accessing the Superset web interface through SSH tunneling. Here’s a summary of my setup and the...
Hello
As part of a cloud migration and modernization approach using AWS, the requirement is to migrate HBase data directly to S3 and then read the data from S3 using Java microservices. (EMR would not...
I have a use case where I need to run a batch EMR job on a daily schedule. I can organize the data coming from IoT into folders by date, or I can make a folder for each device sending IoT data and put...
I'm trying to load 200 GB of data into DynamoDB using Spark on EMR, but I'm facing performance issues.
"""
Copy paste the following code in your Lambda function. Make sure to change the following key parameters for...
I'm trying to create an EMR 7.1.0 cluster with HBase enabled for full S3 backup (including WAL) via the web console. However, no AWSServiceRoleForEMRWAL role is automatically being created and thus my...
I'm trying to find out if Trino on EMR supports access controls maintained in Lake Formation. My catalog is AWS Glue. I couldn't find any documentation on Lake Formation or EMR side that would talk...
Hello,
Can we get a solution for this error, `Service: EmrServerlessResourceManager; Status Code: 403; Error Code: AccessDeniedException`, which occurs while running spark-submit jobs on EMR Serverless?
Below is...