Questions tagged with Amazon EMR
Amazon EMR is a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning (ML) applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto.
Trying to load 200 GB of data into DynamoDB using Spark on EMR, but facing performance issues.
"""
Copy and paste the following code into your Lambda function. Make sure to change the following key parameters for...
I'm trying to create an EMR 7.1.0 cluster with HBase enabled for full S3 backup (including WAL) via the web console. However, no AWSServiceRoleForEMRWAL role is automatically being created and thus my ...
I'm trying to find out if Trino on EMR supports access controls maintained in Lake Formation. My catalog is AWS Glue. I couldn't find any documentation on Lake Formation or EMR side that would talk ab...
Hello,
Can we get a solution for this error `Service: EmrServerlessResourceManager; Status Code: 403; Error Code: AccessDeniedException` when running spark-submit jobs on EMR Serverless?
Below is th...
I noticed that when you create a new EMR cluster using Spark, the default Python environment includes two different packages that both provide the "dateutil" package:
```
py-dateutil==2.2
python-date...
```
Hello Experts,
Technically speaking, EBS volumes assigned to the EMR core nodes are persistent storage and I have specifically created them to not delete on cluster termination. Then, I have attached...
I know the recommended strategy is to use EMR Serverless or EMR. However, I have a particular use case where I only need to run a fairly small PySpark job and need quick results. I've already gotten...
Why does Amazon EMR create inbound rule entries for the master and core security groups?
![Core SG](/media/postImages/original/IM6Mggxg_vTQSTJFNCM0FRPA)
![Master SG](/media/postImages/original/IMqAmPsK...
I have an EMR workspace containing 4 Jupyter notebooks in which PySpark code blocks are run.
I want to get the time of the most recently executed code block across all 4 notebooks to determine the ti...
I want to change the default S3 storage class of the Hive connector in EMR Trino 426 (EMR 6.15.0) to INTELLIGENT_TIERING.
I found the [hive.s3.storage-class option in the Trino 426 official manual](https:...
I am running an EMR cluster with an attached notebook and using Apache Spark to load and process data; however, I have not been able to load data into Apache Spark. Whenever I try to run spark.read.csv('s3://buc...
I have a Spark application running on EMR 7 that took 15+ hours, whereas it took 9 hours on EMR 6.14. There have been no code changes and no data volume changes. One observation is that the application attempted thrice ...