Questions tagged with Amazon EMR


Get the last code block execution time on an EMR notebook/workspace

I have an EMR workspace containing 4 Jupyter notebooks in which PySpark code blocks are run. I want to get the time of the last code block execution across all 4 notebooks to determine the...

Accepted Answer

Tags: Amazon EMR, Amazon EMR Studio

125 views

Sukrit

asked 15 days ago

How can I change the default S3 storage class of the Hive connector in EMR Trino?

I want to change the default S3 storage class of the Hive connector in EMR Trino 426 (EMR 6.15.0) to INTELLIGENT_TIERING. I found the hive.s3.storage-class option in the Trino 426 official...

Accepted Answer

Tags: Amazon EMR

153 views

rePost-User-3418860

asked 20 days ago

Unable to load data into Apache in EMR cluster notebook

I am running an EMR cluster with an attached notebook and using Apache Spark to load and process data; however, I have not been able to load data into Apache. Whenever I try to run...

Tags: Analytics, Amazon EMR, Extract Transform & Load Data, Amazon EMR Studio

322 views

Music Dev

asked 23 days ago

Spark application takes longer than expected in EMR 7

I have a Spark application running in EMR 7 that took 15+ hours, whereas it took 9 hours in EMR 6.14. There are no code changes or data volume changes. One observation is that the application attempted thrice...

Accepted Answer

Tags: Amazon EMR

257 views

Vaas

asked a month ago

How should I configure my EMR cluster to handle large data?

I have an EMR cluster and have used the Treasure Data connector to read data from a table into a DataFrame using PySpark. The tables I'm trying to read have approximately 100 million to 500...

Tags: Amazon EMR

310 views

Nakshtra

asked a month ago

EMR Jupyter Notebook: PySpark Imports Work in Shell, Not in Notebook. Issue is importing custom files

Issue: PySpark works in the first cells (likely SparkSession creation) but throws import errors when I use my own Python files in later cells. Environment: AWS EMR (Amazon EMR...

Tags: Amazon EMR

316 views

Harish

asked a month ago

Studio Workspace can't see my running EMR EC2 cluster to attach to

Let me know if this is something AWS EMR Studio does: 1. In Databricks Community Edition and in Google Colab, one can fire up a simple Jupyter notebook with an automatically started cluster (small...

Tags: Amazon WorkSpaces, Amazon EMR, Amazon EMR Serverless

382 views

ken cottrell

asked a month ago

AWS EMR - YARN Resource Issue

Hi everyone, I am using AWS EMR to run ETL operations on very large datasets (millions to billions of records). I am using PySpark and reading the CSV files with spark.read.csv. The results...

Tags: Amazon EMR, Compute

414 views

vsk95

asked a month ago

Serverless job failure

While running a serverless job run, I am getting the error below: "Number of cores specified by 'spark.driver.cores '7' is invalid".

Tags: Amazon EMR, Amazon EMR Serverless

441 views

Akash

asked 2 months ago

refresh_hfiles not working

Hi, I have an EMR cluster with HBase in S3 storage mode. I have a read replica cluster pointing to the same S3 bucket. When I add a record in the primary cluster, flush the table on the primary, and then run refresh_hfiles...

Tags: Amazon EMR, Database, AWS IAM Identity Center, Amazon S3 Access Grants

432 views

shushant

asked 2 months ago

AWS EMR WAL creation error

Hi, I am getting an error while launching EMR with HBase using S3 storage and WAL backup enabled. Caused by: java.lang.RuntimeException: createWal failed for wal WALMetadata(WALWorkspace=testworkspace2,...

Tags: AWS Identity and Access Management, Developer Tools, Amazon EMR, IAM Policies

562 views

shushant

asked 2 months ago

I have a Python package saved in CodeCommit and I need it to run in the notebook linked to an EMR cluster.

I have a Python package saved in CodeCommit and need to use it in the notebook attached to my EMR cluster workspace. The package is already successfully installed via bootstrap. To do this, in my .sh...

Tags: AWS CodeCommit, Amazon EC2, Amazon EMR, Amazon EMR Studio

456 views

amanda_oliveira

asked 2 months ago
