Questions tagged with Amazon EMR

Amazon EMR is a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning (ML) applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto.

344 results
Hello, I am running an Apache Spark job on Amazon EMR that needs to connect to an Amazon MSK cluster configured with IAM authentication. The EMR cluster has an IAM role with full MSK permissions, and...
1 answer, 0 votes, 124 views, asked 2 months ago
Hi Team, I'm trying to set up Amazon Q in the EMR Studio notebook workspace and followed this guide: https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/emr-setup.html?trk=769a1a2b-8c19-4976...
1 answer, 0 votes, 59 views, asked 3 months ago
I am getting this issue in Amazon EMR during a PySpark job execution. ``` df = spark.read.parquet("s3a://test/raw-billing-cor-data/cur2/123456789/cid-cur2/data/BILLING_PERIOD=2025-08/") py4j.protocol.Py4...
1 answer, 0 votes, 73 views, asked 3 months ago
Hi. I am trying to configure ZGC in HBase following the recommendations, but the JAVA_HOME and HBASE_REGIONSERVER_GC_OPTS variables are not modified in the /etc/hbase/conf/hbase-env.sh file. Has anyo...
1 answer, 0 votes, 39 views, asked 3 months ago
I am deploying an EMR HBase cluster with EMR WAL enabled using Terraform. The cluster is created successfully and the WALs are visible using the emrwal CLI. When I change some configuration of my clus...
1 answer, 0 votes, 52 views, asked 4 months ago
Hi everyone, I currently have an EMR cluster (emr-6.9.0) running a real-time ingestion process. To save disk space, I’ve been using the **Cloud Shuffle Storage Plugin** for Apache Spark. Now, I need ...
1 answer, 0 votes, 163 views, asked 4 months ago
Hi, I recently started facing issues with EMR (EC2 is out of capacity), with the message "EC2 is out of capacity for m6a.12xlarge in availability zone us-east-1c". I tried different machines in same ...
Accepted Answer. Tags: Amazon EC2, Amazon EMR
1 answer, 0 votes, 114 views, asked 5 months ago
Hi Team, I'm getting the error below while writing a table to Postgres using a Glue Spark script. Can you please help me with this issue? "Table or view "asset_aircraft_rpt" already exists. SaveMode: ...
1 answer, 0 votes, 83 views, asked 5 months ago
Hello, I'm facing a pretty annoying error. Whenever I try to execute a UDF in an EMR Notebook, I get the following error: ``` py4j.protocol.Py4JJavaError: An error occurred while calling o157...
2 answers, 0 votes, 105 views, asked 6 months ago
I'm trying to build a simple collaborative filtering recommendation engine using Apache Spark MLlib on Amazon EMR. So I created an EMR on EC2 cluster with the following configuration: ![Enter image...
1 answer, 0 votes, 63 views, asked 6 months ago
Hello! We're trying to migrate from a stand-alone Hive Metastore to Glue. We've modified the definition of some EMR clusters (v7.0.0) to use Glue as the metastore. We use Spark on Hadoop to process da...
2 answers, 0 votes, 157 views, asked 6 months ago
We're looking for native API/CLI/SDK support to manage EMR Studio workspaces programmatically. Currently, these operations (create, list, delete, etc.) are only possible via the UI, making it difficul...
1 answer, 0 votes, 130 views, asked 6 months ago