All Content tagged with Amazon EMR

Amazon EMR is a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning (ML) applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto.

449 results
We have a Lambda function that creates a cluster on demand. The cluster logs go by default to the S3 bucket specified in loguri. Now we have a requirement to get application logs in CloudWatch wheneve...
1
answers
0
votes
39
views
asked 6 days ago
I have 2. 1. Issue Summary My EMR cluster fails with errors indicating "data source not found" in logs. Cluster apps (Spark, Hive, Livy) seem unable to locate input data, but the exact cause is uncl...
3
answers
0
votes
64
views
asked 14 days ago
Is it possible to reuse a single Load Balancer when deploying multiple interactive endpoints that will be associated with different users using EMR Studio? Implementation is on Amazon EMR on EKS. Refe...
1
answers
0
votes
49
views
asked 20 days ago
Hello, I am attempting to write to AWS Neptune using Neo4j Connector for Spark, as stated in the [compatibility document](https://docs.aws.amazon.com/neptune/latest/userguide/migration-compatibility....
1
answers
0
votes
69
views
asked 24 days ago
In my account, I have two Glue Catalogs (one is the default catalog, AWSDataCatalog, and another catalog is shared from a different account). How can I access the databases in both catalogs from EMR E...
2
answers
0
votes
89
views
asked a month ago
Hi, I have been looking into a solution option that uses the Athena invoker_principal to get the ARN of the IAM role being used into the SQL query. Is there a way to do the same if EMR or Redshift...
1
answers
0
votes
58
views
asked a month ago
We're currently running EMR clusters with release version 6.10.0 where instances are patched using SSM "AWS-RunPatchBaseline" during bootstrap. We're experiencing several critical issues: cluster fail...
1
answers
0
votes
40
views
asked 2 months ago
Approximately how long should EMR batch processing and storing the data in Redshift take for 1 TB of data with simple transformations? My data has the following characteristics: * File size varies from...
1
answers
0
votes
70
views
asked 2 months ago
I have a use case with * 60 MB/sec data volume * Near-real-time AI/data science use cases as downstream applications should be supported * It's not an ultra-low latency use case, even 60 seconds of...
1
answers
0
votes
66
views
asked 2 months ago
After upgrading EMR from 6.5 to 7.5 I am getting the following error: OpensslCipher: Failed to load OpenSSL Cipher. java.lang.UnsatisfiedLinkError: EVP_CIPHER_CTX_block_size. Based on the HADOOP-18994 Failed...
0
answers
0
votes
43
views
asked 2 months ago
I would like to confirm whether it is possible to configure an Amazon EMR cluster with mixed instance types, combining both Graviton-based and non-Graviton instances within the same cluster. I'm going...
2
answers
0
votes
78
views
asked 2 months ago
I'm trying to run an EMR notebook to create a delta table in S3. EMR Cluster Version: emr-7.7.0 Installed Applications: Hadoop 3.4.0, Hive 3.1.3, JupyterEnterpriseGateway 2.6.0, Livy 0.8.0, Spark 3.5...
0
answers
0
votes
16
views
asked 2 months ago