How does EMR on EKS deployment model support EMR auto-scaling/managed scaling?


EMR release 5.30 introduced managed scaling features which automatically scales the cluster based on Spark/Hive/etc workload. How does managed scaling work when EMR is deployed on EKS ? Or, how is the same feature-capability supported when EMR virtual cluster (running on EKS cluster) needs more compute/worker resources to process larger workload within the same SLA?

asked 3 years ago882 views
1 Answer
Accepted Answer

The EKS cluster node scaling is entirely on the EKS cluster. EKS uses Kubernetes auto scaler to scale-out/in of the specific node group. EMR requests the Kubernetes scheduler on EKS to schedule Pods. For each job that you run, EMR on EKS creates a container. The container contains Amazon Linux 2 base image with security updates, plus Apache Spark and associated dependencies to run Spark, plus your application-specific dependencies. Each Job runs in a pod. The Pod downloads this container and starts to execute it. The Pod terminates after the job terminates.

In terms of processing larger workload, EMR on EKS allows Multi-AZ Support for jobs. (which is not present for EMR on EC2). Hence Spark job can launch executors container on nodes spanning multi-AZ. To enable Kubernetes auto scaler on EKS cluster, follow the instructions in this document .

answered 3 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions