How do I reduce costs in Amazon EMR Serverless?

I want to reduce costs in Amazon EMR Serverless.

Short description

Amazon EMR Serverless charges you for the aggregate vCPU, memory, and storage resources that your workers use, from the time that the workers are ready to run until the time that they stop. This duration is rounded up to the nearest second, with a one-minute minimum.
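As a rough illustration of the billing rule above, the following Python sketch rounds a worker's runtime up to the nearest second, applies the one-minute minimum, and multiplies by per-hour rates. The function names are hypothetical, the rates are placeholders that you supply (not actual AWS prices), and storage charges are left out for brevity.

```python
import math


def billable_seconds(runtime_seconds: float) -> int:
    """Round the runtime up to the nearest second, with a 60-second minimum."""
    return max(60, math.ceil(runtime_seconds))


def worker_cost(runtime_seconds: float, vcpus: float, memory_gb: float,
                vcpu_hour_rate: float, gb_hour_rate: float) -> float:
    """Estimate the compute cost of one worker.

    vcpu_hour_rate and gb_hour_rate are placeholder inputs; look up the
    current values for your Region on the Amazon EMR pricing page.
    """
    hours = billable_seconds(runtime_seconds) / 3600
    return hours * (vcpus * vcpu_hour_rate + memory_gb * gb_hour_rate)
```

With this sketch, a worker that runs for 12.3 seconds is billed for 60 seconds, and a worker that runs for 90.2 seconds is billed for 91 seconds.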

Resolution

To reduce costs in Amazon EMR Serverless, take the following actions:

Use Graviton on Amazon EMR Serverless

Amazon EMR Serverless provides x86_64 and arm64 architecture options for your applications. If you use AWS Graviton2 processors on EMR Serverless, then your costs might significantly decrease. For more information, see Using arm64 architecture (Graviton).
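For example, you can select the arm64 architecture when you create the application. The following Python sketch only builds the request arguments for the CreateApplication API; the helper name, application name, and release label are hypothetical values for illustration.

```python
def arm64_application_request(name: str, release_label: str) -> dict:
    """Build CreateApplication arguments for a Spark application on arm64."""
    return {
        "name": name,
        "releaseLabel": release_label,  # choose a release that supports arm64
        "type": "SPARK",
        "architecture": "ARM64",        # the other option is "X86_64"
    }
```

Assuming that boto3 is installed and your credentials are configured, you might pass the result to boto3.client("emr-serverless").create_application(**arm64_application_request("my-app", "emr-6.15.0")). Make sure that any custom libraries or images that your jobs use are also built for arm64.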

Avoid significant scaling in EMR Serverless

When you use Spark dynamic resource allocation, Spark might request an excessive number of new workers. Dynamic allocation prioritizes performance, so the additional workers increase your costs. To prevent this, set spark.dynamicAllocation.maxExecutors to cap the number of executors that a job can request.
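A minimal sketch of how you might assemble the Spark submit parameters with an executor cap follows. The helper name and the value 50 are hypothetical; pick a limit that fits your workload.

```python
def spark_submit_parameters(max_executors: int, extra_conf: dict = None) -> str:
    """Build a --conf string that caps dynamic allocation at max_executors."""
    if max_executors < 1:
        raise ValueError("max_executors must be at least 1")
    conf = {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }
    conf.update(extra_conf or {})  # optional additional Spark properties
    return " ".join(f"--conf {key}={value}" for key, value in conf.items())
```

You might then pass spark_submit_parameters(50) as the sparkSubmitParameters value in the jobDriver argument of a StartJobRun request.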

Optimize your resource usage in Amazon EMR Serverless

To optimize your resource usage in Amazon EMR Serverless, set dynamicAllocationOptimization to true in your Spark configuration classification. For more information, see Spark dynamic resource allocation optimization.
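The setting above might be expressed as a configuration override similar to the following Python sketch. The helper name is hypothetical, and the classification name "spark" is an assumption; confirm it against the Spark dynamic resource allocation optimization documentation for your release.

```python
def optimization_overrides() -> dict:
    """Build configuration overrides that turn on dynamicAllocationOptimization."""
    return {
        "applicationConfiguration": [
            {
                "classification": "spark",  # assumed classification name
                "properties": {"dynamicAllocationOptimization": "true"},
            }
        ]
    }
```

You might pass the result as the configurationOverrides argument of a StartJobRun request.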

To maximize your cost efficiency, use either the job-level setting spark.dynamicAllocation.maxExecutors or the application-level maximum capacity setting. Use these settings to configure an upper scaling bound on workers.
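For the application-level bound, a sketch of the maximum capacity structure might look like the following. The helper name and the example limits are hypothetical.

```python
def maximum_capacity(max_vcpus: int, max_memory_gb: int, max_disk_gb: int) -> dict:
    """Build the maximumCapacity structure for an EMR Serverless application."""
    return {
        "cpu": f"{max_vcpus} vCPU",
        "memory": f"{max_memory_gb} GB",
        "disk": f"{max_disk_gb} GB",
    }
```

Assuming an existing application, you might apply it with boto3.client("emr-serverless").update_application(applicationId=app_id, maximumCapacity=maximum_capacity(200, 800, 4000)), where app_id is your application ID. The combined resources of all workers in the application can't exceed this limit at any point in time.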

Manage your disk space in workers for large scale jobs

Large-scale jobs might process large data volumes with significant shuffles. As a result, these jobs might require more disk space in Amazon EMR Serverless. Use spark.executor.cores together with spark.emr-serverless.executor.disk and spark.dynamicAllocation.maxExecutors to control the worker size and the total attached storage. To improve performance for I/O-intensive workloads, use shuffle-optimized disks.
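To see how these settings interact, the following Python sketch builds the Spark properties and estimates the upper bound on attached executor storage as disk per executor multiplied by the maximum number of executors. The helper name and values are hypothetical. The spark.emr-serverless.executor.disk.type property and its shuffle_optimized value are assumptions here; check the Spark job parameters documentation for availability in your release.

```python
def executor_disk_conf(executor_cores: int, disk_gb_per_executor: int,
                       max_executors: int, shuffle_optimized: bool = False):
    """Return (Spark properties, maximum total executor disk in GB)."""
    conf = {
        "spark.executor.cores": str(executor_cores),
        "spark.emr-serverless.executor.disk": f"{disk_gb_per_executor}G",
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }
    if shuffle_optimized:
        # Assumed property name and value; verify before you use it.
        conf["spark.emr-serverless.executor.disk.type"] = "shuffle_optimized"
    # Upper bound on executor storage if every executor is allocated.
    total_disk_gb = disk_gb_per_executor * max_executors
    return conf, total_disk_gb
```

For example, 100 executors with 200 GB each can attach up to 20,000 GB of executor storage, so lowering either value directly lowers the storage that you can be billed for.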

Optimize your infrastructure costs

When you use AWS Config, Amazon EMR Serverless creates an elastic network interface item record for every worker. To avoid costs related to these records, turn off recording for the AWS::EC2::NetworkInterface resource type in AWS Config. To avoid cross-account Amazon S3 data transfers, use an Amazon VPC endpoint for Amazon S3 instead of a NAT gateway.
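Both changes above can be sketched as request arguments in Python. The helper names are hypothetical, and the sketch assumes that you use a gateway endpoint for Amazon S3 and an AWS Config recorder that records all resource types except the ones that you exclude.

```python
def s3_gateway_endpoint_request(vpc_id: str, region: str, route_table_ids) -> dict:
    """Build arguments for the Amazon EC2 CreateVpcEndpoint API (S3 gateway endpoint)."""
    return {
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "VpcEndpointType": "Gateway",
        "RouteTableIds": list(route_table_ids),
    }


def recording_group_excluding_enis() -> dict:
    """Build an AWS Config recordingGroup that skips network interfaces."""
    return {
        "allSupported": False,
        "includeGlobalResourceTypes": False,
        "exclusionByResourceTypes": {
            "resourceTypes": ["AWS::EC2::NetworkInterface"]
        },
        "recordingStrategy": {"useOnly": "EXCLUSION_BY_RESOURCE_TYPES"},
    }
```

Assuming boto3, you might pass the first result to boto3.client("ec2").create_vpc_endpoint(**request), and include the second as the recordingGroup field of the ConfigurationRecorder argument in boto3.client("config").put_configuration_recorder. Review your existing recorder settings first, because this call replaces the recorder's current recording group.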

Related information

Amazon EMR Serverless cost estimator

Amazon EMR pricing

AWS Supports You | Reducing Costs with EMR Serverless Design Patterns on the YouTube website

Spark job parameters

AWS OFFICIAL | Updated a month ago