I want to reduce costs in Amazon EMR Serverless.
Short description
Amazon EMR Serverless charges you for the aggregate vCPU, memory, and storage resources that your workers use from the time that they're ready to run until they stop. This duration is rounded up to the nearest second, with a one-minute minimum.
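The following minimal sketch illustrates the rounding rule for a single worker. The function names and the per-hour rates are hypothetical placeholders, not published prices. For current rates, see Amazon EMR pricing.

```python
import math

def billable_seconds(ready_to_stop_seconds: float) -> int:
    """Round the run time up to the nearest second, then apply the one-minute minimum."""
    return max(60, math.ceil(ready_to_stop_seconds))

def worker_cost(ready_to_stop_seconds: float, vcpus: float, memory_gb: float,
                vcpu_hour_rate: float, gb_hour_rate: float) -> float:
    """Estimate the compute and memory charge for one worker.

    The rates are caller-supplied assumptions. Storage charges are left out
    to keep the sketch short.
    """
    hours = billable_seconds(ready_to_stop_seconds) / 3600
    return hours * (vcpus * vcpu_hour_rate + memory_gb * gb_hour_rate)

# A worker that runs for 12.3 seconds is still billed for 60 seconds.
short_run = billable_seconds(12.3)
# A worker that runs for 90.2 seconds is billed for 91 seconds.
longer_run = billable_seconds(90.2)
```

Because of the one-minute minimum, many very short-lived workers can cost more than fewer workers that run longer.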
Resolution
To reduce costs in Amazon EMR Serverless, take the following actions:
Use Graviton on Amazon EMR Serverless
Amazon EMR Serverless provides x86_64 and arm64 architecture options for your applications. If you use AWS Graviton2 processors on EMR Serverless, then your costs might significantly decrease. For more information, see Using arm64 architecture (Graviton).
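As an illustration, the following sketch builds the request for the EMR Serverless CreateApplication API with the architecture set to ARM64. The helper name, application name, and release label are placeholder assumptions. Verify that your release label and job dependencies support arm64 before you switch.

```python
def build_application_request(name: str, release_label: str,
                              use_graviton: bool = True) -> dict:
    """Build keyword arguments for the boto3 emr-serverless create_application call."""
    return {
        "name": name,
        "releaseLabel": release_label,
        "type": "SPARK",
        # ARM64 selects Graviton workers. X86_64 is the other option.
        "architecture": "ARM64" if use_graviton else "X86_64",
    }

# Placeholder values for illustration only.
request = build_application_request("my-spark-app", "emr-7.1.0")

# To create the application (requires boto3 and AWS credentials):
# import boto3
# boto3.client("emr-serverless").create_application(**request)
```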
Avoid significant scaling in EMR Serverless
When you use Spark dynamic resource allocation, EMR Serverless might assign an excessive number of new workers. The assignment of new workers prioritizes performance and, as a result, increases costs. To prevent this, set spark.dynamicAllocation.maxExecutors to the maximum number of executors that you want a job to use.
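For example, the following sketch formats Spark properties into a sparkSubmitParameters string that caps the executor count for a job. The helper name, the limit of 20 executors, and the S3 script path are illustrative assumptions. Choose a limit that fits your workload.

```python
def spark_submit_parameters(conf: dict) -> str:
    """Format Spark properties as a --conf string for sparkSubmitParameters."""
    return " ".join(f"--conf {key}={value}" for key, value in conf.items())

params = spark_submit_parameters({
    "spark.dynamicAllocation.enabled": "true",
    # Upper bound on executors for this job (example value).
    "spark.dynamicAllocation.maxExecutors": "20",
})

# Job driver for the StartJobRun API. The entry point is a hypothetical path.
job_driver = {
    "sparkSubmit": {
        "entryPoint": "s3://amzn-s3-demo-bucket/scripts/job.py",
        "sparkSubmitParameters": params,
    }
}
```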
Optimize your resource usage in Amazon EMR Serverless
To optimize your resource usage in Amazon EMR Serverless, set dynamicAllocationOptimization to true in your Spark configuration classification. For more information, see Spark dynamic resource allocation optimization.
To maximize your cost efficiency, use either the job-level setting spark.dynamicAllocation.maxExecutors or the application-level maximum capacity setting. Use these settings to configure an upper scaling bound on workers.
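The following sketch shows one way that these two settings might look as API payloads: a spark configuration classification with dynamicAllocationOptimization, and an application-level maximumCapacity. The helper names and capacity values are assumptions, and the value formats follow the strings that the EMR Serverless API is understood to accept. Confirm them against the current API reference.

```python
def optimization_overrides() -> dict:
    """Configuration overrides that turn on dynamic allocation optimization."""
    return {
        "applicationConfiguration": [
            {
                "classification": "spark",
                "properties": {"dynamicAllocationOptimization": "true"},
            }
        ]
    }

def maximum_capacity(cpu_vcpus: int, memory_gb: int, disk_gb: int) -> dict:
    """Application-level ceiling on the total resources that all workers can use."""
    return {
        "cpu": f"{cpu_vcpus} vCPU",
        "memory": f"{memory_gb} GB",
        "disk": f"{disk_gb} GB",
    }

# Example values only. Size the ceiling for your own workloads.
overrides = optimization_overrides()
capacity = maximum_capacity(cpu_vcpus=200, memory_gb=800, disk_gb=2000)

# Possible usage (requires boto3 and AWS credentials):
# client.start_job_run(..., configurationOverrides=overrides)
# client.update_application(applicationId=app_id, maximumCapacity=capacity)
```

The application-level ceiling applies across all jobs in the application, while spark.dynamicAllocation.maxExecutors applies to a single job.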
Manage your disk space in workers for large scale jobs
Large-scale jobs might process large data volumes with significant shuffles. As a result, these jobs might require more disk space in Amazon EMR Serverless. Use spark.executor.cores together with spark.emr-serverless.executor.disk and spark.dynamicAllocation.maxExecutors to control the worker size and the total attached storage. To improve performance for I/O intensive workloads, use shuffle-optimized disks.
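As a rough illustration of how these settings interact, the following sketch builds the Spark properties and estimates the upper bound on attached executor storage. The helper names and values are assumptions. The spark.emr-serverless.executor.disk.type property and its shuffle_optimized value are also assumed here, so verify them in Spark job parameters before you use them.

```python
def executor_disk_conf(cores: int, disk_gb: int, max_executors: int,
                       shuffle_optimized: bool = False) -> dict:
    """Spark properties that bound the worker size and per-executor disk."""
    conf = {
        "spark.executor.cores": str(cores),
        "spark.emr-serverless.executor.disk": f"{disk_gb}G",
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }
    if shuffle_optimized:
        # Assumed property name and value for shuffle-optimized disks.
        conf["spark.emr-serverless.executor.disk.type"] = "shuffle_optimized"
    return conf

def max_total_executor_disk_gb(disk_gb: int, max_executors: int) -> int:
    """Upper bound on the storage attached to executors at full scale."""
    return disk_gb * max_executors

# Example: 4 cores and 100 GB of disk per executor, at most 50 executors.
conf = executor_disk_conf(cores=4, disk_gb=100, max_executors=50, shuffle_optimized=True)
ceiling_gb = max_total_executor_disk_gb(100, 50)  # 5000 GB at full scale
```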
Optimize your infrastructure costs
When you use AWS Config, Amazon EMR Serverless creates an elastic network interface item record for every worker. To avoid costs related to this resource, turn off AWS::EC2::NetworkInterface in AWS Config. To avoid cross account Amazon S3 data transfers, use an Amazon VPC endpoint for Amazon S3 instead of a NAT gateway.
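One possible way to turn off recording for this resource type is to exclude it in the recording group of your AWS Config configuration recorder. The following sketch builds such a recording group. The helper name is hypothetical, and the field names follow the AWS Config PutConfigurationRecorder API as understood here. Confirm them against the current API reference, and note that this change affects all network interfaces in the account and Region, not only those that EMR Serverless creates.

```python
def recording_group_excluding(resource_types) -> dict:
    """Recording group that records all supported types except the ones listed."""
    return {
        "allSupported": False,
        "includeGlobalResourceTypes": False,
        "exclusionByResourceTypes": {"resourceTypes": list(resource_types)},
        "recordingStrategy": {"useOnly": "EXCLUSION_BY_RESOURCE_TYPES"},
    }

group = recording_group_excluding(["AWS::EC2::NetworkInterface"])

# Possible usage (requires boto3, AWS credentials, and your recorder's name and role ARN):
# import boto3
# boto3.client("config").put_configuration_recorder(
#     ConfigurationRecorder={
#         "name": recorder_name,
#         "roleARN": recorder_role_arn,
#         "recordingGroup": group,
#     }
# )
```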
Related information
Amazon EMR Serverless cost estimator
Amazon EMR pricing
AWS Supports You | Reducing Costs with EMR Serverless Design Patterns on the YouTube website
Spark job parameters