EMR Serverless consumes different amounts of resources in different regions for the same data size

Question

We are using EMR Serverless to process data, but we find that for the same data size it consumes different amounts of resources, such as vCPU-hours, memoryGB-hours and storageGB-hours.

I tested in us-east-1, us-east-2 and ap-northeast-1 and found that us-east-2 is the cheapest, followed by us-east-1 and then ap-northeast-1. The difference between regions is more than 20%.

What is the reason for the difference?

Thanks

Answer

Hello,

EMR Serverless automatically scales workers up or down based on the workload and the parallelism required at each stage of the job. The resources for worker containers are provisioned from the available resource pool, which may vary depending on instance type and tenancy. I suggest comparing the application logs of a run in each region (for instance, the Tez or Spark container logs) to see whether any stages were skipped or whether the reducers optimized any steps.
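Before going through the logs, it may help to quantify the gap per metric rather than by total cost. Below is a minimal Python sketch, assuming you have already collected the usage figures for one run in each region (for example from the totalResourceUtilization field that the GetJobRun API reports for a job run). The function name relative_difference and the numbers are hypothetical placeholders, not real measurements.

```python
def relative_difference(baseline, other):
    """Percent change of each metric in `other` relative to `baseline`."""
    result = {}
    for metric, base_value in baseline.items():
        if base_value == 0:
            # Percent change is undefined for a zero baseline.
            result[metric] = None
            continue
        result[metric] = (other[metric] - base_value) / base_value * 100.0
    return result


# Placeholder values for illustration only; substitute the figures
# reported for your own job runs in each region.
usage_by_region = {
    "us-east-2": {"vCPUHour": 10.0, "memoryGBHour": 40.0, "storageGBHour": 200.0},
    "ap-northeast-1": {"vCPUHour": 12.5, "memoryGBHour": 50.0, "storageGBHour": 200.0},
}

diff = relative_difference(
    usage_by_region["us-east-2"], usage_by_region["ap-northeast-1"]
)
for metric, pct in diff.items():
    print(f"{metric}: {pct:+.1f}%")
```

If only the vCPU and memory figures differ while storage stays flat, that would point toward differences in scaling or stage execution rather than in the input data, which is what the log comparison can then confirm.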
