How do I troubleshoot an Amazon EMR Serverless job that doesn't start, runs slowly, or is stuck in the PENDING state?
I want to troubleshoot an Amazon EMR Serverless job that doesn’t start, runs slowly, or is stuck in the PENDING state.
Short description
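A cold start occurs when a job must wait for resources to be provisioned, because the application is idle or must scale up from a small number of workers. Cold starts are a common reason that a job starts late or stays in the PENDING state.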
When you submit a job and the pre-initialized workers defined in initialCapacity are available, the job runs on those workers. If another job is already using the initialCapacity workers, the Amazon EMR Serverless application requests additional workers, up to the application's maximum capacity and your account's quota. The job waits while these additional workers are provisioned.
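To make the two cases concrete, here is a deliberately simplified model in Python. It is not how EMR Serverless schedules work internally: it counts whole workers instead of vCPU and memory, and CapacityState, plan_workers, and max_workers are hypothetical names that stand in for initialCapacity and the maximum capacity. It only shows why a job whose workers must all be newly provisioned, or are capped by the maximum, starts slowly or waits.

```python
from dataclasses import dataclass


@dataclass
class CapacityState:
    """Hypothetical, simplified snapshot of one application's workers."""

    idle_initial_workers: int  # pre-initialized workers not in use by any job
    allocated_workers: int     # all workers currently provisioned, idle or busy
    max_workers: int           # stand-in for the application's maximum capacity


def plan_workers(state: CapacityState, requested: int) -> tuple[int, int]:
    """Return (workers reused from initialCapacity, workers newly provisioned).

    Newly provisioned workers are the ones that pay the cold-start cost.
    Any remainder beyond the maximum can't be provisioned until capacity
    frees up, which shows up as a slow or pending job.
    """
    reused = min(requested, state.idle_initial_workers)
    still_needed = requested - reused
    headroom = max(0, state.max_workers - state.allocated_workers)
    provisioned = min(still_needed, headroom)
    return reused, provisioned


if __name__ == "__main__":
    # Another job holds the pre-initialized workers, and only 2 more fit.
    busy = CapacityState(idle_initial_workers=0, allocated_workers=4, max_workers=6)
    print(plan_workers(busy, requested=5))  # (0, 2): 3 workers must wait
```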
Resolution
To troubleshoot an Amazon EMR Serverless job that doesn't start, runs slowly, or is stuck in the PENDING state, take the following actions:
Use pre-initialized capacity to keep drivers and workers initialized and ready, so that your application can start jobs immediately.
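Set up an appropriate initialCapacity for Hive and Spark. For Spark, pre-initialize DRIVER and EXECUTOR workers. For Hive, pre-initialize HIVE_DRIVER and TEZ_TASK workers. Size the worker counts for the number of jobs that you expect to run at the same time.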
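The following sketch builds the keyword arguments for the create_application call of the boto3 emr-serverless client, with initialCapacity keyed by DRIVER and EXECUTOR for Spark or by HIVE_DRIVER and TEZ_TASK for Hive. The worker counts, sizes, and maximumCapacity values are placeholder examples rather than recommendations, and application_request and worker are hypothetical helper names.

```python
def worker(count: int, cpu: str, memory: str, disk: str = "20GB") -> dict:
    """One initialCapacity entry: how many workers, and how big each one is."""
    return {
        "workerCount": count,
        "workerConfiguration": {"cpu": cpu, "memory": memory, "disk": disk},
    }


def application_request(app_type: str, name: str, release_label: str) -> dict:
    """Keyword arguments for create_application on the emr-serverless client.

    Worker counts and sizes below are placeholder examples.
    """
    if app_type == "SPARK":
        initial = {
            "DRIVER": worker(1, "4vCPU", "16GB"),
            "EXECUTOR": worker(4, "4vCPU", "16GB"),
        }
    elif app_type == "HIVE":
        initial = {
            "HIVE_DRIVER": worker(1, "2vCPU", "4GB"),
            "TEZ_TASK": worker(4, "4vCPU", "8GB"),
        }
    else:
        raise ValueError(f"unsupported application type: {app_type}")
    return {
        "name": name,
        "releaseLabel": release_label,
        "type": app_type,
        "initialCapacity": initial,
        # Cap on the application's total resources, pre-initialized included.
        "maximumCapacity": {"cpu": "200vCPU", "memory": "800GB"},
    }


if __name__ == "__main__":
    request = application_request("SPARK", "my-spark-app", "emr-6.15.0")
    print(sorted(request["initialCapacity"]))  # ['DRIVER', 'EXECUTOR']
```

Assuming AWS credentials are configured, you could pass the result as boto3.client("emr-serverless").create_application(**application_request("SPARK", "my-spark-app", "emr-6.15.0")). The release label is only an example; use one that your Region supports.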
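To avoid idle resources, align your container sizes with your pre-initialized capacity worker sizes. For example, make sure that the driver and executor sizes in your Spark job (spark.driver.cores, spark.driver.memory, spark.executor.cores, and spark.executor.memory) match your pre-initialized worker sizes. If they don't match, the job can't use the pre-initialized workers, which sit idle while the application provisions new ones.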
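One way to keep Spark container sizes aligned is to derive the spark.driver.* and spark.executor.* values from the worker configuration instead of setting them by hand. This sketch assumes the open-source Spark default memory overhead of max(384 MiB, 10% of the heap) and picks the largest whole-gigabyte heap that still fits in the worker. The helper names are hypothetical, and you should check the overhead that your EMR release actually applies.

```python
import math
import re

# Assumption: open-source Spark's default memory overhead for drivers and
# executors, max(384 MiB, 10% of spark.*.memory). Verify for your release.
OVERHEAD_FACTOR = 0.10
MIN_OVERHEAD_MB = 384


def leading_int(value: str) -> int:
    """Read the number from API-style sizes such as '4vCPU' or '16GB'."""
    match = re.match(r"\d+", value)
    if match is None:
        raise ValueError(f"cannot parse size: {value!r}")
    return int(match.group())


def spark_sizes_for_worker(worker_configuration: dict, role: str) -> dict:
    """Pick spark.<role>.cores and spark.<role>.memory for one worker.

    role is "driver" or "executor". The memory is the largest whole number
    of gigabytes whose heap plus the assumed overhead fits in the worker.
    """
    cores = leading_int(worker_configuration["cpu"])
    worker_mb = leading_int(worker_configuration["memory"]) * 1024
    for heap_gb in range(worker_mb // 1024, 0, -1):
        heap_mb = heap_gb * 1024
        overhead_mb = max(MIN_OVERHEAD_MB, math.ceil(heap_mb * OVERHEAD_FACTOR))
        if heap_mb + overhead_mb <= worker_mb:
            return {
                f"spark.{role}.cores": str(cores),
                f"spark.{role}.memory": f"{heap_gb}g",
            }
    raise ValueError("worker memory is too small for any heap size")


def to_submit_params(conf: dict) -> str:
    """Render Spark properties as a sparkSubmitParameters string."""
    return " ".join(f"--conf {key}={value}" for key, value in conf.items())


if __name__ == "__main__":
    worker = {"cpu": "4vCPU", "memory": "16GB", "disk": "20GB"}
    print(to_submit_params(spark_sizes_for_worker(worker, "executor")))
    # --conf spark.executor.cores=4 --conf spark.executor.memory=14g
```

For a 4vCPU/16GB worker this yields a 14g heap rather than 16g, because the container needs room for the overhead on top of the heap.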
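Modify the auto-stop configuration, for example by increasing its idle timeout, or turn auto-stop off. By default, an application stops after 15 minutes of inactivity and releases its pre-initialized capacity, so the next job waits for workers to be provisioned again.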
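Auto-stop is set per application through autoStopConfiguration. This sketch builds the keyword arguments for update_application on the boto3 emr-serverless client. The helper name auto_stop_update and the application ID are placeholders, and the check that rejects non-positive timeouts is a hypothetical guard, not an API rule.

```python
from typing import Optional


def auto_stop_update(application_id: str, idle_minutes: Optional[int]) -> dict:
    """Build keyword arguments for the emr-serverless update_application call.

    idle_minutes=None turns auto-stop off. Otherwise the application stops
    after that many idle minutes instead of the 15-minute default.
    """
    if idle_minutes is None:
        config = {"enabled": False}
    elif idle_minutes < 1:
        # Hypothetical guard: reject timeouts that are not positive.
        raise ValueError("idle timeout must be at least 1 minute")
    else:
        config = {"enabled": True, "idleTimeoutMinutes": idle_minutes}
    return {"applicationId": application_id, "autoStopConfiguration": config}


if __name__ == "__main__":
    # "00example123" is a placeholder application ID.
    print(auto_stop_update("00example123", 60))
```

Keeping an application started longer keeps its pre-initialized workers warm, but those workers are billed while the application runs, so a longer timeout trades cost for faster starts. Also note that an application generally must be in the CREATED or STOPPED state before update_application accepts changes.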
To prevent bottlenecks when your job needs high concurrency as soon as it starts, don't set spark.executor.instances to 1.
To improve job performance, increase spark.dynamicAllocation.minExecutors when spark.dynamicAllocation.enabled is true, or increase spark.executor.instances when spark.dynamicAllocation.enabled is false.
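The two cases map to different Spark properties. This sketch builds a sparkSubmitParameters string for either mode and wraps it in the jobDriver structure that start_job_run on the boto3 emr-serverless client accepts. The minimum of two executors is a hypothetical guard that encodes the advice against a single executor, not an EMR Serverless limit; executor_conf and job_driver are hypothetical helper names, and the entry point is a placeholder.

```python
def executor_conf(dynamic_allocation: bool, executors: int) -> dict:
    """Spark properties that start a job with more than one executor.

    With dynamic allocation on, raise the floor (minExecutors); with it off,
    set a fixed executor count (instances).
    """
    if executors < 2:
        # Hypothetical guard: a single executor bottlenecks jobs that need
        # high concurrency right away.
        raise ValueError("use at least 2 executors")
    if dynamic_allocation:
        return {
            "spark.dynamicAllocation.enabled": "true",
            "spark.dynamicAllocation.minExecutors": str(executors),
        }
    return {
        "spark.dynamicAllocation.enabled": "false",
        "spark.executor.instances": str(executors),
    }


def job_driver(entry_point: str, conf: dict) -> dict:
    """Build the jobDriver argument for the emr-serverless start_job_run call."""
    params = " ".join(f"--conf {key}={value}" for key, value in conf.items())
    return {"sparkSubmit": {"entryPoint": entry_point, "sparkSubmitParameters": params}}


if __name__ == "__main__":
    # The S3 path is a placeholder.
    print(job_driver("s3://amzn-s3-demo-bucket/job.py", executor_conf(True, 10)))
```

You could then call start_job_run(applicationId=..., executionRoleArn=..., jobDriver=job_driver(...)). With dynamic allocation on, minExecutors only sets the floor, so Spark can still scale up from there.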