How do I resolve the error "Container killed by YARN for exceeding memory limits" in Spark on Amazon EMR?
I want to troubleshoot the error "Container killed by YARN for exceeding memory limits" in Spark on Amazon EMR.
Resolution
The root cause of this error and the appropriate solution depend on your workload. To troubleshoot the error, try the following methods, in the order listed.
Increase memory overhead
Memory overhead is the amount of off-heap memory allocated to each executor. By default, Spark sets the memory overhead to either 10% of executor memory or 384 MB, whichever is higher. Java NIO direct buffers, thread stacks, shared native libraries, and memory-mapped files use this memory overhead.
Gradually increase memory overhead, up to 25%. The sum of the driver or executor memory and the memory overhead must be less than the value of yarn.nodemanager.resource.memory-mb for your instance type:
"spark.driver/executor.memory + spark.driver/executor.memoryOverhead < yarn.nodemanager.resource.memory-mb"
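As a sketch of this constraint, the following Python snippet (hypothetical values, not AWS tooling) checks whether an executor's memory plus its overhead stays under yarn.nodemanager.resource.memory-mb, using Spark's default overhead of 10% of executor memory with a 384 MB floor:

```python
# Sketch: verify the YARN container memory constraint.
# All values are in MB; the YARN limit below is a hypothetical example.

def default_memory_overhead(executor_memory_mb):
    # Spark's default: 10% of executor memory, with a 384 MB floor.
    return max(int(executor_memory_mb * 0.10), 384)

def fits_in_yarn(executor_memory_mb, overhead_mb, yarn_limit_mb):
    # The sum must stay strictly below yarn.nodemanager.resource.memory-mb.
    return executor_memory_mb + overhead_mb < yarn_limit_mb

executor_memory = 4096                               # spark.executor.memory
overhead = default_memory_overhead(executor_memory)  # 409 MB by default
print(overhead)
print(fits_in_yarn(executor_memory, overhead, 12288))
```

If the check fails, either lower the executor memory, lower the overhead, or move to an instance type with a larger YARN limit.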
If the error occurs in the driver container or executor container, then increase memory overhead only for the container where the error happened. You can increase memory overhead on a running cluster, a new cluster, or when you submit a job.
Running cluster
Modify spark-defaults.conf on the primary node.
For example:
sudo vim /etc/spark/conf/spark-defaults.conf

spark.driver.memoryOverhead 512
spark.executor.memoryOverhead 512
New cluster
Add a configuration object similar to the following example when you launch the cluster:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.memoryOverhead": "512",
      "spark.executor.memoryOverhead": "512"
    }
  }
]
Single job
To increase memory overhead when you run spark-submit, use the --conf option.
Example:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --conf spark.driver.memoryOverhead=512 --conf spark.executor.memoryOverhead=512 /usr/lib/spark/examples/jars/spark-examples.jar 100
If you continue to receive the error after you increase the memory overhead, then reduce the number of executor cores.
Reduce the number of executor cores
Note: Reverse any changes that you made to spark-defaults.conf in the preceding section.
When you reduce the number of executor cores, you reduce the maximum number of tasks that the executor can run concurrently, which reduces the amount of memory required. If the error occurs in the driver container, then decrease the number of driver cores; if it occurs in an executor container, then decrease the number of executor cores.
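To illustrate why fewer cores lowers memory pressure, here is a rough back-of-the-envelope calculation (illustrative numbers, not measured Spark behavior): an executor runs up to spark.executor.cores tasks at once, so peak task memory scales with the core count:

```python
# Rough illustration: peak concurrent task memory scales with executor cores.
# per_task_mb is a hypothetical working-set size for one task.

def peak_task_memory_mb(executor_cores, per_task_mb):
    # Each core runs one task at a time, so up to executor_cores
    # tasks hold memory in the executor simultaneously.
    return executor_cores * per_task_mb

print(peak_task_memory_mb(5, 800))  # 4000 MB with 5 cores
print(peak_task_memory_mb(3, 800))  # 2400 MB after reducing to 3 cores
```

The trade-off is throughput: fewer cores per executor means fewer tasks run in parallel.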
Running cluster
Modify spark-defaults.conf on the primary node.
Example:
sudo vim /etc/spark/conf/spark-defaults.conf

spark.driver.cores 3
spark.executor.cores 3
New cluster
Add a configuration object similar to the following example when you launch the cluster:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.cores": "3",
      "spark.executor.cores": "3"
    }
  }
]
Single job
To reduce the number of cores when you run spark-submit, use the --executor-cores and --driver-cores options.
Example:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --executor-cores 3 --driver-cores 3 /usr/lib/spark/examples/jars/spark-examples.jar 100
If you continue to receive the error message, then increase the number of partitions.
Increase the number of partitions
Note: Reverse any changes that you made to spark-defaults.conf in the preceding section.
To increase the number of partitions, increase the value of spark.default.parallelism for raw resilient distributed datasets (RDDs), or run a .repartition() operation.
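The effect of partition count on per-task memory can be sketched with simple arithmetic (illustrative sizes, not Spark measurements): each task processes roughly one partition, so more partitions means less data held in memory per task.

```python
# Illustration: more partitions means less data (and memory) per task.

def partition_size_mb(total_data_mb, num_partitions):
    # Assumes evenly distributed data; skewed keys break this assumption.
    return total_data_mb / num_partitions

print(partition_size_mb(100_000, 200))   # 500.0 MB per partition
print(partition_size_mb(100_000, 1000))  # 100.0 MB per partition
```

Note the even-distribution assumption: with skewed keys, a single oversized partition can still exceed the container limit no matter how many partitions you create.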
When you increase the number of partitions, you reduce the amount of memory required per partition. Spark makes heavy use of cluster RAM to maximize speed, so you must monitor memory usage with Ganglia (Amazon EMR releases 6.15.0 and earlier) or the Amazon CloudWatch agent (Amazon EMR releases 7.0.0 and later). Then verify that your cluster settings and partitioning strategy meet your growing data needs. If you continue to get the "Container killed by YARN for exceeding memory limits" error message, then increase the driver and executor memory.
Increase driver and executor memory
Note: Reverse any changes that you made to spark-defaults.conf in the preceding section.
If the error occurs in either a driver container or an executor container, then increase memory for either the driver or the executor, but not both. Make sure that the sum of driver or executor memory plus driver or executor memory overhead is always less than the value of yarn.nodemanager.resource.memory-mb for your EC2 instance type:
"spark.driver/executor.memory + spark.driver/executor.memoryOverhead < yarn.nodemanager.resource.memory-mb"
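For example, assuming a node where yarn.nodemanager.resource.memory-mb is 12288 MB (a hypothetical value; check the actual setting for your instance type), the largest executor memory you can set for a given overhead can be computed as:

```python
# Sketch: largest spark.executor.memory for a given overhead and YARN limit.
# 12288 MB is a hypothetical yarn.nodemanager.resource.memory-mb value.

def max_executor_memory_mb(yarn_limit_mb, overhead_mb):
    # The sum must stay strictly below the YARN limit,
    # so leave at least 1 MB of headroom.
    return yarn_limit_mb - overhead_mb - 1

print(max_executor_memory_mb(12288, 512))  # 11775 MB
```

In practice, leave additional headroom rather than setting memory to the exact maximum.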
Running cluster
Modify spark-defaults.conf on the primary node.
Example:
sudo vim /etc/spark/conf/spark-defaults.conf

spark.executor.memory 1g
spark.driver.memory 1g
New cluster
Add a configuration object similar to the following when you launch the cluster:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.executor.memory": "1g",
      "spark.driver.memory": "1g"
    }
  }
]
Single job
Use the --executor-memory and --driver-memory options to increase memory when you run spark-submit.
Example:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --executor-memory 1g --driver-memory 1g /usr/lib/spark/examples/jars/spark-examples.jar 100
Other troubleshooting steps:
If you continue to receive the error message, take the following actions:
- Run your application against a sample dataset. Benchmarking is a best practice, and it can help you spot slowdowns and skewed partitions that lead to memory problems.
- Process the minimum amount of required data. If you don't filter your data, or if you filter late in the application run, then excess data might slow down the application. This can increase the chance of a memory exception.
- Partition your data to ingest only the required data.
- Use a different partitioning strategy. For example, partition on an alternate key to avoid large partitions and skewed partitions.
- Your EC2 instance might not have the memory resources required for your workload. Switch to a larger memory-optimized instance type. If you continue to get memory exceptions after changing instance types, try the troubleshooting methods on the new instance.
Related information
Spark configuration on the Apache Spark website
How do I resolve the "java.lang.ClassNotFoundException" error in Spark on Amazon EMR?