I can't run my Apache Spark application on my Amazon EMR notebook
Short description
Spark applications that are run from an EMR notebook might fail to start with the following exception:
The code failed because of a fatal error:
Session 4 did not start up in 60 seconds.
Resolution
The following are common troubleshooting steps for running Spark applications on your EMR notebook:
Check the resources on the cluster
Make sure that Spark has enough available resources in the cluster for Jupyter to create a Spark context. You can check available resources using Amazon CloudWatch metrics or the Resource Manager.
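For example, you can query the YARN ResourceManager REST API from the primary node to see the memory and vCores that are currently available. This sketch assumes the default ResourceManager port (8088); adjust the host and port if you changed them:

```shell
# Run on the EMR primary node. Assumes the default YARN
# ResourceManager port (8088).
curl -s http://localhost:8088/ws/v1/cluster/metrics
```

If availableMB or availableVirtualCores in the response is near zero, then the cluster doesn't have room to start the Spark driver and executors for a new session.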
Make sure that the Sparkmagic libraries are configured correctly
Contact your Jupyter administrator to make sure that the Sparkmagic libraries are configured correctly.
Restart the notebook kernel
1. Open the Amazon EMR console, and then choose Notebooks.
2. Select the notebook from the Notebooks list, and then choose Open in JupyterLab or Open in Jupyter. A new browser tab opens to the JupyterLab or Jupyter Notebook editor.
3. From the Kernel menu, select Restart Kernel.
Increase the Spark session timeout period for JupyterLab
To increase the Spark session timeout period, do the following:
1. Open the Amazon EMR console, and then choose Notebooks.
2. Select the notebook from the Notebooks list.
3. Access the EMR notebook's Jupyter web user interface.
4. Open the EMR notebook terminal.
5. Open the config.json file using the following command:
vi /home/notebook/.sparkmagic/config.json
6. Add or update the livy_session_startup_timeout_seconds: xxx option in the config.json file.
7. Restart all kernels.
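For example, to allow a session up to 180 seconds to start (180 is a placeholder; choose a value that suits your cluster), the entry in config.json looks like the following:

{
  "livy_session_startup_timeout_seconds": 180
}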
Note: If the JupyterHub application is installed in the EMR primary instance, then do the following to increase the Spark session timeout period.
1. Run the following command:
vi /etc/jupyter/conf/config.json
2. Update the livy_session_startup_timeout_seconds option from the default of 60 seconds to your value, and then restart the JupyterHub container.
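On Amazon EMR, JupyterHub runs in a Docker container named jupyterhub, so restarting the container looks like the following:

```shell
# Run on the EMR primary node where JupyterHub is installed
sudo docker restart jupyterhub
```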
Tune Spark driver memory
Tune the Spark driver memory used by the Jupyter notebook application to control the resource allocation. For more information, see How can I modify the Spark configuration in an Amazon EMR notebook?
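You can also set the driver memory for a single session from the first cell of the notebook with the Sparkmagic %%configure magic. For example (4g is a placeholder value):

%%configure -f
{"driverMemory": "4g"}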
Make sure that the Apache Livy service is healthy
Check the status of the Livy server running on the primary node instance
1. Use the following command to check the status of the livy-server:
sudo systemctl status livy-server
2. If the status is down, then use the following command to start the livy-server:
sudo systemctl start livy-server
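You can also confirm that the Livy server responds over its REST API. This assumes the default Livy port (8998):

```shell
# Run on the EMR primary node. Assumes the default Livy port (8998).
curl -s http://localhost:8998/sessions
```

A healthy server returns a JSON list of the current sessions.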
Increase the Livy Server memory
By default, the notebook client attempts to connect to the Livy server for 90 seconds. If the Livy server doesn't respond within 90 seconds, then the client generates a timeout. The most common reason that the Livy server doesn't respond is insufficient resources. To resolve this, increase the memory for the Livy server:
1. Connect to the primary node instance using SSH.
2. Add the following property to the file /etc/livy/conf/livy-env.sh:
export LIVY_SERVER_JAVA_OPTS="-Xmx8g"
Note: Replace 8g with a value that's appropriate for your workload.
3. For the changes to take effect, restart the Livy server:
sudo systemctl stop livy-server
sudo systemctl start livy-server
Use cluster mode instead of client mode in Livy
By default, Spark applications that you submit from the notebook run in client mode, and the Spark driver runs as a subprocess of the Livy server. Running as a subprocess might exhaust resources on the primary node. To prevent Livy from failing because of insufficient resources, change the deployment mode to cluster mode. In cluster mode, the driver runs in the YARN application master on a core or task node rather than on the primary node.
To use cluster mode, do the following:
1. Connect to the primary node using SSH.
2. Add the following parameter to the file /etc/livy/conf/livy.conf:
livy.spark.deploy-mode cluster
3. For the changes to take effect, restart the Livy server:
sudo systemctl stop livy-server
sudo systemctl start livy-server
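After the restart, you can confirm that the setting took effect and, after a notebook session starts, check from YARN that the session is running as an application on the cluster:

```shell
# Confirm the Livy deployment-mode setting
grep deploy-mode /etc/livy/conf/livy.conf

# List running YARN applications; notebook sessions appear here
yarn application -list
```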