
How do I modify the Spark configuration in my Amazon EMR Studio Workspace?


I want to modify the Spark configuration in my Amazon EMR Studio Workspace.

Short description

To customize your Spark configuration for Amazon Elastic Compute Cloud (Amazon EC2) or EMR Serverless remote clusters, use Sparkmagic commands to change executorMemory, executorCores, and resource allocation.

For Amazon Elastic Kubernetes Service (Amazon EKS) remote clusters, set the Spark configuration in the SparkContext object. Amazon EKS doesn't support Sparkmagic commands. For more information, see Considerations and limitations.

Resolution

Customize the Spark configuration for Amazon EC2 or EMR Serverless remote clusters

Complete the following steps:

  1. To modify the job configuration, run the %%configure command in the Workspace cell.

    Example:

    %%configure -f
    {"executorMemory":"4G"}

    Note: In the preceding example, executorMemory is modified for the Spark job.

  2. To pass additional Spark properties, use a nested JSON object under the "conf" key.
    Note: Properties that you set on the SparkContext or SparkSession must go inside the "conf" object rather than as top-level keys. For more information, see Spark configuration on the Apache Livy website.

    Example:

    %%configure -f
    {
      "driverMemory": "4G",
      "driverCores": 4,
      "executorMemory" : "8G",
      "executorCores": 2,
      "conf": {
         "spark.dynamicAllocation.maxExecutors" : 10,
         "spark.dynamicAllocation.minExecutors": 1
      }
    }
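The JSON that follows %%configure -f must be valid JSON, and a malformed payload causes the session configuration to fail. One way to sanity-check the payload before pasting it into a cell is to build it with Python's json module. The following is a minimal sketch that reproduces the example configuration above:

```python
import json

# The same session settings shown in the preceding example, as a Python dict.
session_config = {
    "driverMemory": "4G",
    "driverCores": 4,
    "executorMemory": "8G",
    "executorCores": 2,
    # Spark properties go in a nested object under the "conf" key.
    "conf": {
        "spark.dynamicAllocation.maxExecutors": 10,
        "spark.dynamicAllocation.minExecutors": 1,
    },
}

# Serialize to the JSON that follows "%%configure -f" in the cell.
payload = json.dumps(session_config, indent=2)
print(payload)
```

Paste the printed JSON into the cell after the %%configure -f line.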

Customize the Spark configuration for Amazon EKS remote clusters

To customize the Spark configuration in a Workspace that's connected to an Amazon EKS cluster, set the Spark configuration in the SparkContext object:

spark.conf.set("spark.executor.memory", "8g")

Confirm the Spark configuration

Take the following actions:

  • For Amazon EC2 and EMR Serverless, run the %%info command in the client-side Workspace:

    Example output:

    Current session configs: {'driverMemory': '4G', 'driverCores': 4, 'executorMemory': '8G', 'executorCores': 2, 'conf': {'spark.dynamicAllocation.maxExecutors': 10, 'spark.dynamicAllocation.minExecutors': 1}, 'proxyUser': 'assumed-role_Admin_biditdas-Isengard', 'kind': 'pyspark'}
  • For Amazon EC2, check the logs at /var/log/livy/livy-livy-server.out on the cluster's primary node. If a Spark session started, then you see a log entry similar to the following:

    20/06/24 10:11:22 INFO InteractiveSession$: Creating Interactive session 2: [owner: null, request: [kind: pyspark, proxyUser: None, executorMemory: 4G, conf: spark.dynamicAllocation.enabled -> false, heartbeatTimeoutInSecond: 0]]
  • For Amazon EKS, use the Spark UI. 
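To check the Livy log on an Amazon EC2 cluster's primary node programmatically, you can scan each line for the session-creation entry. The following is a sketch that extracts the session ID and executorMemory from a sample line matching the log format shown above (the sample line here is hypothetical, copied from the example entry):

```python
import re

# Hypothetical sample line, in the same format as the Livy log entry above.
line = (
    "20/06/24 10:11:22 INFO InteractiveSession$: Creating Interactive "
    "session 2: [owner: null, request: [kind: pyspark, proxyUser: None, "
    "executorMemory: 4G, conf: spark.dynamicAllocation.enabled -> false, "
    "heartbeatTimeoutInSecond: 0]]"
)

# Extract the session ID and executorMemory from a "Creating Interactive
# session" entry. In practice, you would read lines from
# /var/log/livy/livy-livy-server.out.
match = re.search(r"Creating Interactive session (\d+).*executorMemory: (\S+?),", line)
if match:
    session_id, executor_memory = match.groups()
    print(f"session {session_id} started with executorMemory={executor_memory}")
```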

Related information

Enhanced kernels with magic commands

REST API on the Apache Livy website

Features available with the new kernels on the GitHub website

AWS OFFICIAL. Updated 2 years ago.