How do I configure or modify YARN node labeling in Amazon EMR?


I want to configure or modify Apache Hadoop YARN node labeling in Amazon EMR.

Short description

The default YARN node label settings depend on your version of Amazon EMR.

Amazon EMR versions 5.19.x and later in the Amazon EMR-5.x release versions

The YARN node labels feature is turned on by default. When you turn on this feature, the CORE node label is created for core nodes with the following properties:

yarn.node-labels.enabled: true  
yarn.node-labels.am.default-node-label-expression: 'CORE'

YARN ApplicationMaster containers are allocated only on core nodes. For all other containers, there isn't a partition restriction. You can allocate the containers on either core nodes or task nodes.

Amazon EMR version 6.x and later

The YARN node labels feature is turned off by default. ApplicationMaster containers can run on both core and task nodes.

Resolution

Note: Before you configure YARN node labels in your production environment, it's a best practice to configure them in a test environment. When you turn off the YARN node labels feature, the ApplicationMaster container can launch on any node type, including task nodes. If a task node that runs on an Amazon Elastic Compute Cloud (Amazon EC2) Spot Instance hosts the ApplicationMaster and is terminated because of Spot capacity constraints, then running jobs might fail.

Turn off YARN labels in Amazon EMR version 5.19.x and later in the Amazon EMR-5.x release versions

Turn off the YARN label feature when you create an EMR cluster

Complete the following steps:

  1. In the Edit software settings section, under Enter configuration, add the following properties:

    [  
      {  
        "Classification": "yarn-site",  
        "Properties": {  
          "yarn.node-labels.enabled": "false",  
          "yarn.node-labels.am.default-node-label-expression": ""  
        }  
      }  
    ]
  2. Create the following script with a .sh extension, and then upload the script to an Amazon Simple Storage Service (Amazon S3) bucket:

    #!/bin/bash  
    sudo sed -i 's/yarn rmadmin.*-addToClusterNodeLabels "CORE(exclusive=false)"/echo "NO LABELS"/g' /var/aws/emr/bigtop-deploy/puppet/modules/hadoop/manifests/init.pp
  3. In the Bootstrap Actions section, add the script as a custom action, and then create the cluster.

  4. To confirm that the change is applied, run the following command in the primary node:

    yarn cluster --list-node-labels

    The output shows an empty value for the node labels:

    <<<<< Node Labels: >>>>>>
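
The substitution in step 2 uses `.*` between `yarn rmadmin` and `-addToClusterNodeLabels`, so it should match the label command with or without extra options in between. The following minimal sketch runs the same sed expression against two sample command lines (the two forms quoted in the comments on this article) instead of the real init.pp file. The `rewrite_label_command` function name and the sample variables are illustrative assumptions.

```shell
#!/bin/bash
# Sketch: check that the sed expression from step 2 rewrites both forms of the
# label command. The sample lines are illustrative; they are not read from init.pp.
pattern='s/yarn rmadmin.*-addToClusterNodeLabels "CORE(exclusive=false)"/echo "NO LABELS"/g'

rewrite_label_command() {
  # Reads text on stdin and applies the same substitution as the bootstrap script.
  sed "$pattern"
}

# Command form without extra options.
older='yarn rmadmin -addToClusterNodeLabels "CORE(exclusive=false)"'
# Command form with a -D option before -addToClusterNodeLabels.
newer='yarn rmadmin -Dyarn.resourcemanager.connect.retry-interval.ms=2000 -addToClusterNodeLabels "CORE(exclusive=false)"'

printf '%s\n' "$older" | rewrite_label_command
printf '%s\n' "$newer" | rewrite_label_command
```

If both lines print as `echo "NO LABELS"`, then the expression covers both command forms.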

Turn off the YARN label feature in an existing Amazon EMR cluster

Complete the following steps:

  1. Use SSH to connect to the Amazon EMR primary node.

  2. Create a backup of your yarn-site.xml file. The path is /etc/hadoop/conf/yarn-site.xml.

  3. Run the following command to open the yarn-site.xml file in a text editor:

    sudo vi /etc/hadoop/conf/yarn-site.xml
  4. Change the yarn.node-labels.enabled property value to false:

    <property>  
      <name>yarn.node-labels.enabled</name>  
      <value>false</value>  
    </property>
  5. Remove the CORE value in the yarn.node-labels.am.default-node-label-expression property:

    <property>  
      <name>yarn.node-labels.am.default-node-label-expression</name>  
      <value></value>  
    </property>
  6. If your cluster version is later than 5.29.0, then run the following commands to restart ResourceManager:

    sudo systemctl restart hadoop-yarn-resourcemanager.service
    sudo systemctl status hadoop-yarn-resourcemanager.service

    -or-
    If your cluster version is 5.29.0 or earlier, then run the following commands to restart ResourceManager:

    sudo stop hadoop-yarn-resourcemanager
    sudo start hadoop-yarn-resourcemanager
  7. To confirm that the change is applied, run the following command:

    yarn cluster --list-node-labels

    The output shows an empty value for the node labels:

    <<<<< Node Labels: >>>>>>
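
As an alternative to editing the file by hand, steps 2 through 5 could be scripted. The following is a minimal sketch under stated assumptions: it uses GNU sed, it expects each `<value>` line to directly follow its `<name>` line as in the snippets above, and it runs against a scratch sample file rather than the real yarn-site.xml. The `set_yarn_property` helper is a hypothetical name.

```shell
#!/bin/bash
# Sketch of steps 2-5: back up a yarn-site.xml file, then set a property value.
# Assumes GNU sed and a <value> line directly after the matching <name> line.

set_yarn_property() {
  # Hypothetical helper. Usage: set_yarn_property FILE PROPERTY_NAME NEW_VALUE
  # NEW_VALUE must not contain '|' or '&', which are special in the sed command.
  local file="$1" name="$2" value="$3"
  # Keep one backup of the unmodified file; later calls do not overwrite it.
  [ -e "$file.bak" ] || cp -p "$file" "$file.bak"
  # On the line after the matching <name>, replace the whole <value> element.
  sed -i "/<name>${name}<\/name>/{
n
s|<value>.*</value>|<value>${value}</value>|
}" "$file"
}

# Scratch sample that stands in for /etc/hadoop/conf/yarn-site.xml.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<configuration>
  <property>
    <name>yarn.node-labels.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.node-labels.am.default-node-label-expression</name>
    <value>CORE</value>
  </property>
</configuration>
EOF

set_yarn_property "$sample" yarn.node-labels.enabled false
set_yarn_property "$sample" yarn.node-labels.am.default-node-label-expression ""
cat "$sample"
```

On the primary node, the same function would need root permissions to modify /etc/hadoop/conf/yarn-site.xml, and ResourceManager must still be restarted as described in step 6.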

Turn on the YARN labels feature in Amazon EMR version 6.x and later

Turn on the YARN label feature when you create an EMR cluster

Complete the following steps:

  1. In the Edit software settings section, under Enter configuration, add the following properties:

    [  
      {  
        "Classification": "yarn-site",  
        "Properties": {  
          "yarn.node-labels.enabled": "true",  
          "yarn.node-labels.am.default-node-label-expression": "CORE"  
        }  
      }  
    ]
  2. Create the cluster.

  3. To confirm that the change is applied, run the following command in the primary node:

    yarn cluster --list-node-labels

    The output shows the CORE node label:

    <<<<< Node Labels: <CORE:exclusivity=false>  >>>>>
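
The verification step can also be scripted. The following minimal sketch decides from the text of the `yarn cluster --list-node-labels` output whether the CORE label is present. The `has_core_label` helper is a hypothetical name, and the sample lines are copied from this article so that the sketch runs without a cluster.

```shell
#!/bin/bash
# Sketch: check the text printed by `yarn cluster --list-node-labels` for a CORE label.

has_core_label() {
  # Reads the command output on stdin; succeeds only if a CORE label is listed.
  grep -q '<CORE:exclusivity='
}

# Sample output lines taken from this article.
labels_on='<<<<< Node Labels: <CORE:exclusivity=false>  >>>>>'
labels_off='<<<<< Node Labels: >>>>>>'

if printf '%s\n' "$labels_on" | has_core_label; then
  echo "CORE label present"
fi
if ! printf '%s\n' "$labels_off" | has_core_label; then
  echo "no CORE label"
fi
```

On the primary node, the function could be fed directly from the command, for example `yarn cluster --list-node-labels | has_core_label`.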

Turn on the YARN label feature in an existing EMR cluster

Complete the following steps:

  1. Open the Amazon EMR console.

  2. In the navigation pane, choose Clusters, and then select the cluster that you want to edit.

  3. Choose the Configurations tab.

  4. Under Instance group configurations, choose an instance group.

  5. Choose the Reconfigure icon, select Edit in JSON, and then add the following properties:

    [  
      {  
        "Classification": "yarn-site",  
        "Properties": {  
          "yarn.node-labels.enabled": "true",  
          "yarn.node-labels.am.default-node-label-expression": "CORE"  
        }  
      }  
    ]
  6. Choose Apply this configuration to all active instance groups, and then save the changes.

  7. Run the following command as the hadoop user in the primary node:

    yarn rmadmin -addToClusterNodeLabels "CORE(exclusive=false)"
  8. To confirm that the change is applied, run the following command in the primary node:

    yarn cluster --list-node-labels

    The output shows the CORE node label:

    <<<<< Node Labels: <CORE:exclusivity=false>  >>>>>>
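
The console reconfiguration in steps 1 through 6 might also be done from the AWS CLI with `aws emr modify-instance-groups`. The following minimal sketch only builds the JSON input for one instance group and does not call AWS. The `write_reconfigure_json` helper and the `ig-EXAMPLE` instance group ID are placeholders, not values from this article.

```shell
#!/bin/bash
# Sketch: write the instance group reconfiguration JSON for the yarn-site
# properties used in step 5. Nothing here contacts AWS.

write_reconfigure_json() {
  # Hypothetical helper. Usage: write_reconfigure_json INSTANCE_GROUP_ID OUTPUT_FILE
  local group_id="$1" out="$2"
  cat > "$out" <<EOF
[
  {
    "InstanceGroupId": "${group_id}",
    "Configurations": [
      {
        "Classification": "yarn-site",
        "Properties": {
          "yarn.node-labels.enabled": "true",
          "yarn.node-labels.am.default-node-label-expression": "CORE"
        }
      }
    ]
  }
]
EOF
}

# ig-EXAMPLE is a placeholder; use the ID of a real instance group instead.
out_file="$(mktemp)"
write_reconfigure_json "ig-EXAMPLE" "$out_file"
cat "$out_file"
```

The resulting file could then be passed to `aws emr modify-instance-groups` through the `--cluster-id` and `--instance-groups file://` options, with one array entry for each instance group to reconfigure. Steps 7 and 8 still apply afterward.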

AWS OFFICIAL
Updated 5 days ago
2 Comments

Update:

Seems this page needs to be modified in:

Section: Turn off the YARN labels feature in Amazon EMR version 5.19.x and above in Amazon EMR-5.x.x series

Sub: Turn off the default YARN label feature when creating a new EMR cluster:

Step 3.

EMR 5.19.x - 5.32.x versions BA script should be:

sudo sed -i 's/yarn rmadmin -addToClusterNodeLabels "CORE(exclusive=false)"/echo "NO LABELS"/g' /var/aws/emr/bigtop-deploy/puppet/modules/hadoop/manifests/init.pp

EMR 5.33.x and above versions BA script should be:

sudo sed -i 's/yarn rmadmin -Dyarn.resourcemanager.connect.retry-interval.ms=2000 -addToClusterNodeLabels "CORE(exclusive=false)"/echo "NO LABELS"/g' /var/aws/emr/bigtop-deploy/puppet/modules/hadoop/manifests/init.pp
replied 10 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 10 months ago