I want to configure or modify Apache Hadoop YARN node labeling in Amazon EMR.
Short description
The default YARN node label settings depend on your version of Amazon EMR.
Amazon EMR versions 5.19.x and later in the Amazon EMR-5.x release versions
The YARN node labels feature is turned on by default. When you turn on this feature, the CORE node label is created for core nodes with the following properties:
yarn.node-labels.enabled: true
yarn.node-labels.am.default-node-label-expression: 'CORE'
YARN ApplicationMaster containers are allocated only on core nodes. For all other containers, there isn't a partition restriction. You can allocate the containers on either core nodes or task nodes.
Amazon EMR version 6.x and later
The YARN node labels feature is turned off by default. The application's primary processes can run on both core and task nodes.
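To see which of these defaults is in effect on a running cluster, one option is to read the two properties from yarn-site.xml on the primary node. The following is a minimal sketch rather than an official procedure. The helper name get_yarn_prop is hypothetical, and the sketch assumes that each <value> element is on the line directly after its <name> element.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one property from a Hadoop XML file.
# Assumes <value> is on the line immediately after the matching <name> line.
get_yarn_prop() {
  local file="$1" name="$2"
  awk -v n="<name>${name}</name>" '
    index($0, n) {
      getline                              # move to the <value> line
      gsub(/.*<value>|<\/value>.*/, "")    # strip the surrounding tags
      print
      exit
    }' "$file"
}

# Usage on the primary node (runs only if the file is readable):
conf=/etc/hadoop/conf/yarn-site.xml
if [ -r "$conf" ]; then
  get_yarn_prop "$conf" yarn.node-labels.enabled
  get_yarn_prop "$conf" yarn.node-labels.am.default-node-label-expression
fi
```

If the feature is turned off, the first call can be expected to print false or nothing at all, depending on whether the property is present in the file.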
Resolution
Note: Before you configure YARN node labels in your production environment, it's a best practice to configure them in a test environment. When you turn off the YARN node label feature, the ApplicationMaster container can launch on any node type, including task nodes. If a task node runs on an Amazon Elastic Compute Cloud (Amazon EC2) Spot Instance and goes down because of Spot capacity constraints, then running jobs might fail.
Turn off YARN labels in Amazon EMR version 5.19.x and later in the Amazon EMR-5.x release versions
Turn off the YARN label feature when you create an EMR cluster
Complete the following steps:
-
In the Edit software settings section, under Enter configuration, add the following properties:
[
{
"Classification": "yarn-site",
"Properties": {
"yarn.node-labels.enabled": "false",
"yarn.node-labels.am.default-node-label-expression": ""
}
}
]
-
Create the following script with a .sh extension, and then upload the script to an Amazon Simple Storage Service (Amazon S3) bucket:
#!/bin/bash
sudo sed -i 's/yarn rmadmin.*-addToClusterNodeLabels "CORE(exclusive=false)"/echo "NO LABELS"/g' /var/aws/emr/bigtop-deploy/puppet/modules/hadoop/manifests/init.pp
-
In the Bootstrap Actions section, add the script as a custom action, and then create the cluster.
-
To confirm that the change is applied, run the following command in the primary node:
yarn cluster --list-node-labels
The output shows an empty value for the node labels:
<<<<< Node Labels: >>>>>>
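If you prefer the AWS CLI to the console, the same configuration and bootstrap action can be passed to aws emr create-cluster. The following is a sketch under stated assumptions: the cluster name, release label, instance settings, bucket name, and script name are all placeholders, and the script only prints the command for review instead of running it.

```shell
#!/bin/bash
# Sketch: write the yarn-site classification to a local file.
cat > yarn-labels-off.json <<'EOF'
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.node-labels.enabled": "false",
      "yarn.node-labels.am.default-node-label-expression": ""
    }
  }
]
EOF

# Dry run: print a create-cluster command that uses the file above.
# Every value here (name, release, instances, bucket, script) is a placeholder.
print_create_cluster_cmd() {
  cat <<'EOF'
aws emr create-cluster \
  --name labels-off-test \
  --release-label emr-5.36.0 \
  --applications Name=Hadoop \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --configurations file://yarn-labels-off.json \
  --bootstrap-actions Path=s3://amzn-s3-demo-bucket/disable-node-labels.sh
EOF
}

print_create_cluster_cmd
```

Review the printed command, replace the placeholders, and then run it from the directory that contains yarn-labels-off.json.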
Turn off the YARN label feature in an existing Amazon EMR cluster
Complete the following steps:
-
Use SSH to connect to the Amazon EMR primary node.
-
Create a backup of your yarn-site.xml file. The path is /etc/hadoop/conf/yarn-site.xml.
-
Run the following command to open the yarn-site.xml file in a text editor with root permissions:
sudo vi /etc/hadoop/conf/yarn-site.xml
-
Change the yarn.node-labels.enabled property value to false:
<property>
<name>yarn.node-labels.enabled</name>
<value>false</value>
</property>
-
Remove the CORE value in the yarn.node-labels.am.default-node-label-expression property:
<property>
<name>yarn.node-labels.am.default-node-label-expression</name>
<value></value>
</property>
-
If your cluster version is later than 5.29.0, then run the following commands to restart ResourceManager:
sudo systemctl restart hadoop-yarn-resourcemanager.service
sudo systemctl status hadoop-yarn-resourcemanager.service
-or-
If your cluster version is 5.29.0 or earlier, then run the following commands to restart ResourceManager:
sudo stop hadoop-yarn-resourcemanager
sudo start hadoop-yarn-resourcemanager
-
To confirm that the change is applied, run the following command:
yarn cluster --list-node-labels
The output shows an empty value for the node labels:
<<<<< Node Labels: >>>>>>
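The manual edit and the version-dependent restart in the preceding steps can also be scripted. The following is a minimal sketch, not an official procedure. The function names are hypothetical, and the sketch assumes GNU sed, GNU sort, and a yarn-site.xml layout where each <value> element is on the line directly after its <name> element. Try it on a copy of the file first.

```shell
#!/bin/bash
# Hypothetical helper: back up a yarn-site.xml file, then turn off node labels.
# Assumes GNU sed and that each <value> is on the line right after its <name>.
disable_node_labels() {
  local file="$1"
  cp -p "$file" "${file}.bak" || return 1
  sed -i \
    -e '/<name>yarn\.node-labels\.enabled<\/name>/{n;s|<value>.*</value>|<value>false</value>|;}' \
    -e '/<name>yarn\.node-labels\.am\.default-node-label-expression<\/name>/{n;s|<value>.*</value>|<value></value>|;}' \
    "$file"
}

# Succeeds when the release (for example 5.30.0) is later than 5.29.0,
# which is when the steps above use systemctl.
release_uses_systemctl() {
  local release="$1" newest
  [ "$release" = "5.29.0" ] && return 1
  newest=$(printf '%s\n' "5.29.0" "$release" | sort -V | tail -n 1)
  [ "$newest" = "$release" ]
}

# Restart ResourceManager with the commands that match the release version.
restart_resourcemanager() {
  if release_uses_systemctl "$1"; then
    sudo systemctl restart hadoop-yarn-resourcemanager.service
  else
    sudo stop hadoop-yarn-resourcemanager
    sudo start hadoop-yarn-resourcemanager
  fi
}

# Example (run as root on the primary node; not executed here):
#   disable_node_labels /etc/hadoop/conf/yarn-site.xml
#   restart_resourcemanager 5.36.0
```

The backup is written next to the original file with a .bak suffix, so you can restore the previous settings by copying it back and restarting ResourceManager.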
Turn on the YARN labels feature in Amazon EMR version 6.x and later
Turn on the YARN label feature when you create an EMR cluster
-
In the Edit software settings section, under Enter configuration, add the following properties:
[
{
"Classification": "yarn-site",
"Properties": {
"yarn.node-labels.enabled": "true",
"yarn.node-labels.am.default-node-label-expression": "CORE"
}
}
]
-
Create the cluster.
-
To confirm that the change is applied, run the following command in the primary node:
yarn cluster --list-node-labels
The output shows the CORE node label:
<<<<< Node Labels: <CORE:exclusivity=false> >>>>>
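To automate this check, for example in a post-launch validation script, you could test the command output for the CORE label. The following is a small sketch. The helper name has_core_label is hypothetical, and the pattern assumes the output format shown above.

```shell
#!/bin/bash
# Hypothetical helper: read `yarn cluster --list-node-labels` output on stdin
# and succeed only if the non-exclusive CORE label is listed.
has_core_label() {
  grep -q 'Node Labels:.*<CORE:exclusivity=false>'
}

# Usage on the primary node (skipped where the yarn CLI is not installed):
if command -v yarn >/dev/null 2>&1; then
  if yarn cluster --list-node-labels | has_core_label; then
    echo "CORE label present"
  else
    echo "CORE label missing"
  fi
fi
```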
Turn on the YARN label feature in an existing EMR cluster
Complete the following steps:
-
Open the Amazon EMR console.
-
In the navigation pane, choose Clusters, and then select the cluster that you want to edit.
-
Choose the Configurations tab.
-
Under Instance group configurations, choose an instance group.
-
Choose the Reconfigure icon, select Edit in JSON, and then add the following properties:
[
{
"Classification": "yarn-site",
"Properties": {
"yarn.node-labels.enabled": "true",
"yarn.node-labels.am.default-node-label-expression": "CORE"
}
}
]
-
Choose Apply this configuration to all active instance groups, and then save the changes.
-
Run the following command as the hadoop user in the primary node:
yarn rmadmin -addToClusterNodeLabels "CORE(exclusive=false)"
-
To confirm that the change is applied, run the following command in the primary node:
yarn cluster --list-node-labels
The output shows the CORE node label:
<<<<< Node Labels: <CORE:exclusivity=false> >>>>>>
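The console reconfiguration can also be done with the AWS CLI command aws emr modify-instance-groups, which takes one JSON entry per instance group. The following is a sketch under stated assumptions: the helper name write_reconfig_json is hypothetical, the cluster ID and instance group ID are placeholders, and only a single instance group is covered. The script prints the command for review instead of running it.

```shell
#!/bin/bash
# Hypothetical helper: write the reconfiguration JSON for one instance group.
write_reconfig_json() {
  local group_id="$1" out_file="$2"
  cat > "$out_file" <<EOF
[
  {
    "InstanceGroupId": "${group_id}",
    "Configurations": [
      {
        "Classification": "yarn-site",
        "Properties": {
          "yarn.node-labels.enabled": "true",
          "yarn.node-labels.am.default-node-label-expression": "CORE"
        }
      }
    ]
  }
]
EOF
}

# Placeholder IDs; replace them with the values for your cluster.
write_reconfig_json "ig-EXAMPLE1234567" instance-groups.json

# Dry run: print the command for review instead of running it.
echo aws emr modify-instance-groups --cluster-id j-EXAMPLE1234567 \
  --instance-groups file://instance-groups.json
```

To match the console option that applies the configuration to all active instance groups, add one entry per instance group to the JSON array. You still need to run the yarn rmadmin -addToClusterNodeLabels command from the steps above after the reconfiguration completes.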