How do I resolve node label and YARN ResourceManager failures in Amazon EMR?

When I turn on node labels for an Amazon EMR cluster, YARN ResourceManager fails.

Short description

In Amazon EMR versions 5.19.0 to 5.21.0, YARN ResourceManager might fail when you turn on node labels for an Amazon EMR cluster. In these versions, Amazon EMR stores the node label files in Hadoop Distributed File System (HDFS). The node label store uses the following default directory name and file names:

  • DEFAULT_DIR_NAME = "node-labels"
  • MIRROR_FILENAME = "nodelabel.mirror"
  • EDITLOG_FILENAME = "nodelabel.editlog"

The yarn.node-labels.fs-store.root-dir property in yarn-site.xml on all nodes sets the storage location. Amazon EMR sets this property to '/apps/yarn/nodelabels'.
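To confirm the location on a node, you can read the property from yarn-site.xml. The following is a minimal sketch. The helper name node_label_store_dir and the default path /etc/hadoop/conf/yarn-site.xml are assumptions for illustration, so adjust them for your cluster.

```shell
# Print the value of yarn.node-labels.fs-store.root-dir from a yarn-site.xml file.
# Assumption: the file defaults to /etc/hadoop/conf/yarn-site.xml; pass another path as $1.
node_label_store_dir() {
  conf_file="${1:-/etc/hadoop/conf/yarn-site.xml}"
  # Take the <name> line plus the two lines after it, then extract the first <value>.
  grep -F -A 2 '<name>yarn.node-labels.fs-store.root-dir</name>' "$conf_file" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' \
    | head -n 1
}

# Usage (on a cluster node):
#   node_label_store_dir
```

If the function prints nothing, then the property isn't set in that file.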

ResourceManager fails when a resize operation removes all the nodes that hold the files' blocks and the files become corrupted. When ResourceManager then restarts, CommonNodeLabelsManager throws an exception and ResourceManager gets stuck in a restart loop.

To find the exception, search for org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager in /var/log/hadoop-yarn/yarn-yarn-resourcemanager-*.log.
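The search can be sketched as a small shell helper. The function name find_node_label_errors is hypothetical. The log directory and file name pattern are the ones named in the preceding paragraph.

```shell
# List the ResourceManager log files that mention CommonNodeLabelsManager.
# Assumption: log_dir defaults to /var/log/hadoop-yarn; pass another directory as $1.
find_node_label_errors() {
  log_dir="${1:-/var/log/hadoop-yarn}"
  # -l prints only the names of matching files; -F treats the class name as a literal string.
  grep -l -F 'org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager' \
    "$log_dir"/yarn-yarn-resourcemanager-*.log 2>/dev/null
}

# Usage (on the primary node):
#   find_node_label_errors
```

To see the matching lines with line numbers, run grep -n -F with the same class name against a file that the function reports.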

To resolve this error, delete the node label files. Then, restart ResourceManager to recreate the files.

Resolution

Complete the following steps:

  1. Check file system health and locate the blocks:

    hdfs fsck /apps/yarn/nodelabels/ -locations -blocks -files
  2. Remove the node label files:

    hdfs dfs -rm -skipTrash /apps/yarn/nodelabels/*
  3. Restart ResourceManager:

    sudo stop hadoop-yarn-resourcemanager; sudo start hadoop-yarn-resourcemanager

    When ResourceManager restarts, it recreates the required node label files. This resolves the restart loop.
    To confirm that the files exist, list the directory. Example output:

    hadoop fs -ls /apps/yarn/nodelabels/
    Found 2 items
    -rw-r--r--   1 yarn hadoop          0 2025-10-08 04:25 /apps/yarn/nodelabels/nodelabel.editlog
    -rw-r--r--   1 yarn hadoop          2 2025-10-08 04:25 /apps/yarn/nodelabels/nodelabel.mirror
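To check the listing in a script rather than by eye, you can use a helper like the following. This is a sketch only. The function name node_label_files_present is hypothetical, and the function assumes the listing format shown in the preceding example output.

```shell
# Read a `hadoop fs -ls` listing on stdin and succeed only if both
# node label store files are present.
node_label_files_present() {
  listing=$(cat)
  for name in nodelabel.mirror nodelabel.editlog; do
    # Match the file name at the end of a listing line.
    printf '%s\n' "$listing" | grep -q "/${name}\$" || return 1
  done
  return 0
}

# Usage (on the primary node):
#   hadoop fs -ls /apps/yarn/nodelabels/ | node_label_files_present && echo "node label files recreated"
```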

After ResourceManager restarts, the recreated node label files don't contain your labels, so you can't submit YARN applications yet. If you try to submit YARN applications, then you might receive the following error:

"25/10/08 04:30:39 INFO Client: Deleted staging directory hdfs://ip-172-31-249-244.ec2.internal:8020/user/hadoop/.sparkStaging/application_1759897521436_0001
Exception in thread "main" org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid label resource request, cluster do not contain , label= CORE
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkQueueLabelInLabelManager(SchedulerUtils.java:347)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:336)
    ....
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException): Invalid label resource request, cluster do not contain ...
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2489)"
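The missing label appears after "label=" in the exception message. As a rough sketch, you can pull that name out of captured client output with sed. The function name missing_label_from_error is hypothetical, and the pattern assumes the message wording shown in the preceding error.

```shell
# Read client output on stdin and print the label named in the first
# InvalidLabelResourceRequestException message, if any.
missing_label_from_error() {
  # Capture the run of label-name characters that follows "label= " on the exception line.
  sed -n 's/.*InvalidLabelResourceRequestException.*label= *\([A-Za-z0-9_-]*\).*/\1/p' \
    | head -n 1
}

# Usage: pipe the combined output of your submit command into the function, for example:
#   <your submit command> 2>&1 | missing_label_from_error
```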

Before you can submit YARN applications, you must manually add node label entries. To add the node label entries, run the following command:

yarn rmadmin -addToClusterNodeLabels "CORE(exclusive=false)"

To list all the labels and confirm that the cluster now contains the CORE label, run the following command:

yarn cluster --list-node-labels

Example output:

25/10/08 04:33:51 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-249-244.ec2.internal/172.31.249.244:8032
Node Labels: <CORE:exclusivity=false>
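If you want to add the label only when it's missing, you can combine the two commands. The following sketch assumes the "Node Labels:" output format shown in the preceding example. The function name has_node_label is hypothetical.

```shell
# Read `yarn cluster --list-node-labels` output on stdin and succeed if the
# given label name appears in the "Node Labels:" line.
has_node_label() {
  label="$1"
  grep '^Node Labels:' | grep -q "<${label}:"
}

# Usage (on the primary node):
#   yarn cluster --list-node-labels | has_node_label CORE \
#     || yarn rmadmin -addToClusterNodeLabels "CORE(exclusive=false)"
```

The check matches on the label name followed by a colon, so it doesn't depend on the exclusivity value.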

Related information

Understand node types in Amazon EMR: primary, core, and task nodes

AWS OFFICIAL
Updated a month ago