Spark application takes longer than expected in EMR 7

0

I have a Spark application running on EMR 7 that took 15+ hours; the same application took 9 hours on EMR 6.14. There were no changes to the code or the data volume. One observation is that the application was attempted three times on EMR 7, which would explain the overall delay.

The question is why it was reattempted only on EMR 7 and not on EMR 6.14. I would really appreciate it if someone could help me fix this issue. Is there anything I need to handle specifically in EMR 7?

Vaas
asked 13 days ago, 159 views
3 Answers
4
Accepted Answer

To force the AM to launch only on core nodes (On-Demand instances), you can enable YARN node labels with the following parameters:

[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.node-labels.enabled": "true",
      "yarn.node-labels.am.default-node-label-expression": "CORE"
    }
  }
]

You can add the above properties when provisioning the cluster, or reconfigure an existing cluster. Once they are set, verify them by connecting to the primary instance and running the command below:

yarn cluster --list-node-labels

The output shows the configured node labels, so you can confirm that the CORE label is in place and that the AM will be scheduled only on core instances. More details are specified in this document for your reference.
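If you provision clusters from a script, the same classification can be built programmatically. The following is a minimal sketch, not an official recipe: the function name `am_on_core_configuration` is made up for illustration, and the resulting list is assumed to be passed as the `Configurations` argument of boto3's `run_job_flow`, or saved to a file for `aws emr create-cluster --configurations file://...`.

```python
import json


def am_on_core_configuration(label="CORE"):
    """Build the yarn-site classification shown above.

    The helper name is hypothetical; the property keys and the
    CORE label are taken from the configuration in this answer.
    """
    return [
        {
            "Classification": "yarn-site",
            "Properties": {
                # EMR expects property values as strings, not booleans.
                "yarn.node-labels.enabled": "true",
                "yarn.node-labels.am.default-node-label-expression": label,
            },
        }
    ]


if __name__ == "__main__":
    # Print JSON that can be saved to a file or pasted into the console.
    print(json.dumps(am_on_core_configuration(), indent=2))
```

Keeping the values as strings matters because the configuration API treats every property value as text.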

AWS
SUPPORT ENGINEER
answered 13 days ago
EXPERT
reviewed 13 days ago
2

Hello,

The application master (AM) may have been launched on a core or task node that became unhealthy or was decommissioned during execution, or there may have been a resource bottleneck on the instance where the AM was launched. Either could have caused the application to be attempted three times on EMR 7.0.

Please check where the application master container was launched by checking the driver log. If you submitted the application via an EMR step, check the step's stderr log and find the "Application master host name". Then check the resource utilization of that node and investigate whether it was terminated for any reason.

This can also happen if you chose Spot instances for the task nodes and the AM was launched on a Spot instance that was later terminated by a Spot capacity interruption. In EMR 6.x and 7.x, the AM can be launched on either core or task nodes. If it was hosted on a task or core node of Spot type, the attempt would have ended when the Spot instance was terminated.
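As an alternative to reading the logs, the AM host for each attempt can also be read from the YARN ResourceManager REST API. This is a rough sketch under assumptions: it presumes the ResourceManager web service is reachable from where you run it (commonly port 8088 on the primary node), the function names are hypothetical, and the host names in the usage comment are placeholders.

```python
import json
from urllib.request import urlopen


def am_hosts(attempts_payload):
    """Return (attempt id, AM node id) pairs from an appattempts response.

    Expects the JSON shape returned by
    /ws/v1/cluster/apps/{appid}/appattempts.
    """
    attempts = (attempts_payload.get("appAttempts") or {}).get("appAttempt") or []
    return [(a.get("id"), a.get("nodeId")) for a in attempts]


def fetch_am_hosts(rm_address, app_id):
    """Query the ResourceManager and list where each AM attempt ran.

    rm_address is "host:port" of the ResourceManager web service
    (assumed reachable; adjust for your cluster).
    """
    url = f"http://{rm_address}/ws/v1/cluster/apps/{app_id}/appattempts"
    with urlopen(url, timeout=10) as resp:
        return am_hosts(json.load(resp))


# Example call (placeholder values, not from this thread):
# fetch_am_hosts("primary-node-dns:8088", "application_1700000000000_0001")
```

Comparing the returned node ids against the core and task instance lists shows whether each attempt's AM landed on a Spot task node.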

AWS
SUPPORT ENGINEER
answered 13 days ago
EXPERT
reviewed 10 days ago
EXPERT
reviewed 13 days ago
1

Thanks Yokesh. I see that the application master was hosted on one of the task nodes, which is a Spot instance, and that led to termination twice. In EMR 6.14 the AM was launched on a core node, but not in EMR 7.

How can I make sure the application master runs on an On-Demand node? In my cluster, the core nodes are On-Demand and the task nodes are Spot.

Vaas
answered 13 days ago
