Spark application takes longer than expected in EMR 7

0

I have a Spark application running on EMR 7 that took 15+ hours; the same application took 9 hours on EMR 6.14. There were no changes to the code or the data volume. One observation is that the application was attempted three times on EMR 7, which would explain the overall delay.

The question is why it was reattempted only on EMR 7 and not on EMR 6.14. I would really appreciate it if someone could help me fix this issue. Is there anything I need to handle specifically in EMR 7?

Vaas
asked 1 month ago, 267 views
3 answers
4
Accepted answer

To force the AM to launch only on core nodes (On-Demand instances), you can enable YARN node labels with the parameters below:

[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.node-labels.enabled": "true",
      "yarn.node-labels.am.default-node-label-expression": "CORE"
    }
  }
]

You can add the above properties when provisioning the cluster, or reconfigure an existing cluster. Once they are set, please verify them by connecting to the primary instance and running the command below:

yarn cluster --list-node-labels

The output lists the updated node labels, which confirms that the AM will be placed only on core instances. More details are specified in this document for your reference.
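
For anyone who wants to keep this setting in a file rather than paste it by hand, here is a minimal sketch in Python that builds the same yarn-site classification and writes it out as JSON. The function names and the output file name are hypothetical and only for illustration; the property keys and values are the ones from the answer above. The resulting file could then be passed to `aws emr create-cluster` through `--configurations file://...`, or the list could be handed to the `Configurations` argument of boto3's `run_job_flow`, assuming that is how you provision the cluster.

```python
import json


def am_on_core_configuration(label="CORE"):
    """Return the EMR configuration list that pins the YARN AM to one node label.

    The property names come from the answer above; `label` defaults to the
    "CORE" label used there.
    """
    return [
        {
            "Classification": "yarn-site",
            "Properties": {
                "yarn.node-labels.enabled": "true",
                "yarn.node-labels.am.default-node-label-expression": label,
            },
        }
    ]


def write_configuration(path, configuration):
    """Serialize the configuration list to a JSON file at `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(configuration, handle, indent=2)


# Example usage (the file name is a placeholder):
# write_configuration("yarn-am-on-core.json", am_on_core_configuration())
```

Note that EMR expects property values as strings, which is why "true" is quoted rather than written as a boolean.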

AWS
Support Engineer
answered 1 month ago
Expert
reviewed 1 month ago
2

Hello,

The application master (AM) may have been launched on a core or task node that became unhealthy or was decommissioned during execution, or there may have been a resource bottleneck on the instance where the AM was running. Either could explain why the application was attempted three times on EMR 7.0.

Please check where the application master container was launched by looking at the driver log. If you submitted the application as an EMR step, check the step's stderr log for the "Application master host name". Then check the resource utilization of that node and investigate whether it was terminated for any reason.

This can also happen if you chose Spot Instances for the task nodes and the AM was launched on one of them, which may then have been terminated by a Spot capacity interruption. In EMR 6.x and 7.x, the AM can be launched on either core or task nodes. If it was hosted on a core or task node of Spot type, a Spot termination would have ended that attempt.
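
As an alternative to reading the logs by hand, the host that ran the AM for each attempt can also be read from the YARN ResourceManager REST API. The sketch below is only an illustration under some assumptions: it assumes the ResourceManager web service is reachable from where you run it (on EMR this is typically port 8088 on the primary node), and the function names are made up for this example. The endpoint path and the `appAttempts` / `appAttempt` / `nodeId` / `containerId` fields are those of the Hadoop "Cluster Application Attempts API".

```python
import json
from urllib.request import Request, urlopen


def am_hosts_per_attempt(payload):
    """Extract (attempt id, AM node id, AM container id) from an appattempts payload."""
    attempts = (payload.get("appAttempts") or {}).get("appAttempt") or []
    return [(a.get("id"), a.get("nodeId"), a.get("containerId")) for a in attempts]


def fetch_attempts(rm_address, app_id):
    """Fetch the attempts of one application from the ResourceManager.

    `rm_address` is "host:port" of the ResourceManager web service (assumption:
    plain HTTP, no authentication in front of it).
    """
    url = f"http://{rm_address}/ws/v1/cluster/apps/{app_id}/appattempts"
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=10) as response:
        return json.load(response)


# Example usage (host name and application id are placeholders):
# payload = fetch_attempts("primary-node-dns:8088", "application_1700000000000_0001")
# for attempt_id, node_id, container_id in am_hosts_per_attempt(payload):
#     print(attempt_id, node_id, container_id)
```

If each attempt shows a different `nodeId` and those nodes are Spot task instances that no longer exist, that would be consistent with the Spot interruption explanation above.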

AWS
Support Engineer
answered 1 month ago
Expert
reviewed 1 month ago
1

Thanks Yokesh. I see that the application master was hosted on one of the task nodes, which is a Spot Instance, and that led to termination twice. In EMR 6.14 the AM was launched on a core node, but not in EMR 7.

How can I make sure the application master runs on an On-Demand node? In my cluster, the core nodes are On-Demand and the task nodes are Spot.

Vaas
answered 1 month ago
