Spark application takes longer than expected in EMR 7

0

I have a Spark application running on EMR 7 that took 15+ hours; the same application took 9 hours on EMR 6.14. There were no changes to the code or the data volume. One observation is that the application was attempted three times on EMR 7, which would explain the overall delay.

The question is why it was reattempted only on EMR 7 and not on EMR 6.14. I would really appreciate it if someone could help me fix this issue. Is there anything I need to handle specifically in EMR 7?

Vaas
asked 2 months ago, 282 views
3 Answers
4
Accepted Answer

To force the AM to launch only on core nodes (On-Demand instances), you can enable YARN node labels with the parameters below:

[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.node-labels.enabled": "true",
      "yarn.node-labels.am.default-node-label-expression": "CORE"
    }
  }
]

You can add the above properties when provisioning the cluster, or reconfigure an existing cluster. Once they are set, verify them by connecting to the primary instance and running the command below:

yarn cluster --list-node-labels

The output lists the updated node labels, which confirms that the AM will be placed only on core instances. More details are specified in this document for your reference.
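If you prefer to keep this classification in a file rather than pasting it into the console, a minimal Python sketch such as the following could generate it. The function names and the output file name are hypothetical; only the `yarn-site` classification and the two property values come from the answer above.

```python
import json


def build_am_on_core_configuration():
    """Return the yarn-site classification from the answer above as Python data."""
    return [
        {
            "Classification": "yarn-site",
            "Properties": {
                # Turn on YARN node labels.
                "yarn.node-labels.enabled": "true",
                # Place application masters on nodes labelled CORE by default.
                "yarn.node-labels.am.default-node-label-expression": "CORE",
            },
        }
    ]


def write_configuration(path):
    """Write the classification to `path` as JSON and return the data written."""
    config = build_am_on_core_configuration()
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
    return config


if __name__ == "__main__":
    # Print the JSON so it can be inspected or redirected to a file.
    print(json.dumps(build_am_on_core_configuration(), indent=2))
```

A file written this way (for example `yarn-node-labels.json`, a made-up name) can then be supplied at cluster creation with the AWS CLI option `--configurations file://yarn-node-labels.json`.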

AWS
Support Engineer
answered 2 months ago
Expert
reviewed 2 months ago
2

Hello,

The application master may have been launched on a core or task node that became unhealthy or was decommissioned during execution, or there may have been a resource bottleneck on the instance where the AM was launched. Either could have caused the application to be reattempted three times on EMR 7.0.

Please check where the application master container was launched by checking the driver log. If you submitted the application via an EMR step, check the stderr log and get the "Application master host name". Check the resource utilization of that node and investigate whether it was terminated for any reason.
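As an alternative to reading the logs by hand, the node that hosted each application master attempt can also be read from the YARN ResourceManager REST API (`/ws/v1/cluster/apps/{appid}/appattempts`). This is not part of the answer above, only a rough sketch: it assumes the ResourceManager web port is reachable from where you run it (8088 is the usual default), and the host names and IDs in the sample payload are made up for illustration.

```python
import json
import urllib.request


def fetch_app_attempts(rm_host, app_id, port=8088):
    """Fetch the attempts of one application from the ResourceManager REST API.

    `rm_host` and `port` are assumptions about your cluster; adjust as needed.
    """
    url = f"http://{rm_host}:{port}/ws/v1/cluster/apps/{app_id}/appattempts"
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarize_attempts(payload):
    """Return one dict per attempt with the node and container that hosted the AM."""
    attempts = (payload.get("appAttempts") or {}).get("appAttempt") or []
    return [
        {
            "attempt": a.get("id"),
            "am_node": a.get("nodeId"),
            "am_container": a.get("containerId"),
        }
        for a in attempts
    ]


# Illustrative payload only: host names and IDs are invented for this sketch.
SAMPLE_PAYLOAD = {
    "appAttempts": {
        "appAttempt": [
            {
                "id": 1,
                "nodeId": "task-node.example.internal:8041",
                "containerId": "container_0000000000000_0001_01_000001",
            },
            {
                "id": 2,
                "nodeId": "core-node.example.internal:8041",
                "containerId": "container_0000000000000_0001_02_000001",
            },
        ]
    }
}

if __name__ == "__main__":
    for row in summarize_attempts(SAMPLE_PAYLOAD):
        print(row)
```

Comparing the `am_node` of each attempt with the instance list of the cluster shows whether a failed attempt was running on a task node or a core node.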

It could also be that you chose Spot Instances for the task nodes and the AM was launched on a Spot Instance that was then terminated by a Spot capacity interruption. In EMR 6.x and 7.x, the AM can be launched on either core or task nodes. If it was hosted on a task or core node of the Spot type, it would have ended with the Spot termination.

AWS
Support Engineer
answered 2 months ago
Expert
reviewed a month ago
Expert
A_J
reviewed 2 months ago
1

Thanks, Yokesh. I can see that the application master was hosted on one of the task nodes, which is a Spot Instance, and that led to termination twice. In EMR 6.14 the AM was launched on a core node, but not in EMR 7.

How can I make sure the application master runs on an On-Demand node? In my cluster, the core nodes are On-Demand and the task nodes are Spot.

Vaas
answered 2 months ago
