Spark application takes longer than expected in EMR 7


I have a Spark application running on EMR 7 that took 15+ hours, whereas the same application took 9 hours on EMR 6.14. There has been no code change and no change in data volume. One observation is that the application was attempted three times on EMR 7, which would explain the overall delay on EMR 7.

The question is why it was reattempted only on EMR 7 and not on EMR 6.14. I would really appreciate it if someone could help me fix this issue. Is there anything I need to handle specifically in EMR 7?

Vaas
posted a month ago · 262 views
3 Answers
Accepted answer

To make the Application Master (AM) launch only on core nodes (On-Demand instances), you can enable YARN node labels with the parameters below:

[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.node-labels.enabled": "true",
      "yarn.node-labels.am.default-node-label-expression": "CORE"
    }
  }
]
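One way to supply this classification when creating a cluster is to save it to a file and pass it to `aws emr create-cluster --configurations`. This is a minimal sketch assuming the AWS CLI is installed and configured; the wrapper function `create_cluster_am_on_core`, the cluster name, the release label, and the instance types and counts are hypothetical placeholders, not values from this thread.

```shell
#!/usr/bin/env bash

# Write the yarn-site classification from the answer to a local file.
cat > node-labels.json <<'EOF'
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.node-labels.enabled": "true",
      "yarn.node-labels.am.default-node-label-expression": "CORE"
    }
  }
]
EOF

# Hypothetical wrapper: name, release label, instance types and counts are placeholders.
# Add your Spot task instance group and any other options you already use.
create_cluster_am_on_core() {
  aws emr create-cluster \
    --name "spark-am-on-core" \
    --release-label emr-7.0.0 \
    --applications Name=Spark \
    --use-default-roles \
    --instance-groups \
      InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m5.xlarge \
      InstanceGroupType=CORE,InstanceCount=2,InstanceType=m5.xlarge \
    --configurations file://node-labels.json
}
```

Calling `create_cluster_am_on_core` launches the cluster; keeping the classification in its own file makes it easy to reuse the same JSON in the console or in other tooling.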

You can add the above properties when provisioning the cluster, or reconfigure an existing cluster. Once they are set, verify them by connecting to the primary instance and running the command below:

yarn cluster --list-node-labels

The output shows the updated node label, which confirms that the AM will be placed only on core instances. More details are available in this document for your reference.
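`--list-node-labels` lists the labels but does not by itself show which label AMs default to. A minimal sketch for also checking the two properties in `yarn-site.xml` on the primary node follows; it assumes the usual EMR Hadoop config location `/etc/hadoop/conf/yarn-site.xml` (override with the hypothetical `YARN_SITE` variable), and the helper names `site_property` and `check_am_on_core` are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed location of the YARN config on the primary node; override if different.
YARN_SITE="${YARN_SITE:-/etc/hadoop/conf/yarn-site.xml}"

# Print the <value> that follows a given <name> in a Hadoop *-site.xml file.
site_property() {
  local name="$1" file="$2"
  grep -A2 -F "<name>${name}</name>" "$file" |
    sed -n 's:.*<value>\(.*\)</value>.*:\1:p' |
    head -n 1
}

# Report both settings and succeed only if labels are on and AMs default to CORE.
check_am_on_core() {
  local enabled label
  enabled="$(site_property yarn.node-labels.enabled "$YARN_SITE")"
  label="$(site_property yarn.node-labels.am.default-node-label-expression "$YARN_SITE")"
  echo "yarn.node-labels.enabled=${enabled:-<unset>}"
  echo "yarn.node-labels.am.default-node-label-expression=${label:-<unset>}"
  [ "$enabled" = "true" ] && [ "$label" = "CORE" ]
}
```

For example, `check_am_on_core && echo "AMs default to CORE"` prints both settings and reports success only when both match the classification above.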

AWS
SUPPORT ENGINEER
answered a month ago
EXPERT
reviewed a month ago

Hello,

Possibly the application master (AM) was launched on a core or task node that became unhealthy or was decommissioned during execution, or a resource bottleneck occurred on the instance where the AM was launched. Either could explain why the application was attempted three times on EMR 7.0.

Check where the application master container was launched by looking at the driver log. If you submitted the application as an EMR step, check the step's stderr log and find the "Application master host name". Then check the resource utilization of that node and investigate whether it was terminated for any reason.
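When the application was retried, the stderr log can contain several AM hosts, one per attempt. A minimal sketch for listing them in order of first appearance follows; it assumes Spark's YARN client report format, where the host appears on lines such as `ApplicationMaster host: <host>` (with `N/A` before the AM starts), and the function name `am_hosts_from_log` is hypothetical.

```shell
#!/usr/bin/env bash
# List distinct ApplicationMaster hosts from a Spark/EMR step stderr log,
# in the order they first appear. Accepts plain or gzip-compressed logs.
am_hosts_from_log() {
  local log_file="$1"
  case "$log_file" in
    *.gz) gzip -dc -- "$log_file" ;;
    *)    cat -- "$log_file" ;;
  esac |
    sed -n 's/^[[:space:]]*ApplicationMaster host:[[:space:]]*//p' |  # keep only the host value
    grep -v '^N/A$' |                                                 # drop reports before the AM started
    awk '!seen[$0]++'                                                 # de-duplicate, preserving order
}
```

For example, `am_hosts_from_log stderr.gz` prints one hostname per line; more than one line suggests the AM moved between attempts.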

This can also happen if you chose Spot instances for the task nodes and the AM was launched on a Spot instance, which may have been terminated by a Spot capacity interruption. In EMR 6.x and 7.x, the AM can be launched on either core or task nodes, so if it was hosted on a core or task node running as Spot, it could have been lost to a Spot termination.
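To confirm whether an AM host was a Spot instance, one option is to look it up with `aws emr list-instances`, whose output includes each instance's `Market` (`ON_DEMAND` or `SPOT`). This is a sketch assuming the AWS CLI is configured and that the host from the log matches the instance's `PrivateDnsName` or `PrivateIpAddress`; the function name `am_host_market` is hypothetical.

```shell
#!/usr/bin/env bash
# Print instance ID, instance group, market (ON_DEMAND/SPOT) and state
# for the cluster instance whose private DNS name or IP matches the AM host.
am_host_market() {
  local cluster_id="$1" am_host="$2"
  aws emr list-instances --cluster-id "$cluster_id" \
    --query "Instances[?PrivateDnsName=='${am_host}' || PrivateIpAddress=='${am_host}'].[Ec2InstanceId,InstanceGroupId,Market,Status.State]" \
    --output text
}
```

For example, `am_host_market j-XXXXXXXXXXXXX ip-10-0-1-23.ec2.internal` with your own cluster ID and AM host; `SPOT` in the third column means the AM ran on a Spot instance.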

AWS
SUPPORT ENGINEER
answered a month ago
EXPERT
reviewed a month ago

Thanks, Yokesh. I see that the application master was hosted on one of the task nodes, which is a Spot instance, and that led to termination twice. On EMR 6.14 the AM was launched on a core node, but not on EMR 7.

How can I make sure the application master runs on an On-Demand node? In my cluster, the core nodes are On-Demand and the task nodes are Spot.

Vaas
answered a month ago
