Resolving classpath issues on EMR

What are the general guidelines for resolving classpath issues on EMR? One recurring problem when running pipelines on EMR is classpath conflicts involving custom JARs:

Data processing pipelines frequently fail on EMR because they cannot resolve the specific versions of their dependent JARs, even though the customer uploaded the required JARs to S3 and pushed them to the EMR master node at cluster creation time. The customer tried setting the parameters below as part of the pipeline command:

-D mapreduce.task.classpath.user.precedence -D mapreduce.job.user.classpath.first

AWS
Asked 4 years ago · 896 views
1 Answer
Accepted Answer

This is a broad topic, and the answer usually depends on the framework you're using. Generally speaking, for applications where you submit a JAR, such as Spark or MapReduce, the recommended approach is to build a fat JAR with all the dependencies inside. This helps ensure that the JVM picks the correct libraries from the JAR instead of looking for them on the cluster, where it might not find them or might pick the wrong version.

If you're interested, this third-party article [ http://tutorials.jenkov.com/maven/maven-build-fat-jar.html ] has more details about fat JARs and how to create them.
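
When it is unclear whether the JVM is picking a class from your JAR or from a copy already on the cluster, it can help to ask the JVM where it loaded the class from. The following is a minimal sketch using only standard Java APIs; the class name `ClasspathProbe` and its `locate` helper are hypothetical names chosen for this illustration, not part of EMR or Hadoop.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports which JAR or directory a class came from.
public class ClasspathProbe {

    /** Returns the location the given class was loaded from. */
    public static String locate(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null) {
            // Core JDK classes loaded by the bootstrap loader have no code source.
            return "(bootstrap or unknown)";
        }
        URL location = src.getLocation();
        return location == null ? "(bootstrap or unknown)" : location.toString();
    }

    /** Looks the class up by name without initializing it, then reports its location. */
    public static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return locate(cls);
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not found on classpath)";
        }
    }

    public static void main(String[] args) {
        // Pass fully qualified class names as arguments.
        for (String name : args) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Calling `ClasspathProbe.locate(...)` from inside the job, with the name of a class from the conflicting dependency, prints the JAR path the JVM actually used, which shows whether the fat JAR or a cluster-provided copy won.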

AWS
Answered 4 years ago
