Resolving classpath issues on EMR


What are the general guidelines for resolving classpath issues on EMR? One common problem when running pipelines on EMR is classpath conflicts involving custom JARs:

Data processing pipelines frequently fail on EMR because they cannot reference the specific versions of their dependent JARs, even though the customer uploaded the required JARs to S3 and pushed them to the EMR master node at cluster creation time. The customer tried setting the following parameters as part of the pipeline command:

-D mapreduce.task.classpath.user.precedence -D mapreduce.job.user.classpath.first

AWS
Asked 4 years ago · 897 views
1 Answer
Accepted Answer

This is a wide topic, and the answer usually depends on the framework you're using. Generally speaking, for applications where you submit a JAR, such as Spark or MapReduce, the recommended approach is to build a fat JAR with all the dependencies inside. That way the JVM loads the libraries bundled in the JAR instead of looking for them on the cluster, where it might fail to find them or pick up the wrong version.

If you're interested, this third-party article [ http://tutorials.jenkov.com/maven/maven-build-fat-jar.html ] has more details about fat JARs and how to create them.
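
When it's unclear whether the cluster's copy of a library or the bundled one is winning, it can help to print where a class was actually loaded from. The following is only a minimal sketch: the class name `WhichJar` and the `locate` helper are made up for illustration, and it relies only on the standard `Class.getProtectionDomain()` and `CodeSource.getLocation()` calls.

```java
import java.security.CodeSource;

public class WhichJar {
    /** Returns the JAR or directory a class was loaded from, as a string. */
    public static String locate(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the JDK's bootstrap loader have no code source.
        return src == null ? "(JDK runtime)" : String.valueOf(src.getLocation());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names as arguments; defaults to this class.
        String[] names = args.length > 0 ? args : new String[] {WhichJar.class.getName()};
        for (String name : names) {
            System.out.println(name + " -> " + locate(Class.forName(name)));
        }
    }
}
```

Calling `WhichJar.locate(...)` from inside the job itself, on a class from the conflicting dependency, shows which JAR the running task resolved it from, which should be the fat JAR if the packaging worked as intended.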

AWS
Answered 4 years ago
