How do I resolve the "Caused by: java.lang.ClassNotFoundException: com.google.protobuf.GeneratedMessageV3" error in Spark on Amazon EMR?


I am using the EMR image 711395599931.dkr.ecr.us-east-2.amazonaws.com/spark/emr-6.14.0:latest from SparkSubmitOperator and passing it a jar that executes a user-defined function (UDF) in Spark. I am getting the error below while trying to execute the UDF:

    Caused by: java.lang.ClassNotFoundException: com.google.protobuf.GeneratedMessageV3
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)

Earlier I was using Spark 2.4.5 and Scala 2.11.12, and the job executed without any error. The error started appearing after I upgraded to Spark 3.4.1 and Scala 2.12.15.

Is this a known issue, and is a resolution available from AWS?

asked 2 months ago · 297 views
2 Answers

The error you're encountering, "ClassNotFoundException: com.google.protobuf.GeneratedMessageV3", is likely due to a missing or incompatible protobuf dependency after upgrading your Spark and Scala versions.

To resolve this issue, you need to ensure that the correct version of the protobuf library is available in your Spark application's classpath. Here are some steps you can take:

  1. Check the protobuf version required by Spark 3.4.1. You may need to add the appropriate protobuf dependency to your project.

  2. If you're using a build tool like Maven or SBT, add the following dependency (adjust the version if needed; a build.sbt sketch follows this list):

     com.google.protobuf:protobuf-java:3.x.x

  3. If you're submitting the job using spark-submit, you can include the protobuf jar in the --jars option:

     --jars /path/to/protobuf-java-3.x.x.jar

  4. Ensure that your UDF and any dependencies it relies on are compatible with Spark 3.4.1 and Scala 2.12.15.

  5. If you're using EMR 6.14.0, check whether the protobuf library is already included in the EMR classpath. If it is, make sure your application is not introducing a conflicting version.

  6. You may need to update your SparkSubmitOperator configuration to include the necessary jars or adjust the classpath.
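
For the build-tool route, here is a minimal build.sbt sketch, assuming the sbt-assembly plugin (1.x syntax) is available; the Spark and Scala versions are taken from the question, while the protobuf version is an assumed 3.x release to adjust for your build. The shading rule renames the protobuf packages inside your fat jar so they cannot clash with whatever protobuf version EMR puts on the cluster classpath:

    // build.sbt — minimal sketch; versions are assumptions, adjust to your build
    ThisBuild / scalaVersion := "2.12.15"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-sql" % "3.4.1" % "provided",
      // A protobuf 3.x release that contains GeneratedMessageV3
      "com.google.protobuf" % "protobuf-java" % "3.21.12"
    )

    // Requires the sbt-assembly plugin. Rename protobuf packages in the
    // fat jar to avoid clashing with the protobuf that Hadoop ships on
    // the EMR classpath.
    assembly / assemblyShadeRules := Seq(
      ShadeRule.rename("com.google.protobuf.**" -> "shaded.com.google.protobuf.@1").inAll
    )

If you are unsure which protobuf version your dependency tree actually resolves to, running sbt dependencyTree (built into sbt 1.4 and later) will print it.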

If the issue persists, you might want to review your entire dependency tree to ensure all libraries are compatible with your new Spark and Scala versions. Sometimes, upgrading major versions can lead to compatibility issues with other dependencies.

Remember to test your changes in a development environment before deploying to production. If you continue to face issues, you may need to consult the specific EMR and Spark documentation for version 6.14.0 and 3.4.1 respectively, as there might be EMR-specific configurations or known issues related to this upgrade.
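
If shading the jar is not an option, Spark's user-classpath-first settings ask the driver and executors to prefer classes from your application jar over the cluster's versions. A sketch of passing them at submit time (the configuration keys are standard Spark settings; the jar paths are placeholders):

    spark-submit \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      --jars /path/to/protobuf-java-3.x.x.jar \
      your-application.jar

Note that both settings are marked experimental in the Spark documentation and can surface other class conflicts, so test them carefully.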
Sources
Amazon EMR release 6.14.0 - Amazon EMR
Troubleshooting Amazon EMR on EKS Spark operator - Amazon EMR

answered 2 months ago
  • Regarding the protobuf version: I am using shading in my build.sbt, so the correct version is picked up at build time. I think the issue is caused by including the Hadoop library in the emr_on_eks config, which has its own protobuf version and is overriding the protobuf version in my jar. Is there any way to exclude the protobuf library from SparkSubmitOperator?


From the shared error message: ... Caused by: java.lang.ClassNotFoundException: com.google.protobuf.GeneratedMessageV3 ...

Understanding the Issue

Based on the error message and the Dinky official documentation [1], the root cause seems to be a missing or incompatible protobuf dependency.

Dinky requires the com.google.protobuf.GeneratedMessageV3 class, which is part of the Protobuf library. However, your EMR cluster is using Protobuf version 2.5.0, which does not include this class.

Proposed Solutions

Upgrade EMR Cluster: If Dinky indeed requires Protobuf 3.x, you can upgrade your EMR cluster to version 6.9.0 or later. These versions use Hadoop 3.3.3, which includes Protobuf 3.7.1 [2].
Replace JAR: Manually replace the protobuf-java-2.5.0.jar with a newer version that includes GeneratedMessageV3. You'll need to do this on all nodes in your cluster. Note: This is not an officially supported solution and may require testing.
Example: Replace /usr/lib/hadoop/lib/protobuf-java-2.5.0.jar with a newer version like protobuf-java-3.24.4.jar.
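
To confirm which protobuf jar your cluster actually ships, a quick check on a cluster node can help (the path shown is the usual EMR location, but verify it on your release):

    ls /usr/lib/hadoop/lib/ | grep -i protobuf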

Important Considerations

Dinky is a Third-Party Library: AWS Support cannot provide direct support for third-party software like Dinky. We can only offer best-effort assistance.
Testing: Always test any changes in a non-production environment before deploying them to production.
Dinky Community: If you continue to face issues, consider reaching out to the Dinky development team for more specific guidance.

Additional Notes

Bootstrap Actions: You can use bootstrap actions to automate the process of replacing JAR files on all nodes in your cluster (see the sketch after these notes).
Compatibility: Ensure that the new Protobuf version is compatible with other components in your environment.
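
As an illustration of the bootstrap-action route, a hypothetical script follows; every path, version, and URL in it is an assumption to verify, and since bootstrap actions run before EMR installs applications, the jar swap is deferred in a background process until the Hadoop lib directory appears:

    #!/bin/bash
    # replace-protobuf.sh — hypothetical sketch, not officially supported; test first
    set -euo pipefail
    NEW_VERSION=3.24.4                 # assumed target version
    LIB_DIR=/usr/lib/hadoop/lib        # usual EMR location; verify on your release
    JAR_URL=https://repo1.maven.org/maven2/com/google/protobuf/protobuf-java/${NEW_VERSION}/protobuf-java-${NEW_VERSION}.jar

    # Wait for Hadoop to be installed on this node, then swap the jar
    sudo nohup bash -c "
      while [ ! -e ${LIB_DIR}/protobuf-java-2.5.0.jar ]; do sleep 15; done
      curl -fSL -o ${LIB_DIR}/protobuf-java-${NEW_VERSION}.jar ${JAR_URL}
      rm -f ${LIB_DIR}/protobuf-java-2.5.0.jar
    " >/tmp/replace-protobuf.log 2>&1 &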

I hope this information helps! If you have any further questions, feel free to contact AWS Support.

[1] Dinky official documentation - https://github.com/DataLinkDC/dinky/blob/dev/pom.xml
[2] Hadoop 3.3.3 - https://hadoop.apache.org/docs/r3.3.3/
[3] Bootstrap action basics - https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-bootstrap.html#bootstrapUses
[4] What types of questions are supported? - https://aws.amazon.com/premiumsupport/faqs/?nc1=h_ls

AWS
answered 2 months ago
