Error when save using file system s3a in EMR 6.6.0 with spark 3.2

4

We are trying to save data using s3a and we are getting this error, but when we save the data as s3 it works fine. Below you can find the stacktrace.

Closing SparkContext Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:813) at org.apache.hadoop.fs.FileSystem.primitiveCreate(FileSystem.java:1235) at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:100) at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:605) at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:696) at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:692) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.create(FileContext.java:698) at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createTempFile(CheckpointFileManager.scala:327) at org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.<init>(CheckpointFileManager.scala:140) at org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.<init>(CheckpointFileManager.scala:143) at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createAtomic(CheckpointFileManager.scala:333) at org.apache.spark.sql.execution.streaming.StreamMetadata$.write(StreamMetadata.scala:79) at org.apache.spark.sql.execution.streaming.StreamExecution.$anonfun$streamMetadata$1(StreamExecution.scala:141) at scala.Option.getOrElse(Option.scala:189)

已提問 2 年前檢視次數 1026 次
1 個回答
0

This is a known issue and was Fixed in EMR 6.7.0 and above.

Root cause :

In EMR-6.6.0, Spark was upgraded to 3.2.0 which includes SPARK-33212, a change that makes Spark depend on shaded Hadoop client jars (hadoop-client-api and hadoop-client-runtime) instead of the unshaded ones. The difference between the jars is that the third-party classes in the shaded jars were relocated to different packages. However, S3A classes were built assuming that the unshaded client jars are used (i.e. third-party classes aren’t relocated).

When Spark creates an S3AFileSystem, the S3AFileSystem.create() method attempts to invoke the constructor of a Hadoop utility class (SemaphoredDelegatingExecutor) while incorrectly assuming it accepts the unshaded version of a Guava class [1]. The actual parameter type of this constructor in Spark turns out to be the shaded version of this class [2]. This causes the JVM to throw the NoSuchMethodError, causing customer’s application to fail.

This issue was fixed in Hadoop 3.2.2 by HADOOP-16080, but Spark in EMR-6.6.0 uses Hadoop 3.2.1 without this fix backported.

[1] com.google.common.util.concurrent.ListeningExecutorService

[2] org.apache.hadoop.shaded.com.google.common.util.concurrent.ListeningExecutorService

If upgrading to EMR 6.7 is not an option, please reach out to AWS Support.

profile pictureAWS
已回答 1 年前
profile picture
專家
已審閱 2 個月前

您尚未登入。 登入 去張貼答案。

一個好的回答可以清楚地回答問題並提供建設性的意見回饋,同時有助於提問者的專業成長。

回答問題指南