Error when saving with the s3a file system in EMR 6.6.0 with Spark 3.2

We are trying to save data using the s3a file system and get the error below. When we save the same data using s3, it works fine. The stack trace follows.

Closing SparkContext
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V
    at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:813)
    at org.apache.hadoop.fs.FileSystem.primitiveCreate(FileSystem.java:1235)
    at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:100)
    at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:605)
    at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:696)
    at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:692)
    at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
    at org.apache.hadoop.fs.FileContext.create(FileContext.java:698)
    at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createTempFile(CheckpointFileManager.scala:327)
    at org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.<init>(CheckpointFileManager.scala:140)
    at org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.<init>(CheckpointFileManager.scala:143)
    at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createAtomic(CheckpointFileManager.scala:333)
    at org.apache.spark.sql.execution.streaming.StreamMetadata$.write(StreamMetadata.scala:79)
    at org.apache.spark.sql.execution.streaming.StreamExecution.$anonfun$streamMetadata$1(StreamExecution.scala:141)
    at scala.Option.getOrElse(Option.scala:189)

Asked 2 years ago, 1026 views

1 Answer

This is a known issue that was fixed in EMR 6.7.0 and later releases.

Root cause:

In EMR-6.6.0, Spark was upgraded to 3.2.0 which includes SPARK-33212, a change that makes Spark depend on shaded Hadoop client jars (hadoop-client-api and hadoop-client-runtime) instead of the unshaded ones. The difference between the jars is that the third-party classes in the shaded jars were relocated to different packages. However, S3A classes were built assuming that the unshaded client jars are used (i.e. third-party classes aren’t relocated).

When Spark creates an S3AFileSystem, the S3AFileSystem.create() method invokes the constructor of a Hadoop utility class (SemaphoredDelegatingExecutor) on the assumption that it accepts the unshaded version of a Guava class [1]. On Spark's classpath, however, the constructor's actual parameter type is the shaded version of that class [2]. Because no constructor with the expected signature exists, the JVM throws NoSuchMethodError and the application fails.

This issue was fixed in Hadoop 3.2.2 by HADOOP-16080, but Spark in EMR-6.6.0 uses Hadoop 3.2.1 without this fix backported.

[1] com.google.common.util.concurrent.ListeningExecutorService

[2] org.apache.hadoop.shaded.com.google.common.util.concurrent.ListeningExecutorService
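One way to confirm this mismatch on a cluster is to list the constructor descriptors that the loaded class actually exposes and compare them with the descriptor printed in the NoSuchMethodError message. The following is a minimal sketch of such a diagnostic, not part of Hadoop or Spark: the class name ConstructorDescriptors and its helper methods are hypothetical, it uses only JDK reflection, and it is only meaningful when run with the same classpath as the Spark driver.

```java
import java.lang.reflect.Constructor;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Hadoop or Spark.
class ConstructorDescriptors {

    // Builds the JVM type descriptor for a class, e.g. int -> "I",
    // java.lang.String -> "Ljava/lang/String;", int[] -> "[I".
    static String descriptorOf(Class<?> type) {
        if (type.isArray()) {
            return "[" + descriptorOf(type.getComponentType());
        }
        if (type.isPrimitive()) {
            if (type == int.class) return "I";
            if (type == boolean.class) return "Z";
            if (type == long.class) return "J";
            if (type == byte.class) return "B";
            if (type == char.class) return "C";
            if (type == short.class) return "S";
            if (type == float.class) return "F";
            if (type == double.class) return "D";
            return "V"; // void
        }
        return "L" + type.getName().replace('.', '/') + ";";
    }

    // Returns one descriptor per declared constructor, in the same
    // "(params)V" form that NoSuchMethodError prints after "<init>".
    static List<String> constructorDescriptors(Class<?> cls) {
        List<String> result = new ArrayList<>();
        for (Constructor<?> ctor : cls.getDeclaredConstructors()) {
            StringBuilder sb = new StringBuilder("(");
            for (Class<?> param : ctor.getParameterTypes()) {
                sb.append(descriptorOf(param));
            }
            result.add(sb.append(")V").toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace.
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.util.SemaphoredDelegatingExecutor";
        try {
            // Load without initializing; only the signatures are needed.
            Class<?> cls = Class.forName(
                    name, false, ConstructorDescriptors.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            System.out.println("Loaded from: "
                    + (src == null ? "(bootstrap class path)" : src.getLocation()));
            for (String descriptor : constructorDescriptors(cls)) {
                System.out.println("<init>" + descriptor);
            }
        } catch (ClassNotFoundException | LinkageError e) {
            System.out.println("Could not inspect " + name + ": " + e);
        }
    }
}
```

If the printed descriptors mention org/apache/hadoop/shaded/com/google/common/... while the error message mentions com/google/common/..., the classpath matches the situation described above.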

If upgrading to EMR 6.7 is not an option, please reach out to AWS Support.
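As the question notes, writing through the s3 scheme works on the same cluster (on EMR that scheme is served by EMRFS rather than S3AFileSystem), so a possible stopgap is to rewrite s3a locations before handing them to Spark. The following is a minimal sketch under that assumption; the class S3SchemeRewriter, the method toS3Scheme, and the bucket name are hypothetical and only illustrate the idea.

```java
// Hypothetical helper; rewrites only the URI scheme and leaves the rest untouched.
class S3SchemeRewriter {
    private static final String S3A_PREFIX = "s3a://";

    // Returns the location with "s3a://" replaced by "s3://";
    // any other location is returned unchanged.
    static String toS3Scheme(String location) {
        if (location.regionMatches(true, 0, S3A_PREFIX, 0, S3A_PREFIX.length())) {
            return "s3://" + location.substring(S3A_PREFIX.length());
        }
        return location;
    }

    public static void main(String[] args) {
        // prints "s3://example-bucket/checkpoints/job1"
        System.out.println(toS3Scheme("s3a://example-bucket/checkpoints/job1"));
        // prints "hdfs:///tmp/output"
        System.out.println(toS3Scheme("hdfs:///tmp/output"));
    }
}
```

Because the stack trace fails inside the streaming checkpoint manager, the checkpoint location would likely need the same rewrite as the output path. Whether the s3 scheme is an acceptable substitute depends on the job, so treat this as a temporary measure rather than a fix.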

Answered by AWS a year ago
Reviewed by an expert 2 months ago
