Error when save using file system s3a in EMR 6.6.0 with spark 3.2

4

We are trying to save data using s3a and we are getting this error, but when we save the data as s3 it works fine. Below you can find the stacktrace.

Closing SparkContext Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:813) at org.apache.hadoop.fs.FileSystem.primitiveCreate(FileSystem.java:1235) at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:100) at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:605) at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:696) at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:692) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.create(FileContext.java:698) at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createTempFile(CheckpointFileManager.scala:327) at org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.<init>(CheckpointFileManager.scala:140) at org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.<init>(CheckpointFileManager.scala:143) at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createAtomic(CheckpointFileManager.scala:333) at org.apache.spark.sql.execution.streaming.StreamMetadata$.write(StreamMetadata.scala:79) at org.apache.spark.sql.execution.streaming.StreamExecution.$anonfun$streamMetadata$1(StreamExecution.scala:141) at scala.Option.getOrElse(Option.scala:189)

posta 2 anni fa1048 visualizzazioni
1 Risposta
0

This is a known issue and was Fixed in EMR 6.7.0 and above.

Root cause :

In EMR-6.6.0, Spark was upgraded to 3.2.0 which includes SPARK-33212, a change that makes Spark depend on shaded Hadoop client jars (hadoop-client-api and hadoop-client-runtime) instead of the unshaded ones. The difference between the jars is that the third-party classes in the shaded jars were relocated to different packages. However, S3A classes were built assuming that the unshaded client jars are used (i.e. third-party classes aren’t relocated).

When Spark creates an S3AFileSystem, the S3AFileSystem.create() method attempts to invoke the constructor of a Hadoop utility class (SemaphoredDelegatingExecutor) while incorrectly assuming it accepts the unshaded version of a Guava class [1]. The actual parameter type of this constructor in Spark turns out to be the shaded version of this class [2]. This causes the JVM to throw the NoSuchMethodError, causing customer’s application to fail.

This issue was fixed in Hadoop 3.2.2 by HADOOP-16080, but Spark in EMR-6.6.0 uses Hadoop 3.2.1 without this fix backported.

[1] com.google.common.util.concurrent.ListeningExecutorService

[2] org.apache.hadoop.shaded.com.google.common.util.concurrent.ListeningExecutorService

If upgrading to EMR 6.7 is not an option, please reach out to AWS Support.

profile pictureAWS
con risposta un anno fa
profile picture
ESPERTO
verificato 2 mesi fa

Accesso non effettuato. Accedi per postare una risposta.

Una buona risposta soddisfa chiaramente la domanda, fornisce un feedback costruttivo e incoraggia la crescita professionale del richiedente.

Linee guida per rispondere alle domande