Hi,
I have been experimenting with EMR Serverless this week, since the GA release. I found that I am unable to download my JAR files from AWS S3 when I run an EMR Serverless job from a private VPC subnet. I have already tested connectivity from an EC2 instance using the same subnet and the same security group; the problem exists only in the EMR Serverless job. From the error logs I can see that the executor is trying to connect to the Spark driver's IPv6 address, and I am not sure why that connection fails.
Region: ap-northeast-1
Subnet: ap-northeast-1a (I have tried other subnets too)
Here is the tail of my logs:
22/06/03 19:08:42 INFO SparkContext: Added JAR file:/tmp/spark-cdc6b2aa-657b-4464-b7d8-3cbe2fea3872/xbean-asm9-shaded-4.20.jar at spark://[2406:da14:5a:5a01:41a7:4134:a18b:f5f8]:42539/jars/xbean-asm9-shaded-4.20.jar with timestamp 1654283322058
22/06/03 19:08:42 INFO SparkContext: Added JAR file:/tmp/spark-cdc6b2aa-657b-4464-b7d8-3cbe2fea3872/xz-1.8.jar at spark://[2406:da14:5a:5a01:41a7:4134:a18b:f5f8]:42539/jars/xz-1.8.jar with timestamp 1654283322058
22/06/03 19:08:42 INFO SparkContext: Added JAR file:/tmp/spark-cdc6b2aa-657b-4464-b7d8-3cbe2fea3872/zookeeper-3.6.2.jar at spark://[2406:da14:5a:5a01:41a7:4134:a18b:f5f8]:42539/jars/zookeeper-3.6.2.jar with timestamp 1654283322058
22/06/03 19:08:42 INFO SparkContext: Added JAR file:/tmp/spark-cdc6b2aa-657b-4464-b7d8-3cbe2fea3872/zookeeper-jute-3.6.2.jar at spark://[2406:da14:5a:5a01:41a7:4134:a18b:f5f8]:42539/jars/zookeeper-jute-3.6.2.jar with timestamp 1654283322058
22/06/03 19:08:42 INFO SparkContext: Added JAR file:/tmp/spark-cdc6b2aa-657b-4464-b7d8-3cbe2fea3872/zstd-jni-1.5.0-4.jar at spark://[2406:da14:5a:5a01:41a7:4134:a18b:f5f8]:42539/jars/zstd-jni-1.5.0-4.jar with timestamp 1654283322058
22/06/03 19:08:42 INFO SparkContext: Added JAR s3://datalake-cbts-test/spark-jobs/app.jar at s3://datalake-cbts-test/spark-jobs/app.jar with timestamp 1654283322058
22/06/03 19:08:42 INFO Executor: Starting executor ID driver on host ip-10-0-148-61.ap-northeast-1.compute.internal
22/06/03 19:08:42 INFO Executor: Fetching spark://[2406:da14:5a:5a01:41a7:4134:a18b:f5f8]:42539/jars/protobuf-java-2.5.0.jar with timestamp 1654283322058
22/06/03 19:08:43 ERROR Utils: Aborting task
java.io.IOException: Failed to connect to /2406:da14:5a:5a01:41a7:4134:a18b:f5f8:42539
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:399)
at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:367)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1508)
at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:366)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:763)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:550)
at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13(Executor.scala:962)
at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13$adapted(Executor.scala:954)
at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985)
at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:954)
at org.apache.spark.executor.Executor.<init>(Executor.scala:247)
at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:582)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2694)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
at np.com.ngopal.spark.SparkJob.getSession(SparkJob.java:74)
at np.com.ngopal.spark.SparkJob.main(SparkJob.java:111)
Thanks
I am able to connect to S3 using a NAT gateway, and a ping over IPv6 also works fine. Right now EMR Serverless natively uses IPv6 for fetching the associated library JAR files that I provide at runtime, which is why I had to use an IPv6-compatible VPC. I do not think there is any way to assign an ENI for EMR Serverless, because that is all managed internally.
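In case it helps anyone hitting the same IPv6 fetch failure: one thing worth trying is forcing the JVM to prefer IPv4 via `sparkSubmitParameters` when starting the job run. This is only a sketch under assumptions — the application ID, role ARN, and S3 path below are placeholders, and whether the EMR Serverless driver honors `java.net.preferIPv4Stack` is something I have not confirmed.

```python
# Hypothetical sketch: submit an EMR Serverless Spark job with the driver and
# executor JVMs nudged toward IPv4. Placeholder IDs/ARNs/paths throughout;
# whether the managed runtime respects this flag is an assumption to verify.

# JVM flag that tells java.net to prefer IPv4 stack addresses.
IPV4_FLAG = "-Djava.net.preferIPv4Stack=true"


def build_spark_submit_params(extra: str = "") -> str:
    """Assemble a sparkSubmitParameters string that passes the IPv4
    preference flag to both the driver and executor JVMs."""
    parts = [
        f"--conf spark.driver.defaultJavaOptions={IPV4_FLAG}",
        f"--conf spark.executor.defaultJavaOptions={IPV4_FLAG}",
    ]
    if extra:
        parts.append(extra)
    return " ".join(parts)


def start_job(application_id: str, role_arn: str, jar_uri: str) -> str:
    """Submit the job run via boto3 (requires AWS credentials; the names
    passed in are placeholders)."""
    import boto3  # imported lazily so build_spark_submit_params is testable offline

    client = boto3.client("emr-serverless")
    response = client.start_job_run(
        applicationId=application_id,
        executionRoleArn=role_arn,
        jobDriver={
            "sparkSubmit": {
                "entryPoint": jar_uri,
                "sparkSubmitParameters": build_spark_submit_params(),
            }
        },
    )
    return response["jobRunId"]
```

If the flag has no effect, the IPv6-compatible VPC route described above is the fallback that worked for me.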
Thanks