AWS Glue Job Error : An error occurred while calling o115.getDynamicFrame.: java.lang.UnsupportedOperationException: empty.reduceLeft


Hello friends, I am working on a project that reads data from the Dremio data lakehouse, and I am trying to read from one of its schemas. Glue does not ship with a native Dremio connector, so I had to build a custom JDBC connection.

Here is my code:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkConf, SparkContext
from pyspark.sql import SparkSession
from awsglue.context import GlueContext
from awsglue.job import Job
conf = SparkConf()

args = getResolvedOptions(sys.argv, ['JOB_NAME'])

conf.set("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")\
    .set("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")\
    .set("spark.sql.catalog.glue_catalog.warehouse", "s3://dev-smt-data-cache/stageZone_iceberg/iceber_repo/smt-data/")\
    .set("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")\
    .set("spark.sql.catalog.glue_catalog.io-impl","org.apache.iceberg.aws.s3.S3FileIO")\
    .set("--datalake-formats","iceberg")

sc = SparkContext(conf=conf)
glueContext = GlueContext(sc)

# below spark session will have the above configuration
spark = glueContext.spark_session   
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

DynamicFrame = glueContext.create_dynamic_frame.from_options(
        connection_type = "jdbc", 
        connection_options = {
        "query":""" 'SELECT FileID FROM "mp2-appsrvspace".ESG.CTG."USG_2000_Transaction" LIMIT 10' """,
        "inferSchema":True,
        # "dbtable": """ "mp2-appsrvspace".ESG.CTG."USG_2000_Transaction" """,
        "connectionName":"Dremio-Stage"}
         #"transformation_ctx" = "DynamicFrame"
         )

applyformat = ApplyMapping.apply(
    frame =DynamicFrame, 
    mappings =
        [("field1","string","field1","string")
        #("field2","string","field2","string") 
        ], 
    transformation_ctx = "applyformat"
    )      
# createOrReplaceTempView returns None, so keep the DataFrame in its own variable
df = DynamicFrame.toDF()
df.createOrReplaceTempView("temp_table")
print(df.head(5))

I keep getting the error below. I have tried several different approaches, but none of them works.

   raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o115.getDynamicFrame.
: java.lang.UnsupportedOperationException: empty.reduceLeft

I would appreciate any guidance or hints on how to fix this problem.

SAM
asked 1 month ago, 159 views
2 Answers

Hello,

'empty.reduceLeft' is a Scala error raised when 'reduceLeft' is called on an empty collection such as a list; a collection must contain at least one element for a 'reduce' operation to produce a result. Here it indicates that a collection being reduced inside the getDynamicFrame call was empty. Please review your data for inconsistencies such as null or empty fields.
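The same failure mode can be illustrated in Python, which is the language of the job script. This is only an analogy and not Glue's internal code: functools.reduce without an initial value fails on an empty input in much the same way Scala's reduceLeft does, while supplying an initial value makes the empty case well defined. The function names total and safe_total are made up for this sketch.

```python
from functools import reduce


def total(values):
    # No initial value: like Scala's reduceLeft, this fails on an empty input.
    return reduce(lambda a, b: a + b, values)


def safe_total(values):
    # Supplying an initial value (0) makes the empty case well defined.
    return reduce(lambda a, b: a + b, values, 0)


try:
    total([])
    empty_failed = False
except TypeError:
    # Python raises TypeError where Scala raises UnsupportedOperationException.
    empty_failed = True

print(empty_failed)
print(safe_total([]))
```

In the Glue job the reduce happens inside the library, so the practical fix is to find out which input ended up empty rather than to change the reduce itself.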

Additionally, please review the below references and inspect your code accordingly:

[1] https://www.garysieling.com/blog/fixing-scala-error-reduce-java-lang-unsupportedoperationexception-empty-reduceleft/
[2] https://stackoverflow.com/questions/6986241/is-it-valid-to-reduce-on-an-empty-set-of-sets
[3] https://nrinaudo.github.io/scala-best-practices/partial_functions/traversable_reduce.html

If you would like further support in investigating this issue, please raise a case with AWS Premium Support team and provide your Glue Job Run ID.

Thanks

AWS
Support Engineer
answered 1 month ago

In this context, that error very likely means that Glue is trying to read the username and password properties from the connection but one of them is missing (double-check the exact spelling of the property names).
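One way to catch a missing or misspelled property before the job reaches Glue is a small pre-flight check on the options dictionary. The sketch below is hypothetical: the helper name missing_options is invented here, and the key names "url", "user", and "password" are an assumption about what the connection needs, so adjust them to whatever your connection type actually expects. The host and credentials are placeholders.

```python
# ASSUMPTION: these key names are illustrative; check the property names
# your Glue connection type really requires.
REQUIRED_KEYS = ("url", "user", "password")


def missing_options(options, required=REQUIRED_KEYS):
    """Return the required keys that are absent or blank in `options`."""
    return [key for key in required if not str(options.get(key) or "").strip()]


# Placeholder values; note the misspelled "pasword" key.
connection_options = {
    "url": "jdbc:dremio:direct=example-host:31010",
    "user": "glue_reader",
    "pasword": "not-a-real-secret",
}

print(missing_options(connection_options))  # -> ['password']
```

Failing fast with a readable list of missing keys is easier to act on than the opaque empty.reduceLeft stack trace.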

AWS
Expert
answered 1 month ago
AWS
Support Engineer
reviewed 1 month ago
