AWS Glue Job Error : An error occurred while calling o115.getDynamicFrame.: java.lang.UnsupportedOperationException: empty.reduceLeft


Hello friends, I am working on a project that reads data from the Dremio data lakehouse, and I am trying to read from one of its schemas. Glue does not ship with a native Dremio connector, so I had to build a custom JDBC connector.

Here is my code:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkConf, SparkContext
from pyspark.sql import SparkSession
from awsglue.context import GlueContext
from awsglue.job import Job
conf = SparkConf()

args = getResolvedOptions(sys.argv, ['JOB_NAME'])

conf.set("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")\
    .set("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")\
    .set("spark.sql.catalog.glue_catalog.warehouse", "s3://dev-smt-data-cache/stageZone_iceberg/iceber_repo/smt-data/")\
    .set("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")\
    .set("spark.sql.catalog.glue_catalog.io-impl","org.apache.iceberg.aws.s3.S3FileIO")\
    .set("--datalake-formats","iceberg")

sc = SparkContext(conf=conf)
glueContext = GlueContext(sc)

# below spark session will have the above configuration
spark = glueContext.spark_session   
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

DynamicFrame = glueContext.create_dynamic_frame.from_options(
        connection_type = "jdbc", 
        connection_options = {
        "query":""" 'SELECT FileID FROM "mp2-appsrvspace".ESG.CTG."USG_2000_Transaction" LIMIT 10' """,
        "inferSchema":True,
        # "dbtable": """ "mp2-appsrvspace".ESG.CTG."USG_2000_Transaction" """,
        "connectionName":"Dremio-Stage"}
         #"transformation_ctx" = "DynamicFrame"
         )

applyformat = ApplyMapping.apply(
    frame =DynamicFrame, 
    mappings =
        [("field1","string","field1","string")
        #("field2","string","field2","string") 
        ], 
    transformation_ctx = "applyformat"
    )      
dynamicFrame = DynamicFrame.toDF().createOrReplaceTempView("temp_table")
print(dynamicFrame.head(5))

I keep getting the error below. I have tried different approaches, but none has worked.

   raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o115.getDynamicFrame.
: java.lang.UnsupportedOperationException: empty.reduceLeft

I would appreciate any guidance or hints on how to fix the problem.

SAM
asked a month ago · 158 views
2 Answers

Hello,

Error 'empty.reduceLeft' is a Scala error that occurs when 'reduceLeft' is called on an empty collection such as a list; a collection must have at least one element for a 'reduce' operation to succeed. The error therefore suggests that a collection being reduced is empty. Please review your data for inconsistencies such as null or empty fields.
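
To illustrate the failure mode described above, here is a minimal Python sketch of the same idea. It is only an analogy: Python's `functools.reduce` raises `TypeError` rather than Scala's `UnsupportedOperationException`, and the helper names `total` and `safe_total` are made up for this example and do not come from Glue.

```python
from functools import reduce


def total(values):
    # No initial value is given, so reduce needs at least one element.
    # This mirrors Scala's reduceLeft, which fails on an empty collection.
    return reduce(lambda a, b: a + b, values)


def safe_total(values, default=0):
    # Supplying an initial value makes the empty case well defined,
    # similar in spirit to Scala's foldLeft.
    return reduce(lambda a, b: a + b, values, default)


try:
    total([])
except TypeError as exc:
    print(f"reduce failed on empty input: {exc}")

print(total([1, 2, 3]))  # 6
print(safe_total([]))    # 0
```

The point is that the exception says some collection had no elements when it was reduced; it does not say which one, so the stack trace is needed to find where that collection was built.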

Additionally, please review the below references and inspect your code accordingly:

[1] https://www.garysieling.com/blog/fixing-scala-error-reduce-java-lang-unsupportedoperationexception-empty-reduceleft/
[2] https://stackoverflow.com/questions/6986241/is-it-valid-to-reduce-on-an-empty-set-of-sets
[3] https://nrinaudo.github.io/scala-best-practices/partial_functions/traversable_reduce.html

If you would like further support in investigating this issue, please raise a case with the AWS Premium Support team and provide your Glue Job Run ID.

Thanks

AWS
Support Engineer
answered a month ago

In this context, that error very likely means that Glue is trying to get the username and password properties from the connection but one of them is missing (double-check the spelling of the property names).
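
One way to act on this suggestion is to inspect the connection's stored properties before running the job. The sketch below is an assumption-laden illustration, not a confirmed diagnosis: it assumes the connection stores credentials under the `USERNAME` and `PASSWORD` keys or references a secret through `SECRET_ID`, which may not match how a custom connector is configured. The helper names and the sample values are hypothetical.

```python
# Keys assumed to hold credentials in the connection properties (assumption).
REQUIRED_KEYS = ("USERNAME", "PASSWORD")


def missing_credentials(properties, required=REQUIRED_KEYS):
    """Return the required keys that are absent or blank in a properties dict."""
    # Assumption: a connection that references a secret does not need
    # inline username/password properties.
    if properties.get("SECRET_ID"):
        return []
    return [key for key in required if not str(properties.get(key, "")).strip()]


def fetch_connection_properties(connection_name):
    """Fetch a Glue connection's properties (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the checker above works without it

    glue = boto3.client("glue")
    response = glue.get_connection(Name=connection_name)
    return response["Connection"].get("ConnectionProperties", {})


if __name__ == "__main__":
    # Hypothetical properties with the password left out.
    sample = {
        "JDBC_CONNECTION_URL": "jdbc:dremio:direct=example-host:31010",
        "USERNAME": "example-user",
    }
    print(missing_credentials(sample))
```

Against a real account, `missing_credentials(fetch_connection_properties("Dremio-Stage"))` would list whichever of the assumed keys is absent from the connection used in the question.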

AWS
Expert
answered a month ago
AWS
Support Engineer
reviewed a month ago
