AWS Glue Job Error : An error occurred while calling o115.getDynamicFrame.: java.lang.UnsupportedOperationException: empty.reduceLeft

Hello friends, I am working on a project that reads data from the Dremio data lakehouse, and I am trying to read from one of its schemas. Glue does not ship with a native Dremio connector, so I had to build a custom JDBC connector.

Here is my code:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkConf, SparkContext
from pyspark.sql import SparkSession
from awsglue.context import GlueContext
from awsglue.job import Job
conf = SparkConf()

args = getResolvedOptions(sys.argv, ['JOB_NAME'])

conf.set("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")\
    .set("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")\
    .set("spark.sql.catalog.glue_catalog.warehouse", "s3://dev-smt-data-cache/stageZone_iceberg/iceber_repo/smt-data/")\
    .set("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")\
    .set("spark.sql.catalog.glue_catalog.io-impl","org.apache.iceberg.aws.s3.S3FileIO")\
    .set("--datalake-formats","iceberg")

sc = SparkContext(conf=conf)
glueContext = GlueContext(sc)

# below spark session will have the above configuration
spark = glueContext.spark_session   
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

DynamicFrame = glueContext.create_dynamic_frame.from_options(
        connection_type = "jdbc", 
        connection_options = {
        "query":""" 'SELECT FileID FROM "mp2-appsrvspace".ESG.CTG."USG_2000_Transaction" LIMIT 10' """,
        "inferSchema":True,
        # "dbtable": """ "mp2-appsrvspace".ESG.CTG."USG_2000_Transaction" """,
        "connectionName":"Dremio-Stage"}
         #"transformation_ctx" = "DynamicFrame"
         )

applyformat = ApplyMapping.apply(
    frame =DynamicFrame, 
    mappings =
        [("field1","string","field1","string")
        #("field2","string","field2","string") 
        ], 
    transformation_ctx = "applyformat"
    )      
dynamicFrame = DynamicFrame.toDF().createOrReplaceTempView("temp_table")
print(dynamicFrame.head(5))

I keep getting the error below. I have tried several different approaches, but none of them has worked.

   raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o115.getDynamicFrame.
: java.lang.UnsupportedOperationException: empty.reduceLeft

I would appreciate any guidance or hints on how to fix this problem.

SAM
asked a month ago, 159 views
2 Answers

Hello,

'empty.reduceLeft' is a Scala error raised when 'reduceLeft' is called on an empty collection, such as an empty list; a collection must have at least one element for a 'reduce' operation to succeed. The error therefore indicates that the collection being reduced had no elements. Please review your data for inconsistencies such as null or empty fields.
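To illustrate the failure mode, here is a minimal Python analogue. It is only a sketch: the real exception is thrown by Scala code inside Glue, and the helper name `safe_reduce` is hypothetical, not part of any Glue or Spark API. Python's `functools.reduce` fails in a comparable way when given an empty sequence and no initial value.

```python
from functools import reduce


def safe_reduce(func, items, default=None):
    """Reduce items left to right, returning default when there are no items."""
    items = list(items)
    if not items:
        # Nothing to combine: return the fallback instead of raising.
        return default
    return reduce(func, items)


# Reducing an empty list without an initial value raises TypeError in Python,
# much as Scala's reduceLeft raises UnsupportedOperationException.
try:
    reduce(lambda a, b: a + b, [])
    empty_reduce_failed = False
except TypeError:
    empty_reduce_failed = True

total = safe_reduce(lambda a, b: a + b, [1, 2, 3])
fallback = safe_reduce(lambda a, b: a + b, [], default=0)
```

The point is that the failure comes from an empty input to the reduce step, not from the reducing function itself.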

Additionally, please review the below references and inspect your code accordingly:

[1] https://www.garysieling.com/blog/fixing-scala-error-reduce-java-lang-unsupportedoperationexception-empty-reduceleft/
[2] https://stackoverflow.com/questions/6986241/is-it-valid-to-reduce-on-an-empty-set-of-sets
[3] https://nrinaudo.github.io/scala-best-practices/partial_functions/traversable_reduce.html

If you would like further support in investigating this issue, please raise a case with the AWS Premium Support team and provide your Glue Job Run ID.

Thanks

AWS
SUPPORT ENGINEER
answered a month ago

In this context, that error very likely means that Glue is trying to read the username and password properties from the connection, but one of them is missing (double-check the exact spelling of the property names).
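One way to check this before running the job is sketched below. It assumes the connection stores its settings under the keys `JDBC_CONNECTION_URL`, `USERNAME` and `PASSWORD`; connections that keep credentials in AWS Secrets Manager may not carry these keys at all. The helper `missing_connection_properties` and the sample values are hypothetical.

```python
# Property keys assumed to be required for a JDBC connection (an assumption).
REQUIRED_KEYS = ("JDBC_CONNECTION_URL", "USERNAME", "PASSWORD")


def missing_connection_properties(properties, required=REQUIRED_KEYS):
    """Return the required keys that are absent or blank in properties."""
    return [
        key for key in required
        if not str(properties.get(key) or "").strip()
    ]


# Hypothetical properties. In a real check they could come from
# boto3.client("glue").get_connection(Name="Dremio-Stage")
#     ["Connection"]["ConnectionProperties"]
example = {
    "JDBC_CONNECTION_URL": "jdbc:dremio:direct=host.example:31010",
    "USERNAME": "sam",
}

missing = missing_connection_properties(example)
print(missing)  # prints ['PASSWORD']
```

An empty result means all three assumed properties are present; it does not prove that their values are correct.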

AWS
EXPERT
answered a month ago
AWS
SUPPORT ENGINEER
reviewed a month ago
