Glue DynamicFrame show method yields nothing

I'm using a notebook together with a Glue Dev Endpoint to load data from S3 into a Glue DynamicFrame. The printSchema method works fine, but the show method yields nothing, although the DynamicFrame is not empty. Converting the DynamicFrame into a Spark DataFrame does yield a result (df.toDF().show()). Here is the dummy code that I'm using:

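from awsglue.context import GlueContext

# `spark` is the SparkSession that the notebook provides
glueContext = GlueContext(spark.sparkContext)

# Read multiline JSON from S3 into a DynamicFrame
df = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://bucketname/filename"]},
    format="json",
    format_options={"multiline": True},
)

df.printSchema()  # prints the schema as expected
df.show()         # prints nothing in the notebook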

This problem was posted on Stack Overflow before: https://stackoverflow.com/questions/56013334/spark-dynamic-frame-show-method-yields-nothing

Why does the show method yield nothing? Is this a bug or is there something that I'm missing to make this work?

morfaer
asked 2 years ago, 3478 views
1 Answer

Hello,

A DynamicFrame is similar to a DataFrame, except that each record is self-describing, so no schema is required initially. Instead, AWS Glue computes a schema on the fly when required. Because a Glue DynamicFrame is based on RDDs, the show() method does not work directly, and you need to convert the DynamicFrame to a DataFrame first to check the data in tabular format:

dyf.printSchema()
dyf.toDF().show()
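If this conversion is needed often in a notebook, it could be wrapped in a small helper. The following is only a minimal sketch: the name preview and its parameters are hypothetical and not part of the Glue API. It assumes the object passed in exposes toDF(), as a DynamicFrame does, and that the result supports the Spark DataFrame show(n, truncate) call.

```python
def preview(dynamic_frame, num_rows=20, truncate=True):
    """Print the first rows of a DynamicFrame in tabular form.

    Hypothetical helper, not part of awsglue: it converts the
    DynamicFrame to a Spark DataFrame and calls show() on that.
    """
    spark_df = dynamic_frame.toDF()  # DynamicFrame -> Spark DataFrame
    spark_df.show(num_rows, truncate=truncate)
    return spark_df  # returned so the caller can keep working with it


# Usage in the notebook, assuming `dyf` is an existing DynamicFrame:
#     preview(dyf, num_rows=5)
```

Returning the DataFrame lets the converted frame be reused for further inspection instead of converting again.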

AWS
answered 2 years ago