Glue DynamicFrame show method yields nothing


I'm using a notebook with a Glue dev endpoint to load data from S3 into a Glue DynamicFrame. The printSchema method works fine, but the show method yields nothing even though the DynamicFrame is not empty. Converting the DynamicFrame into a Spark DataFrame does yield a result (df.toDF().show()). Here is the dummy code that I'm using:

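from awsglue.context import GlueContext

# `spark` is the SparkSession provided by the notebook
glueContext = GlueContext(spark.sparkContext)

# Load the JSON file from S3 into a DynamicFrame
df = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://bucketname/filename"]},
    format="json",
    format_options={"multiline": True},
)

df.printSchema()  # prints the schema as expected
df.show()         # prints nothing in the notebook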

This problem was posted on Stack Overflow before: https://stackoverflow.com/questions/56013334/spark-dynamic-frame-show-method-yields-nothing

Why does the show method yield nothing? Is this a bug or is there something that I'm missing to make this work?

morfaer
posted 2 years ago, 3475 views
1 Answer

Hello,

A DynamicFrame is similar to a DataFrame, except that each record is self-describing, so no schema is required initially. Instead, AWS Glue computes a schema on the fly when one is required. A Glue DynamicFrame is built on top of an RDD, which is why the show() method does not display the data directly. To check the data in tabular format, convert the DynamicFrame to a DataFrame first:

dyf.printSchema()
dyf.toDF().show()
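If the conversion is needed often, it could be wrapped in a small helper. The sketch below is only an illustration: `preview` and its `num_rows` parameter are hypothetical names, not part of the Glue API, and `dyf` is assumed to be a DynamicFrame (or any object with a `toDF()` method that returns a Spark DataFrame).

```python
def preview(dyf, num_rows=5):
    """Print the first rows of a DynamicFrame in tabular form.

    `preview` is a hypothetical helper, not a Glue API. It assumes `dyf`
    exposes toDF(), as awsglue's DynamicFrame does.
    """
    spark_df = dyf.toDF()                    # DynamicFrame -> Spark DataFrame
    spark_df.show(num_rows, truncate=False)  # tabular output, full column values
    return spark_df                          # returned so it can be reused
```

In the notebook from the question this would be called as `preview(df)`. Returning the DataFrame avoids converting a second time if it is needed for further inspection.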

AWS
answered 2 years ago
