Glue DynamicFrame show method yields nothing


I'm using a notebook together with a Glue dev endpoint to load data from S3 into a Glue DynamicFrame. The printSchema method works fine, but the show method yields nothing, although the DynamicFrame is not empty. Converting the DynamicFrame into a Spark DataFrame does yield a result (df.toDF().show()). Here is the dummy code that I'm using:

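from awsglue.context import GlueContext

# `spark` is the SparkSession already available in the notebook
glueContext = GlueContext(spark.sparkContext)

# Read a multiline JSON file from S3 into a DynamicFrame
df = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://bucketname/filename"]},
    format="json",
    format_options={"multiline": True},
)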

df.printSchema()
df.show()
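Since df.toDF().show() does produce output in this setup, one option is to wrap that path in a small notebook helper. This is only a sketch: the name `preview_dynamic_frame` is hypothetical and not part of the Glue API. It calls nothing beyond `printSchema()`, `count()` and `toDF()` on the DynamicFrame and `show()` on the resulting Spark DataFrame.

```python
def preview_dynamic_frame(dyf, num_rows=20, truncate=True):
    """Print schema, record count and the first rows of a Glue DynamicFrame.

    Hypothetical helper: it goes through DynamicFrame.toDF().show(), the
    display path reported as working in the notebook, instead of
    DynamicFrame.show().
    """
    dyf.printSchema()
    # count() runs a Spark job over the whole input, so it can be slow
    print(f"record count: {dyf.count()}")
    # Convert to a Spark DataFrame and render the rows as a table
    dyf.toDF().show(num_rows, truncate=truncate)
```

For example, preview_dynamic_frame(df, num_rows=5) right after the create_dynamic_frame_from_options call prints the schema, confirms the frame is non-empty, and shows five rows.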

This problem was posted on Stack Overflow before: https://stackoverflow.com/questions/56013334/spark-dynamic-frame-show-method-yields-nothing

Why does the show method yield nothing? Is this a bug or is there something that I'm missing to make this work?

morfaer
asked 2 years ago · 3473 views
1 answer

Hello,

A DynamicFrame is similar to a DataFrame, except that each record is self-describing, so no schema is required initially. Instead, AWS Glue computes a schema on the fly when it is required. Because a DynamicFrame is built on an RDD of these self-describing records rather than directly on a Spark DataFrame, its show() method may not display anything in this notebook setup. To check the data in tabular format, convert the DynamicFrame to a DataFrame first:

dyf.printSchema()
dyf.toDF().show()
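To make the idea of "self-describing records, schema computed when required" concrete, here is a toy model in plain Python. It is an illustration under stated assumptions, not how AWS Glue is implemented: records are plain dicts, the function name `infer_schema` is hypothetical, and a field seen with more than one type is labeled `choice(...)`, loosely echoing Glue's choice types.

```python
from typing import Any, Dict, List, Set


def infer_schema(records: List[Dict[str, Any]]) -> Dict[str, str]:
    """Toy on-demand schema inference over self-describing records.

    Each record carries its own field names and values, so no schema is
    declared up front; one is only derived when this function is called.
    This is NOT Glue's code, just a model of the concept.
    """
    seen: Dict[str, Set[str]] = {}
    for record in records:
        for field, value in record.items():
            # Collect every Python type name observed for this field
            seen.setdefault(field, set()).add(type(value).__name__)

    schema: Dict[str, str] = {}
    for field, types in seen.items():
        if len(types) == 1:
            schema[field] = next(iter(types))
        else:
            # Conflicting types become a "choice" instead of an error
            schema[field] = "choice(" + ", ".join(sorted(types)) + ")"
    return schema


records = [
    {"id": 1, "name": "a"},
    {"id": "2", "name": "b", "tags": ["x"]},
]
print(infer_schema(records))
# {'id': 'choice(int, str)', 'name': 'str', 'tags': 'list'}
```

Nothing is computed until infer_schema is called, which is the sense in which the schema is produced on demand rather than required up front.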

AWS
answered 2 years ago
