Glue DynamicFrame show method yields nothing

0

I'm using a Notebook together with a Glue Dev Endpoint to load data from S3 into a Glue DynamicFrame. The printSchema method works fine but the show method yields nothing although the dataframe is not empty. Converting the DynamicFrame into a Spark DataFrame actually yields a result (df.toDF().show()). Here the dummy code that I'm using

glueContext = GlueContext(spark.sparkContext)

df = glueContext.create_dynamic_frame_from_options(
    connection_type="s3", 
    connection_options = {
        "paths": [f"s3://bucketname/filename"]},
    format="json",
    format_options={"multiline": True}
)

df.printSchema()
df.show()

This problem was posted on Stackoverflow before: https://stackoverflow.com/questions/56013334/spark-dynamic-frame-show-method-yields-nothing

Why does the show method yield nothing? Is this a bug or is there something that I'm missing to make this work?

morfaer
질문됨 2년 전3475회 조회
1개 답변
0

Hello,

I would like to inform DynamicFrame is similar to a DataFrame, except that each record is self-describing, so no schema is required initially. Instead, AWS Glue computes a schema on-the-fly when required. Basically Glue DynamicFrame is based on RDD due to which show() method does not work directly and you need to convert dynamic frame to dataframe first to check the data in tabular format.

dyf.printSchema()
dyf.toDF().show()

AWS
답변함 2년 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠