Using Pandas in Glue ETL Job ( How to convert Dynamic DataFrame or PySpark Dataframe to Pandas Dataframe)

0

I am wanting to use Pandas in a Glue ETL job. I am reading from S3 and writing to Data Catalog. I am trying to find a basic example where I can read in from S3 , either into or converting to a Pandas DF, and then do my manipulations and then write out to Data Catalog. It looks like I may need to write to a Dynamic DataFrame before sending to data catalog. Any examples? I am doing my ETL today using PySpark but would like to do most of my transformations in Pandas.

bfeeny
已提問 2 年前檢視次數 9833 次
1 個回答
0
已接受的答案

Would say convert Dynamic frame to Spark data frame using .ToDF() method and from spark dataframe to pandas dataframe using link https://sparkbyexamples.com/pyspark/convert-pyspark-dataframe-to-pandas/#:~:text=Convert%20PySpark%20Dataframe%20to%20Pandas%20DataFrame,small%20subset%20of%20the%20data.

AWS
NishAWS
已回答 2 年前

您尚未登入。 登入 去張貼答案。

一個好的回答可以清楚地回答問題並提供建設性的意見回饋,同時有助於提問者的專業成長。

回答問題指南