Using Pandas in a Glue ETL Job (How to Convert a DynamicFrame or PySpark DataFrame to a Pandas DataFrame)

I want to use Pandas in a Glue ETL job. I am reading from S3 and writing to the Data Catalog. I am looking for a basic example where I can read from S3, either directly into a Pandas DataFrame or by converting to one, do my manipulations, and then write out to the Data Catalog. It looks like I may need to convert back to a DynamicFrame before writing to the Data Catalog. Are there any examples? I do my ETL in PySpark today but would like to do most of my transformations in Pandas.

bfeeny
asked 2 years ago, 9833 views
1 Answer
Accepted Answer

I would convert the DynamicFrame to a Spark DataFrame using the .toDF() method, and then convert the Spark DataFrame to a Pandas DataFrame using .toPandas(), as described at https://sparkbyexamples.com/pyspark/convert-pyspark-dataframe-to-pandas/#:~:text=Convert%20PySpark%20Dataframe%20to%20Pandas%20DataFrame,small%20subset%20of%20the%20data.
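A minimal sketch of that round trip, assuming it runs inside a Glue job where awsglue and pyspark are available. The helper names (dynamic_frame_to_pandas, pandas_to_dynamic_frame, example_job), the S3 path, the database and table names, and the dropna() step are all hypothetical placeholders. Note that toPandas() collects the whole dataset onto the driver, so this only suits data that fits in driver memory.

```python
def dynamic_frame_to_pandas(dynamic_frame):
    """DynamicFrame -> Spark DataFrame -> pandas DataFrame."""
    spark_df = dynamic_frame.toDF()   # DynamicFrame to Spark DataFrame
    return spark_df.toPandas()        # collects all rows onto the driver


def pandas_to_dynamic_frame(pandas_df, glue_context, name):
    """pandas DataFrame -> Spark DataFrame -> DynamicFrame."""
    # Imported lazily so this file can be loaded outside a Glue job.
    from awsglue.dynamicframe import DynamicFrame

    spark_df = glue_context.spark_session.createDataFrame(pandas_df)
    return DynamicFrame.fromDF(spark_df, glue_context, name)


def example_job():
    """Hypothetical job: S3 CSV -> pandas -> Data Catalog table."""
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Hypothetical bucket and prefix; replace with your own.
    source = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://example-bucket/input/"]},
        format="csv",
        format_options={"withHeader": True},
    )

    pdf = dynamic_frame_to_pandas(source)
    pdf = pdf.dropna()  # placeholder for your own pandas transformations

    result = pandas_to_dynamic_frame(pdf, glue_context, "cleaned")

    # Hypothetical database and table; assumes the table already exists
    # in the Data Catalog.
    glue_context.write_dynamic_frame.from_catalog(
        frame=result,
        database="example_db",
        table_name="example_table",
    )
```

In a real job script, example_job() would be called at the end of the file. Converting back to a DynamicFrame is only needed because the Glue writer expects one; the pandas step itself is ordinary pandas code.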

AWS
NishAWS
answered 2 years ago
