Using Pandas in Glue ETL Job ( How to convert Dynamic DataFrame or PySpark Dataframe to Pandas Dataframe)

0

I am wanting to use Pandas in a Glue ETL job. I am reading from S3 and writing to Data Catalog. I am trying to find a basic example where I can read in from S3 , either into or converting to a Pandas DF, and then do my manipulations and then write out to Data Catalog. It looks like I may need to write to a Dynamic DataFrame before sending to data catalog. Any examples? I am doing my ETL today using PySpark but would like to do most of my transformations in Pandas.

bfeeny
feita há 2 anos9833 visualizações
1 Resposta
0
Resposta aceita

Would say convert Dynamic frame to Spark data frame using .ToDF() method and from spark dataframe to pandas dataframe using link https://sparkbyexamples.com/pyspark/convert-pyspark-dataframe-to-pandas/#:~:text=Convert%20PySpark%20Dataframe%20to%20Pandas%20DataFrame,small%20subset%20of%20the%20data.

AWS
NishAWS
respondido há 2 anos

Você não está conectado. Fazer login para postar uma resposta.

Uma boa resposta responde claramente à pergunta, dá feedback construtivo e incentiva o crescimento profissional de quem perguntou.

Diretrizes para responder a perguntas