No CSV extension on Glue pyspark job in Amazon S3

0

Hello,

I have a Glue Pyspark job that writes to Amazon S3 in a CSV format, I don't see any file with CSV extension on Amazon S3. How can see it ?

I am using write_dynamic_frame.from_options function in PySpark

AWS
feita há um ano269 visualizações
3 Respostas
0

Thanks Gonzalo, is there a way to change also naming convention of Glue for run-[timestamp]-part-r-[number} ?

paul
respondido há um ano
0

I'd recommend looking at the logs for the job to find the root cause of the error. If the jobs did execute successfully, check to see you have the proper IAM permissions to be able to view the files in Amazon S3. But by starting with the logs, you'll be able to share a better picture of the status.

profile pictureAWS
ESPECIALISTA
pechung
respondido há um ano
0

DynamicFrame doesn't add the extension for CSV; in big data the extension is optional because you normally work with folders.
The files you get under the destination folder with names like: run-[timestamp]-part-r-[number} are csv even if they have no extension.
If you really need the extension, you could rename after but in s3 it means copying the files, so it might be easier if you just write using DataFrame (.toDF().write.csv())

profile pictureAWS
ESPECIALISTA
respondido há um ano

Você não está conectado. Fazer login para postar uma resposta.

Uma boa resposta responde claramente à pergunta, dá feedback construtivo e incentiva o crescimento profissional de quem perguntou.

Diretrizes para responder a perguntas