No CSV extension on Glue pyspark job in Amazon S3

0

Hello,

I have a Glue Pyspark job that writes to Amazon S3 in a CSV format, I don't see any file with CSV extension on Amazon S3. How can see it ?

I am using write_dynamic_frame.from_options function in PySpark

AWS
posta un anno fa269 visualizzazioni
3 Risposte
0

Thanks Gonzalo, is there a way to change also naming convention of Glue for run-[timestamp]-part-r-[number} ?

paul
con risposta un anno fa
0

I'd recommend looking at the logs for the job to find the root cause of the error. If the jobs did execute successfully, check to see you have the proper IAM permissions to be able to view the files in Amazon S3. But by starting with the logs, you'll be able to share a better picture of the status.

profile pictureAWS
ESPERTO
pechung
con risposta un anno fa
0

DynamicFrame doesn't add the extension for CSV; in big data the extension is optional because you normally work with folders.
The files you get under the destination folder with names like: run-[timestamp]-part-r-[number} are csv even if they have no extension.
If you really need the extension, you could rename after but in s3 it means copying the files, so it might be easier if you just write using DataFrame (.toDF().write.csv())

profile pictureAWS
ESPERTO
con risposta un anno fa

Accesso non effettuato. Accedi per postare una risposta.

Una buona risposta soddisfa chiaramente la domanda, fornisce un feedback costruttivo e incoraggia la crescita professionale del richiedente.

Linee guida per rispondere alle domande