No CSV extension on Glue pyspark job in Amazon S3


Hello,

I have a Glue PySpark job that writes CSV output to Amazon S3, but I don't see any files with a .csv extension in S3. How can I see them?

I am using the write_dynamic_frame.from_options function in PySpark.
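For context, the write looks roughly like this (the bucket path is a placeholder, and the DynamicFrame is assumed to already exist in the job):

```python
# Sketch of a Glue CSV write, assuming a Glue job runtime where
# glue_context (GlueContext) and dyf (DynamicFrame) are already defined.
# The S3 path below is a placeholder, not a real bucket.
SINK_OPTIONS = {"path": "s3://example-bucket/output/"}

def write_csv_to_s3(glue_context, dyf):
    # Writes the DynamicFrame as CSV part files under the S3 prefix;
    # Glue does not append a .csv extension to the part files it creates.
    glue_context.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="s3",
        connection_options=SINK_OPTIONS,
        format="csv",
        format_options={"separator": ",", "writeHeader": True},
    )
```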

AWS
asked a year ago · 251 views
3 Answers

Thanks Gonzalo. Is there also a way to change Glue's naming convention of run-[timestamp]-part-r-[number]?

paul
answered a year ago

I'd recommend looking at the logs for the job to find the root cause of the error. If the job did execute successfully, check that you have the proper IAM permissions to view the files in Amazon S3. By starting with the logs, you'll be able to share a better picture of the status.

AWS
EXPERT
pechung
answered a year ago

DynamicFrame doesn't add the extension for CSV; in big data tools the extension is optional because you normally work with folders rather than individual files.
The files written under the destination folder with names like run-[timestamp]-part-r-[number] are CSV even though they have no extension.
If you really need the extension, you could rename the files afterwards, but in S3 a rename means copying the files, so it might be easier to write with a DataFrame instead (.toDF().write.csv()).
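If renaming afterwards is the route you take, a minimal sketch with boto3 could look like the following (bucket and prefix are placeholders; note that S3 has no true rename, so each object is copied to a new key and the original is deleted):

```python
def with_csv_extension(key: str) -> str:
    """Return the S3 key with a .csv suffix appended if it lacks one."""
    return key if key.endswith(".csv") else key + ".csv"

def rename_outputs(bucket: str, prefix: str) -> None:
    """Copy every object under the prefix to a .csv-suffixed key, then delete the original."""
    import boto3  # imported here so the helper above has no AWS dependency

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            new_key = with_csv_extension(key)
            if new_key == key:
                continue
            s3.copy_object(
                Bucket=bucket,
                CopySource={"Bucket": bucket, "Key": key},
                Key=new_key,
            )
            s3.delete_object(Bucket=bucket, Key=key)
```

This doesn't change the run-[timestamp]-part-r-[number] base names, which Glue generates internally; it only appends the extension.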

AWS
EXPERT
answered a year ago
