No CSV extension on Glue PySpark job output in Amazon S3


Hello,

I have a Glue PySpark job that writes to Amazon S3 in CSV format, but I don't see any files with a .csv extension in Amazon S3. How can I see them?

I am using the write_dynamic_frame.from_options function in PySpark.

AWS
Asked 1 year ago · 269 views
3 answers

Thanks Gonzalo. Is there also a way to change the naming convention Glue uses for the output files, run-[timestamp]-part-r-[number]?

paul
Answered 1 year ago

I'd recommend looking at the job's logs to find the root cause of the error. If the job did execute successfully, check that you have the proper IAM permissions to view the files in Amazon S3. Starting with the logs will give you a clearer picture of the job's status.
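One quick way to confirm both that the job actually wrote objects and that your credentials are allowed to see them is to list the output prefix. The sketch below uses boto3's `list_objects_v2` paginator; the bucket and prefix are placeholders for your job's target path, and the optional `client` argument is only there so the function can be exercised without AWS access. If your identity lacks `s3:ListBucket`, the call raises a botocore `ClientError` (AccessDenied), which points at IAM rather than at the job.

```python
from typing import Iterator


def list_output_keys(bucket: str, prefix: str, client=None) -> Iterator[str]:
    """Yield every object key under `prefix` in `bucket`."""
    if client is None:
        import boto3  # imported lazily so the function can be tested offline

        client = boto3.client("s3")
    paginator = client.get_paginator("list_objects_v2")
    # list_objects_v2 returns at most 1,000 keys per call; the paginator
    # follows the continuation tokens for us.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

For example, `list(list_output_keys("my-output-bucket", "glue-output/"))` (hypothetical names) should return the part files the job wrote; an empty list with no error means the credentials work but nothing was written under that prefix.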

AWS Expert
pechung
Answered 1 year ago

DynamicFrame doesn't add a .csv extension to the files it writes; in big data the extension is optional because you normally work with whole folders rather than individual files.
The files you get under the destination folder, with names like run-[timestamp]-part-r-[number], are CSV even though they have no extension.
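To see that the extension really is irrelevant, you can parse the raw bytes of one of these part files with Python's standard `csv` module. This is a sketch assuming the default comma separator and UTF-8 text; the bytes could come from, for example, `boto3.client("s3").get_object(Bucket=..., Key=...)["Body"].read()`.

```python
import csv
import io
from typing import List


def parse_part_file(body: bytes, encoding: str = "utf-8") -> List[List[str]]:
    """Parse the raw bytes of an extensionless part file as CSV rows."""
    # The object name has no .csv suffix, but the content is plain
    # comma-separated text, so csv.reader handles it directly.
    return list(csv.reader(io.StringIO(body.decode(encoding), newline="")))
```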
If you really need the extension, you could rename the files afterwards, but S3 has no rename operation, so that means copying every file to a new key. It might be easier to write with a DataFrame instead (.toDF().write.csv()).
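The rename-after approach can be sketched with boto3's `copy_object` and `delete_object`: copy each object to its new key, then delete the original. Assumptions: the regular expression for Glue part-file names (digits for the timestamp and part number) is inferred from the run-[timestamp]-part-r-[number] pattern above, and the bucket and prefix are placeholders. `copy_object` copies objects of up to 5 GB in one request; larger files would need a multipart copy such as boto3's managed `client.copy`.

```python
import re
from typing import Callable, List, Optional

# Assumed shape of Glue output names: run-<digits>-part-r-<digits>
_GLUE_PART = re.compile(r"(^|/)run-\d+-part-r-\d+$")


def add_csv_suffix(key: str) -> Optional[str]:
    """Default rule: append .csv to Glue part files, skip everything else."""
    return key + ".csv" if _GLUE_PART.search(key) else None


def rename_objects(
    bucket: str,
    prefix: str,
    new_key: Callable[[str], Optional[str]] = add_csv_suffix,
    client=None,
) -> List[str]:
    """Copy each object under `prefix` to new_key(old_key), then delete the old one.

    Objects for which new_key returns None (or the same key) are left alone.
    Returns the list of new keys.
    """
    if client is None:
        import boto3  # lazy import keeps the function testable without AWS

        client = boto3.client("s3")

    # List everything first so newly created keys are not picked up mid-rename.
    old_keys: List[str] = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        old_keys.extend(obj["Key"] for obj in page.get("Contents", []))

    renamed: List[str] = []
    for old in old_keys:
        target = new_key(old)
        if target is None or target == old:
            continue
        client.copy_object(
            Bucket=bucket,
            Key=target,
            CopySource={"Bucket": bucket, "Key": old},
        )
        client.delete_object(Bucket=bucket, Key=old)
        renamed.append(target)
    return renamed
```

Passing a different `new_key` function also covers the follow-up about the naming convention, since any target name can be produced. This only renames after the job finishes; Glue itself still writes the run-[timestamp]-part-r-[number] names first.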

AWS Expert
Answered 1 year ago
