No CSV extension on files written by a Glue PySpark job to Amazon S3

Hello,

I have a Glue PySpark job that writes to Amazon S3 in CSV format, but I don't see any file with a .csv extension in Amazon S3. How can I see it?

I am using the write_dynamic_frame.from_options function in PySpark.

AWS
Asked a year ago, 270 views
3 Answers

Thanks Gonzalo. Is there also a way to change Glue's naming convention of run-[timestamp]-part-r-[number]?

paul
Answered a year ago

I'd recommend looking at the logs for the job to find the root cause of the error. If the job did execute successfully, check that you have the proper IAM permissions to view the files in Amazon S3. Starting with the logs will let you share a clearer picture of the job's status.

AWS Expert
pechung
Answered a year ago

DynamicFrame doesn't add the extension for CSV; in big data the extension is optional because you normally work with folders rather than individual files.
The files you get under the destination folder, with names like run-[timestamp]-part-r-[number], are CSV even though they have no extension.
If you really need the extension, you could rename the files afterwards, but in S3 a rename means copying each file to a new key and deleting the old one, so it might be easier to just write using a DataFrame (.toDF().write.csv()), which does add the .csv extension.
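
A minimal sketch of both options above, not a definitive implementation. The function names, the `header` option, and the default `append` mode are my own assumptions for illustration. `s3_client` is assumed to be a boto3 S3 client (`boto3.client("s3")`), and `dynamic_frame` a Glue DynamicFrame; both are passed in, so the snippet itself imports nothing.

```python
def write_csv_via_dataframe(dynamic_frame, path, mode="append"):
    """Write through the Spark DataFrame writer, which names files part-*.csv.

    `mode` is an assumption: "append" adds files next to existing ones,
    while "overwrite" would first delete whatever is already under `path`.
    """
    writer = dynamic_frame.toDF().write.mode(mode).option("header", "true")
    writer.csv(path)


def csv_key(key):
    """Return the key with a .csv suffix, leaving existing .csv keys as-is."""
    return key if key.endswith(".csv") else key + ".csv"


def add_csv_extension(s3_client, bucket, prefix):
    """'Rename' every object under prefix by copying it to <key>.csv.

    S3 has no rename operation, so each object is copied to the new key
    and the old key is deleted. Note that copy_object handles objects up
    to 5 GB; larger objects need a multipart copy.
    Returns a list of (old_key, new_key) pairs.
    """
    # Collect the keys first so the listing is not affected by the copies.
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]

    renamed = []
    for old_key in keys:
        # Skip folder markers and files that already have the extension.
        if old_key.endswith("/") or old_key.endswith(".csv"):
            continue
        new_key = csv_key(old_key)
        s3_client.copy_object(
            Bucket=bucket,
            Key=new_key,
            CopySource={"Bucket": bucket, "Key": old_key},
        )
        s3_client.delete_object(Bucket=bucket, Key=old_key)
        renamed.append((old_key, new_key))
    return renamed
```

In a Glue job you would call `write_csv_via_dataframe(my_dyf, "s3://my-bucket/output/")` in place of write_dynamic_frame.from_options, or run `add_csv_extension(boto3.client("s3"), "my-bucket", "output/")` after the existing write finishes (the bucket and prefix here are placeholders). The copy approach costs one extra request pair per file, which is why the DataFrame writer is usually simpler.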

AWS Expert
Answered a year ago