How do I combine multiple CSVs into one?


I have multiple CSVs about a single patient, and I would like to know how to combine them, because the columns across the CSVs together make up all the information for one patient. The CSVs are stored in an S3 bucket, in different folders. I have tried using a join, but because we have many patients the job is taking forever. TIA

CYN
Asked 7 months ago, 403 views
1 Answer

Hello,

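You can create an Athena table whose input location is the S3 prefix that holds all the CSVs. Athena reads every file under that prefix, including files in subfolders. Something like this (refer to the Athena CREATE TABLE documentation):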

CREATE EXTERNAL TABLE `test_table`(
...
)
ROW FORMAT ...
STORED AS INPUTFORMAT ...
OUTPUTFORMAT ...
LOCATION 's3://bucketname/folder/'
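
For illustration only, here is a sketch of what the elided parts might look like for comma-separated files with a header row. The table name `patient_csv` and the three columns are hypothetical placeholders, not taken from the question; the SerDe and properties are the ones Athena documents for CSV data, but adjust them to match your files.

```sql
-- Hypothetical table and column names; replace with your own schema.
CREATE EXTERNAL TABLE IF NOT EXISTS patient_csv (
  patient_id string,
  visit_date string,
  diagnosis  string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"',
  'escapeChar'    = '\\'
)
STORED AS TEXTFILE
-- Point at the common parent prefix so files in all subfolders are read.
LOCATION 's3://bucketname/folder/'
-- Assumes each file starts with one header line.
TBLPROPERTIES ('skip.header.line.count' = '1');
```

Note that a single table like this assumes every file under the prefix shares the same column layout.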

Once the table is created, use CTAS (CREATE TABLE AS SELECT) to create another table that consolidates all the CSV data into a single output location, like below (refer to the Athena CTAS documentation):

CREATE TABLE ctas_csv_unpartitioned 
WITH (
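     -- Athena CTAS writes delimited text as TEXTFILE; the comma delimiter makes it CSV
     format = 'TEXTFILE', 
     field_delimiter = ',', 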
     external_location = 's3://xxxxxxxxxxxx/ctas_csv_unpartitioned/') 
AS SELECT key1, name1, comment1
FROM test_table;
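
Before running the CTAS, it may help to confirm that the source table really picks up files from every folder. One possible check, sketched here against the `test_table` from the example above, uses Athena's `"$path"` pseudo-column to count rows per source file:

```sql
-- List each S3 object that contributes rows to the table, with its row count.
SELECT "$path" AS source_file,
       count(*) AS row_count
FROM test_table
GROUP BY "$path"
ORDER BY source_file;
```

If a folder you expected is missing from the result, the table's LOCATION prefix probably does not cover it.
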
AWS
Support Engineer
Answered 7 months ago
