How do I combine multiple CSVs into one?


I have multiple CSVs about a single patient, and I would like to know how to combine them, because the columns across the CSVs together make up all the information for one patient. The CSVs are stored in an S3 bucket, in different folders. I have tried using a join, but because we have many patients the job is taking forever. TIA

CYN
Asked 7 months ago, 401 views
1 Answer

Hello,

You can create an Athena table whose LOCATION is the S3 prefix that contains all of the CSV folders. Something like this (see CREATE TABLE in the Athena documentation):

CREATE EXTERNAL TABLE `test_table`(
...
)
ROW FORMAT ...
STORED AS INPUTFORMAT ...
OUTPUTFORMAT ...
LOCATION 's3://bucketname/folder/'

Once the table is created, use CTAS (CREATE TABLE AS SELECT) to create another table that consolidates all the CSVs into a single output location, like below (see the Athena documentation for CTAS):

CREATE TABLE ctas_csv_unpartitioned
WITH (
     format = 'TEXTFILE',
     field_delimiter = ',',
     external_location = 's3://xxxxxxxxxxxx/ctas_csv_unpartitioned/')
AS SELECT key1, name1, comment1
FROM test_table;
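
If you want to generate the CTAS statement from a script rather than typing it by hand, a minimal Python sketch could look like the following. The helper name `build_ctas` and the bucket `example-bucket` are hypothetical placeholders, and the sketch assumes comma-delimited output is requested with `format = 'TEXTFILE'` plus `field_delimiter = ','`. It only builds the query string and does not validate identifiers or talk to AWS.

```python
def build_ctas(new_table, source_table, columns, external_location):
    """Return an Athena CTAS statement that writes comma-delimited text files.

    All arguments are plain strings supplied by the caller; nothing is
    escaped or validated here, so only pass trusted identifiers.
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(columns)
    return (
        f"CREATE TABLE {new_table}\n"
        "WITH (\n"
        "     format = 'TEXTFILE',\n"
        "     field_delimiter = ',',\n"
        f"     external_location = '{external_location}')\n"
        f"AS SELECT {column_list}\n"
        f"FROM {source_table}"
    )


# Table and column names are taken from the answer above;
# the bucket name is a made-up placeholder.
query = build_ctas(
    new_table="ctas_csv_unpartitioned",
    source_table="test_table",
    columns=["key1", "name1", "comment1"],
    external_location="s3://example-bucket/ctas_csv_unpartitioned/",
)
print(query)
```

The resulting string could then be pasted into the Athena console or submitted through an SDK call such as boto3's Athena `start_query_execution`.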
AWS
Support Engineer
Answered 7 months ago
