Athena not able to read multi-line text in CSV fields

0

I've this CSV content file

"one","two","three","four","five","six","seven","eight","nine","ten"

"one","two","three","four","five " quote " five2","six","seven","eight","nine","ten"

"one","two","three","four","five \

five2","six","seven","eight","nine","ten"

  • Row 2 is normal
  • Row 3 has a field with " escaped quotes
  • Row 4 has escaped newlines

This table is not imported correctly due to newlines. Is there any other way?

I found this way but it's not correct

CREATE EXTERNAL TABLE IF NOT EXISTS `test-general-log`.`general-log` (
  `A` string,
  `B` string,
  `C` string,
  `D` string,
  `E` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES (
   "input.regex" = '"*([^"]*)"*,"*([^"]*)"*'
) LOCATION 's3://test-general-log/exportedlogs/0baa57384-2c6e-4e016-85f0-bc58b31e6300/'
TBLPROPERTIES ('has_encrypted_data'='false');
已提問 2 年前檢視次數 3084 次
1 個回答
1
已接受的答案

Hi.

You can use OpenCSVSerDe to import CSV, but embedded line breaks are not supported.

Does not support embedded line breaks in CSV files.

https://docs.aws.amazon.com/athena/latest/ug/csv-serde.html

You can keep line breaks by converting with Parquet SerDe in Glue Job as follows.

https://yohei-a.hatenablog.jp/entry/20191015/1571066002

profile picture
專家
iwasa
已回答 2 年前

您尚未登入。 登入 去張貼答案。

一個好的回答可以清楚地回答問題並提供建設性的意見回饋,同時有助於提問者的專業成長。

回答問題指南