Athena not able to read multi-line text in CSV fields

0

I've this CSV content file

"one","two","three","four","five","six","seven","eight","nine","ten"

"one","two","three","four","five " quote " five2","six","seven","eight","nine","ten"

"one","two","three","four","five \

five2","six","seven","eight","nine","ten"

  • Row 2 is normal
  • Row 3 has a field with " escaped quotes
  • Row 4 has escaped newlines

This table is not imported correctly due to newlines. Is there any other way?

I found this way but it's not correct

CREATE EXTERNAL TABLE IF NOT EXISTS `test-general-log`.`general-log` (
  `A` string,
  `B` string,
  `C` string,
  `D` string,
  `E` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES (
   "input.regex" = '"*([^"]*)"*,"*([^"]*)"*'
) LOCATION 's3://test-general-log/exportedlogs/0baa57384-2c6e-4e016-85f0-bc58b31e6300/'
TBLPROPERTIES ('has_encrypted_data'='false');
질문됨 2년 전3084회 조회
1개 답변
1
수락된 답변

Hi.

You can use OpenCSVSerDe to import CSV, but embedded line breaks are not supported.

Does not support embedded line breaks in CSV files.

https://docs.aws.amazon.com/athena/latest/ug/csv-serde.html

You can keep line breaks by converting with Parquet SerDe in Glue Job as follows.

https://yohei-a.hatenablog.jp/entry/20191015/1571066002

profile picture
전문가
iwasa
답변함 2년 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인