Athena not able to read multi-line text in CSV fields

0

I've this CSV content file

"one","two","three","four","five","six","seven","eight","nine","ten"

"one","two","three","four","five " quote " five2","six","seven","eight","nine","ten"

"one","two","three","four","five \

five2","six","seven","eight","nine","ten"

  • Row 2 is normal
  • Row 3 has a field with " escaped quotes
  • Row 4 has escaped newlines

This table is not imported correctly due to newlines. Is there any other way?

I found this way but it's not correct

CREATE EXTERNAL TABLE IF NOT EXISTS `test-general-log`.`general-log` (
  `A` string,
  `B` string,
  `C` string,
  `D` string,
  `E` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES (
   "input.regex" = '"*([^"]*)"*,"*([^"]*)"*'
) LOCATION 's3://test-general-log/exportedlogs/0baa57384-2c6e-4e016-85f0-bc58b31e6300/'
TBLPROPERTIES ('has_encrypted_data'='false');
質問済み 2年前3084ビュー
1回答
1
承認された回答

Hi.

You can use OpenCSVSerDe to import CSV, but embedded line breaks are not supported.

Does not support embedded line breaks in CSV files.

https://docs.aws.amazon.com/athena/latest/ug/csv-serde.html

You can keep line breaks by converting with Parquet SerDe in Glue Job as follows.

https://yohei-a.hatenablog.jp/entry/20191015/1571066002

profile picture
エキスパート
iwasa
回答済み 2年前

ログインしていません。 ログイン 回答を投稿する。

優れた回答とは、質問に明確に答え、建設的なフィードバックを提供し、質問者の専門分野におけるスキルの向上を促すものです。

質問に答えるためのガイドライン

関連するコンテンツ