Athena not able to read multi-line text in CSV fields

0

I've this CSV content file

"one","two","three","four","five","six","seven","eight","nine","ten"

"one","two","three","four","five " quote " five2","six","seven","eight","nine","ten"

"one","two","three","four","five \

five2","six","seven","eight","nine","ten"

  • Row 2 is normal
  • Row 3 has a field with " escaped quotes
  • Row 4 has escaped newlines

This table is not imported correctly due to newlines. Is there any other way?

I found this way but it's not correct

CREATE EXTERNAL TABLE IF NOT EXISTS `test-general-log`.`general-log` (
  `A` string,
  `B` string,
  `C` string,
  `D` string,
  `E` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES (
   "input.regex" = '"*([^"]*)"*,"*([^"]*)"*'
) LOCATION 's3://test-general-log/exportedlogs/0baa57384-2c6e-4e016-85f0-bc58b31e6300/'
TBLPROPERTIES ('has_encrypted_data'='false');
gefragt vor 2 Jahren3084 Aufrufe
1 Antwort
1
Akzeptierte Antwort

Hi.

You can use OpenCSVSerDe to import CSV, but embedded line breaks are not supported.

Does not support embedded line breaks in CSV files.

https://docs.aws.amazon.com/athena/latest/ug/csv-serde.html

You can keep line breaks by converting with Parquet SerDe in Glue Job as follows.

https://yohei-a.hatenablog.jp/entry/20191015/1571066002

profile picture
EXPERTE
iwasa
beantwortet vor 2 Jahren

Du bist nicht angemeldet. Anmelden um eine Antwort zu veröffentlichen.

Eine gute Antwort beantwortet die Frage klar, gibt konstruktives Feedback und fördert die berufliche Weiterentwicklung des Fragenstellers.

Richtlinien für die Beantwortung von Fragen