Athena not able to read multi-line text in CSV fields

0

I've this CSV content file

"one","two","three","four","five","six","seven","eight","nine","ten"

"one","two","three","four","five " quote " five2","six","seven","eight","nine","ten"

"one","two","three","four","five \

five2","six","seven","eight","nine","ten"

  • Row 2 is normal
  • Row 3 has a field with " escaped quotes
  • Row 4 has escaped newlines

This table is not imported correctly due to newlines. Is there any other way?

I found this way but it's not correct

CREATE EXTERNAL TABLE IF NOT EXISTS `test-general-log`.`general-log` (
  `A` string,
  `B` string,
  `C` string,
  `D` string,
  `E` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES (
   "input.regex" = '"*([^"]*)"*,"*([^"]*)"*'
) LOCATION 's3://test-general-log/exportedlogs/0baa57384-2c6e-4e016-85f0-bc58b31e6300/'
TBLPROPERTIES ('has_encrypted_data'='false');
feita há 2 anos3084 visualizações
1 Resposta
1
Resposta aceita

Hi.

You can use OpenCSVSerDe to import CSV, but embedded line breaks are not supported.

Does not support embedded line breaks in CSV files.

https://docs.aws.amazon.com/athena/latest/ug/csv-serde.html

You can keep line breaks by converting with Parquet SerDe in Glue Job as follows.

https://yohei-a.hatenablog.jp/entry/20191015/1571066002

profile picture
ESPECIALISTA
iwasa
respondido há 2 anos

Você não está conectado. Fazer login para postar uma resposta.

Uma boa resposta responde claramente à pergunta, dá feedback construtivo e incentiva o crescimento profissional de quem perguntou.

Diretrizes para responder a perguntas