Athena and Data Catalog: how to query json files structured as simple array of records

0

I probably missing something simple here. I have data files that crawler discovers and classifies properly:

[
  {
    "budge": 150,
    "cost": 1.44,
    "attrsales": 9.93,
    "campaignName": "camp 1"
  },
  {
  ... another record
  }
]

However, Athena wouldn't query table from Data Catalog unless the file is json objects with new line.

It prints error HIVE_BAD_DATA: Error parsing field value for field xxx, trying to parse a whole object from json array. If data files stored like this, it queries fine

{"budge": 150,"cost": 1.44,"attrsales": 9.93,"campaignName": "camp 1"}
{ another one... }

Is there a way to query files with proper json format from the Data Catalog? AWS Crawler can index and discover both. But since we're receiving data in properly formatted json, I wanted to keep it the original files.

How do you deal with this Athena feature?

preguntada hace 2 años342 visualizaciones
1 Respuesta
0

Hi,

as described in the documentation here, you could try changing the serde for the table to the** Amazon Ion Hive SerDe**.

Alternatively, you could keep the JSON file in the the source format, but use an AWS Glue job to quickly create a more compact version using Parquet or ORC data Format, which depending on the size of your data could be quite faster and more cost effective.

Hope this helps,

AWS
EXPERTO
respondido hace 2 años

No has iniciado sesión. Iniciar sesión para publicar una respuesta.

Una buena respuesta responde claramente a la pregunta, proporciona comentarios constructivos y fomenta el crecimiento profesional en la persona que hace la pregunta.

Pautas para responder preguntas