Athena and Data Catalog: how to query json files structured as simple array of records

0

I probably missing something simple here. I have data files that crawler discovers and classifies properly:

[
  {
    "budge": 150,
    "cost": 1.44,
    "attrsales": 9.93,
    "campaignName": "camp 1"
  },
  {
  ... another record
  }
]

However, Athena wouldn't query table from Data Catalog unless the file is json objects with new line.

It prints error HIVE_BAD_DATA: Error parsing field value for field xxx, trying to parse a whole object from json array. If data files stored like this, it queries fine

{"budge": 150,"cost": 1.44,"attrsales": 9.93,"campaignName": "camp 1"}
{ another one... }

Is there a way to query files with proper json format from the Data Catalog? AWS Crawler can index and discover both. But since we're receiving data in properly formatted json, I wanted to keep it the original files.

How do you deal with this Athena feature?

asked 2 years ago334 views
1 Answer
0

Hi,

as described in the documentation here, you could try changing the serde for the table to the** Amazon Ion Hive SerDe**.

Alternatively, you could keep the JSON file in the the source format, but use an AWS Glue job to quickly create a more compact version using Parquet or ORC data Format, which depending on the size of your data could be quite faster and more cost effective.

Hope this helps,

AWS
EXPERT
answered 2 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions