Athena and Data Catalog: how to query json files structured as simple array of records

0

I probably missing something simple here. I have data files that crawler discovers and classifies properly:

[
  {
    "budge": 150,
    "cost": 1.44,
    "attrsales": 9.93,
    "campaignName": "camp 1"
  },
  {
  ... another record
  }
]

However, Athena wouldn't query table from Data Catalog unless the file is json objects with new line.

It prints error HIVE_BAD_DATA: Error parsing field value for field xxx, trying to parse a whole object from json array. If data files stored like this, it queries fine

{"budge": 150,"cost": 1.44,"attrsales": 9.93,"campaignName": "camp 1"}
{ another one... }

Is there a way to query files with proper json format from the Data Catalog? AWS Crawler can index and discover both. But since we're receiving data in properly formatted json, I wanted to keep it the original files.

How do you deal with this Athena feature?

已提問 2 年前檢視次數 342 次
1 個回答
0

Hi,

as described in the documentation here, you could try changing the serde for the table to the** Amazon Ion Hive SerDe**.

Alternatively, you could keep the JSON file in the the source format, but use an AWS Glue job to quickly create a more compact version using Parquet or ORC data Format, which depending on the size of your data could be quite faster and more cost effective.

Hope this helps,

AWS
專家
已回答 2 年前

您尚未登入。 登入 去張貼答案。

一個好的回答可以清楚地回答問題並提供建設性的意見回饋,同時有助於提問者的專業成長。

回答問題指南