Athena and Data Catalog: how to query json files structured as simple array of records

0

I probably missing something simple here. I have data files that crawler discovers and classifies properly:

[
  {
    "budge": 150,
    "cost": 1.44,
    "attrsales": 9.93,
    "campaignName": "camp 1"
  },
  {
  ... another record
  }
]

However, Athena wouldn't query table from Data Catalog unless the file is json objects with new line.

It prints error HIVE_BAD_DATA: Error parsing field value for field xxx, trying to parse a whole object from json array. If data files stored like this, it queries fine

{"budge": 150,"cost": 1.44,"attrsales": 9.93,"campaignName": "camp 1"}
{ another one... }

Is there a way to query files with proper json format from the Data Catalog? AWS Crawler can index and discover both. But since we're receiving data in properly formatted json, I wanted to keep it the original files.

How do you deal with this Athena feature?

質問済み 2年前342ビュー
1回答
0

Hi,

as described in the documentation here, you could try changing the serde for the table to the** Amazon Ion Hive SerDe**.

Alternatively, you could keep the JSON file in the the source format, but use an AWS Glue job to quickly create a more compact version using Parquet or ORC data Format, which depending on the size of your data could be quite faster and more cost effective.

Hope this helps,

AWS
エキスパート
回答済み 2年前

ログインしていません。 ログイン 回答を投稿する。

優れた回答とは、質問に明確に答え、建設的なフィードバックを提供し、質問者の専門分野におけるスキルの向上を促すものです。

質問に答えるためのガイドライン

関連するコンテンツ