Athena and Data Catalog: how to query JSON files structured as a simple array of records


I'm probably missing something simple here. I have data files that the crawler discovers and classifies properly:

[
  {
    "budge": 150,
    "cost": 1.44,
    "attrsales": 9.93,
    "campaignName": "camp 1"
  },
  {
  ... another record
  }
]

However, Athena won't query the table from the Data Catalog unless each file holds one JSON object per line.

It fails with the error HIVE_BAD_DATA: Error parsing field value for field xxx, as it tries to parse a whole object from the JSON array. If the data files are stored like this, the query works fine:

{"budge": 150,"cost": 1.44,"attrsales": 9.93,"campaignName": "camp 1"}
{ another one... }

Is there a way to query files in proper JSON format from the Data Catalog? The AWS crawler can discover and index both layouts. But since we're receiving the data as properly formatted JSON, I would like to keep the original files as they are.

How do you deal with this Athena feature?

Asked 2 years ago, 342 views
1 answer

Hi,

As described in the documentation here, you could try changing the SerDe for the table to the **Amazon Ion Hive SerDe**.

Alternatively, you could keep the JSON files in the source format, but use an AWS Glue job to create a more compact copy in Parquet or ORC format. Depending on the size of your data, querying that copy could be considerably faster and more cost-effective.
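If Parquet or ORC is more than you need, a lighter variant of the same "keep the originals, query a converted copy" idea is to rewrite each file as one JSON object per line, which is the layout the question reports Athena querying without errors. The following is only a minimal sketch using the Python standard library; the function name `array_to_json_lines` is hypothetical, and reading from and writing back to S3 is left out.

```python
import json


def array_to_json_lines(text: str) -> str:
    """Convert a JSON document holding a top-level array into JSON Lines text."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Emit one compact object per line; an empty array yields an empty string.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


# Sample record taken from the question (keys kept exactly as posted).
sample = '[{"budge": 150, "cost": 1.44, "attrsales": 9.93, "campaignName": "camp 1"}]'
print(array_to_json_lines(sample), end="")
```

The converted text would be written to a separate location that the table points at, so the original array-style files stay untouched.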

Hope this helps,

AWS
Expert
Answered 2 years ago
