Glue Crawler cannot classify SNAPPY compressed JSON files


I have a KFH application that writes Snappy-compressed JSON files to an S3 bucket, and a Glue crawler that builds the schema from that bucket. When Snappy compression is enabled, however, the crawler classifies the table as UNKNOWN; it does not detect that the files are JSON. According to the documentation below, the Glue crawler supports Snappy compression for JSON files, but I could not get it to work. https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html#classifier-built-in

I also thought it might be related to the file extension and tried the names below, but neither worked:

Original:

|-----s3://my-bucket/my-table/day=01/file1.snappy

(1)

|-----s3://my-bucket/my-table/day=01/file1.snappy.json 

(2)

|-----s3://my-bucket/my-table/day=01/file1.json.snappy 

Thanks.

1 Answer

Since the Glue crawler is unable to read the files, you could create a custom JSON classifier. After creating it, attach the custom classifier to the crawler. This should enable the crawler to read the files correctly, and the classification should change from UNKNOWN to the name of your custom classifier.

Example JSON record below:
 
    {
      "type": "constituency",
      "id": "ocd-division\/country:us\/state:ak",
      "name": "Alaska"
    }
 
Please refer to the following documentation on adding a custom classifier:
https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html
https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html#classifier-built-in
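
The two steps described above (create a custom JSON classifier, then attach it to the crawler) could be scripted roughly as follows. This is a minimal sketch rather than a verified fix for the Snappy case. It assumes a boto3 Glue client is passed in as `glue`; the helper names `json_classifier_request` and `attach_json_classifier` are hypothetical, and the JSON path you need depends on how your records are laid out in the files.

```python
from typing import Any, Dict


def json_classifier_request(name: str, json_path: str) -> Dict[str, Any]:
    """Build the keyword arguments for the Glue CreateClassifier call."""
    return {"JsonClassifier": {"Name": name, "JsonPath": json_path}}


def attach_json_classifier(
    glue: Any, crawler_name: str, classifier_name: str, json_path: str
) -> None:
    """Create a custom JSON classifier, then attach it to an existing crawler.

    `glue` is assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    glue.create_classifier(**json_classifier_request(classifier_name, json_path))
    # Note: Classifiers replaces the crawler's current list of custom
    # classifiers, so include any existing ones you want to keep.
    glue.update_crawler(Name=crawler_name, Classifiers=[classifier_name])
```

With a real client this might be called as `attach_json_classifier(boto3.client("glue"), "my-crawler", "my-json-classifier", "$[*]")`, where the crawler name, classifier name, and the `$[*]` path are placeholders. Re-run the crawler afterwards to see whether the classification changes.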
AWS
Answered 10 months ago
