AWS Glue crawler exclude patterns not working

I am new to AWS and Glue, and I am using an AWS Glue crawler to crawl some files under the path bucket/basefolder. Here is the folder structure:

bucket/basefolder
    subfolder1
        logfolder
            log1.json
        file1.parquet
    subfolder2
        logfolder
            log2.json
        file2.parquet
        file3.parquet

I want to crawl the files under the base folder and its subfolders, but exclude all the files under each logfolder. These are the exclude patterns in the crawler settings:

logfolder/**
logfolder**
logfolder/*
*.json

But the crawler still picks up all the JSON files under the logfolder directories; none of the exclude patterns work. Please help.

Asked 2 years ago, 3421 views
1 answer
Accepted answer

Hello,

I've tested a crawler using the same folder structure in S3 as mentioned.

Specified include path as: s3://bucket/basefolder/

Exclude pattern as: **/logfolder/**

With the above exclude pattern, the crawler ignores all files under folders named 'logfolder'. For your reference: https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html#crawler-data-stores-exclude
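
If the crawler is defined from code rather than the console, the same include path and exclude pattern could be passed through the S3 target's Path and Exclusions fields. The following is only a minimal sketch: the crawler name, role ARN, and database name are hypothetical placeholders, and the helper build_crawler_request is introduced here purely for illustration. The boto3 call itself is left commented out so the snippet runs without AWS credentials.

```python
def build_crawler_request(name, role_arn, database, include_path, exclusions):
    """Build keyword arguments for the Glue create_crawler API call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            "S3Targets": [
                {
                    # Include path: the root the crawler starts from.
                    "Path": include_path,
                    # Exclude patterns (globs) for this target.
                    "Exclusions": list(exclusions),
                }
            ]
        },
    }


request = build_crawler_request(
    name="basefolder-crawler",  # hypothetical crawler name
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical role
    database="example_db",  # hypothetical Data Catalog database
    include_path="s3://bucket/basefolder/",
    exclusions=["**/logfolder/**"],
)

# To actually create the crawler (requires boto3 and AWS credentials):
# import boto3
# boto3.client("glue").create_crawler(**request)
```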

AWS
Answered 2 years ago
  • Thanks, that works. I was following the same link, but I think the information there is a little misleading. Now I know I should also put **/ before the folder name.
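
To illustrate why the **/ prefix matters, here is a rough sketch that approximates the glob rules described in the linked documentation: the pattern is compared against the object path relative to the include path, * does not cross a folder boundary, and ** does. This is not Glue's actual matcher; glob_to_regex and is_excluded are hypothetical helpers that only handle *, ** and ?.

```python
import re


def glob_to_regex(pattern):
    """Translate a simplified glob into a regex (approximation only)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # ** may cross folder boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # * stays within one path component
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")    # ? matches one non-separator character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_excluded(relative_key, patterns):
    """True if any pattern matches the whole path relative to the include path."""
    return any(glob_to_regex(p).fullmatch(relative_key) for p in patterns)


# Object paths relative to the include path s3://bucket/basefolder/
keys = [
    "subfolder1/logfolder/log1.json",
    "subfolder1/file1.parquet",
    "subfolder2/logfolder/log2.json",
    "subfolder2/file2.parquet",
    "subfolder2/file3.parquet",
]

original_patterns = ["logfolder/**", "logfolder**", "logfolder/*", "*.json"]
fixed_patterns = ["**/logfolder/**"]

kept_with_original = [k for k in keys if not is_excluded(k, original_patterns)]
kept_with_fixed = [k for k in keys if not is_excluded(k, fixed_patterns)]

print(kept_with_original)
print(kept_with_fixed)
```

Under these assumptions, none of the original patterns match, because each one would have to match the path starting at subfolder1/ or subfolder2/, while **/logfolder/** covers the leading subfolder and excludes both log files.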
