AWS Glue crawler exclude patterns not working


I am new to AWS and Glue, and I am using an AWS Glue crawler to pick up files under the path bucket/basefolder. Here is the folder structure:

bucket/basefolder
    subfolder1
        logfolder
            log1.json
        file1.parquet
    subfolder2
        logfolder
            log2.json
        file2.parquet
        file3.parquet

I want to crawl the files under the base folder and its subfolders, but exclude every file under logfolder. These are the exclude patterns I have tried in the crawler settings:

logfolder/**
logfolder**
logfolder/*
*.json

But the crawler still picks up all the JSON files under logfolder; none of the exclude patterns works. Please help.

asked 2 years ago, 3419 views
1 Answer
Accepted Answer

Hello,

I've tested a crawler using the same folder structure in S3 as mentioned.

Specified include path as: s3://bucket/basefolder/

Exclude pattern as: **/logfolder/**

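With this exclude pattern, the crawler ignores all files under any folder named 'logfolder'. Exclude patterns are evaluated relative to the include path, so a pattern such as logfolder/** only matches a logfolder sitting directly under basefolder; the leading **/ lets it match at any depth. For your reference: https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html#crawler-data-stores-exclude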
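If you define the crawler from code rather than the console, the same settings could look roughly like the sketch below. It assumes boto3 and uses placeholder values for the crawler name, IAM role, and database (the helper names `build_s3_target` and `create_crawler` are made up for illustration); only the include path and exclude pattern come from the setup above.

```python
def build_s3_target(include_path, exclusions):
    """Build one entry for the S3Targets list of a Glue crawler."""
    return {"Path": include_path, "Exclusions": list(exclusions)}


# Include path and exclude pattern from the answer above.
s3_target = build_s3_target("s3://bucket/basefolder/", ["**/logfolder/**"])


def create_crawler(name, role_arn, database):
    """Create the crawler. All three arguments are placeholders you must supply."""
    import boto3  # imported here so the target can be built without AWS access

    glue = boto3.client("glue")
    glue.create_crawler(
        Name=name,
        Role=role_arn,
        DatabaseName=database,
        Targets={"S3Targets": [s3_target]},
    )
```

Calling something like `create_crawler("my-crawler", "<role-arn>", "my_database")` would then register the crawler; those argument values are hypothetical.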

AWS
answered 2 years ago
  • Thanks, that works. I was following the same link, but I think the information there is a little misleading. Now I know I also need to put **/ before the folder name.
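To see why the leading **/ matters, here is a small sketch that approximates the glob rules described in the linked documentation: `*` stays within one folder level, `**` crosses folder boundaries, and the pattern is compared against the object path relative to the include path. This is a simplified stand-in, not Glue's actual matcher, and it handles only `*`, `**`, and `?`.

```python
import re


def glob_to_regex(pattern):
    """Translate a small glob subset (*, **, ?) into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")       # '**' may cross folder boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")    # '*' stays within one path component
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")     # '?' matches one non-slash character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_excluded(relative_path, patterns):
    """True if any pattern matches the whole path relative to the include path."""
    return any(glob_to_regex(p).fullmatch(relative_path) for p in patterns)


# Object paths relative to the include path s3://bucket/basefolder/
paths = [
    "subfolder1/logfolder/log1.json",
    "subfolder1/file1.parquet",
    "subfolder2/logfolder/log2.json",
    "subfolder2/file2.parquet",
    "subfolder2/file3.parquet",
]

original_patterns = ["logfolder/**", "logfolder**", "logfolder/*", "*.json"]
fixed_patterns = ["**/logfolder/**"]

excluded_by_original = [p for p in paths if is_excluded(p, original_patterns)]
excluded_by_fixed = [p for p in paths if is_excluded(p, fixed_patterns)]
```

Under these rules, none of the four original patterns matches anything, because every relative path starts with subfolder1/ or subfolder2/ and `*.json` cannot cross a slash. The pattern `**/logfolder/**` matches only the two log files.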
