AWS Glue crawler exclude patterns not working


I am new to AWS and Glue, and I am using an AWS Glue crawler to crawl the files under the path bucket/basefolder. Here is the folder structure:

bucket/basefolder
    subfolder1
        logfolder
            log1.json
        file1.parquet
    subfolder2
        logfolder
            log2.json
        file2.parquet
        file3.parquet

I want the crawler to pick up the files under the base folder and its subfolders, but exclude all files under each logfolder. These are the exclude patterns in my crawler settings:

logfolder/**
logfolder**
logfolder/*
*.json

But the crawler still picks up all the JSON files under logfolder; none of these exclude patterns works. Please help.

Asked 2 years ago, 3,421 views
1 answer

Accepted answer

Hello,

I tested a crawler against the same S3 folder structure you described.

Include path: s3://bucket/basefolder/

Exclude pattern: **/logfolder/**

With this exclude pattern, the crawler ignores all files under any folder named 'logfolder'. For reference, see https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html#crawler-data-stores-exclude
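To see why only this pattern works, here is a small Python sketch that mimics the glob rules described on that documentation page: `**` matches across folder boundaries, while `*` and `?` stay within a single path component. It assumes each pattern is matched against the object's key relative to the include path, so `s3://bucket/basefolder/subfolder1/logfolder/log1.json` is tested as `subfolder1/logfolder/log1.json`. The helpers `glob_to_regex` and `is_excluded` are hypothetical names; this is a simplified approximation for illustration, not Glue's actual matcher (brace alternation such as `{csv,avro}` is not handled).

```python
import re
from typing import Iterable


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified Glue-style exclude glob into a compiled regex.

    Assumed semantics (simplified from the Glue crawler docs):
      **  any characters, including '/', so it crosses folder levels
      *   any characters except '/', so it stays within one folder level
      ?   exactly one character except '/'
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_excluded(relative_key: str, patterns: Iterable[str]) -> bool:
    """True if the key (relative to the include path) fully matches any pattern."""
    return any(glob_to_regex(p).fullmatch(relative_key) for p in patterns)


# Object keys from the question, relative to s3://bucket/basefolder/
KEYS = [
    "subfolder1/logfolder/log1.json",
    "subfolder1/file1.parquet",
    "subfolder2/logfolder/log2.json",
    "subfolder2/file2.parquet",
    "subfolder2/file3.parquet",
]

if __name__ == "__main__":
    for pattern in ["logfolder/**", "logfolder**", "logfolder/*", "*.json", "**/logfolder/**"]:
        excluded = [k for k in KEYS if is_excluded(k, [pattern])]
        print(f"{pattern!r:20} excludes {excluded}")
```

Under these assumptions, the four patterns from the question exclude nothing: the first three must match from the start of the relative key, where `subfolder1/` or `subfolder2/` appears rather than `logfolder`, and `*.json` cannot cross the `/` separators. Only `**/logfolder/**` lets the leading `**` absorb the subfolder name, which matches the commenter's conclusion below.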

AWS
Answered 2 years ago
  • Thanks, that works. I was following the same link, but I think the information there is a little misleading. Now I know I also need to put **/ before the folder name.
