AWS Glue crawler exclude patterns not working


I am new to AWS and Glue, and I am using an AWS Glue crawler to pick up some files under the path bucket/basefolder. Here is the folder structure:

bucket/basefolder
    subfolder1
        logfolder
            log1.json
        file1.parquet
    subfolder2
        logfolder
            log2.json
        file2.parquet
        file3.parquet

I want to crawl the files under the base folder and its subfolders, and exclude all the files under each logfolder. These are the exclude patterns in the crawler settings:

logfolder/**
logfolder**
logfolder/*
*.json

But the crawler still picks up all the JSON files under the logfolder directories; none of the exclude patterns works. Please help.

Asked 2 years ago, 3419 views
1 Answer
Accepted Answer

Hello,

I tested a crawler against the same S3 folder structure as described in the question.

Include path: s3://bucket/basefolder/

Exclude pattern: **/logfolder/**

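This exclude pattern ignores all files under any folder named 'logfolder'. Exclude patterns are evaluated relative to the include path, and a single * does not cross folder boundaries, so the pattern needs the leading **/ to reach a logfolder nested inside each subfolder. For reference: https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html#crawler-data-stores-exclude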
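If the crawler is defined through boto3 rather than the console, the same include path and exclude pattern might be expressed roughly as below. This is only a sketch: the crawler name, IAM role, and database name are hypothetical placeholders that do not come from this thread, and the actual API call is wrapped in a function so nothing is sent to AWS unless you call it.

```python
def build_s3_target(include_path, exclusions):
    """Build one S3 target entry; exclusions are globs relative to include_path."""
    return {"Path": include_path, "Exclusions": list(exclusions)}


crawler_kwargs = {
    "Name": "basefolder-crawler",    # hypothetical crawler name
    "Role": "MyGlueCrawlerRole",     # hypothetical IAM role
    "DatabaseName": "my_database",   # hypothetical Data Catalog database
    "Targets": {
        "S3Targets": [
            build_s3_target("s3://bucket/basefolder/", ["**/logfolder/**"])
        ]
    },
}


def create_crawler(kwargs):
    """Send the definition to Glue (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the dict above can be inspected offline

    glue = boto3.client("glue")
    return glue.create_crawler(**kwargs)
```

Calling create_crawler(crawler_kwargs) would create the crawler; the Exclusions list plays the same role as the "Exclude patterns" field in the console.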

AWS
Answered 2 years ago
  • Thanks, that works. I was following the same link, but I think the information there is a little misleading. Now I know I should also put **/ before the folder name.
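To see why the leading **/ matters, here is a small sketch of a glob matcher. It is a simplified approximation, not Glue's actual implementation: it assumes only the two documented wildcard rules (* stays within one folder level, ** may cross folder boundaries) and that patterns are matched against the object path relative to the include path. The helper names glob_to_regex and is_excluded are made up for illustration.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob that uses only * and ** into a regex (simplified)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")       # ** may cross folder boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")    # * stays within a single folder level
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_excluded(relative_key, patterns):
    """True if any pattern matches the whole path relative to the include path."""
    return any(glob_to_regex(p).fullmatch(relative_key) for p in patterns)


# Object paths relative to the include path s3://bucket/basefolder/
keys = [
    "subfolder1/logfolder/log1.json",
    "subfolder1/file1.parquet",
    "subfolder2/logfolder/log2.json",
    "subfolder2/file2.parquet",
    "subfolder2/file3.parquet",
]

original_patterns = ["logfolder/**", "logfolder**", "logfolder/*", "*.json"]
working_patterns = ["**/logfolder/**"]

excluded_by_original = [k for k in keys if is_excluded(k, original_patterns)]
excluded_by_working = [k for k in keys if is_excluded(k, working_patterns)]

print(excluded_by_original)
print(excluded_by_working)
```

Under these assumptions, none of the four original patterns matches anything, because each relative path starts with subfolder1/ or subfolder2/ and *.json cannot cross a folder boundary, while **/logfolder/** matches both log files and leaves the parquet files alone.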
