AWS Glue crawler exclude patterns not working


I am new to AWS and Glue, and I am using an AWS Glue crawler to crawl the files under the path bucket/basefolder. Here is the folder structure:

bucket/basefolder
    subfolder1
        logfolder
            log1.json
        file1.parquet
    subfolder2
        logfolder
            log2.json
        file2.parquet
        file3.parquet

I want the crawler to pick up the files under the base folder and its subfolders, but exclude all files under each logfolder. These are the exclude patterns in the crawler settings:

logfolder/**
logfolder**
logfolder/*
*.json

But the crawler still picks up all the JSON files under logfolder. None of the exclude patterns work. Please help.

asked 2 years ago, 3419 views
1 Answer
Accepted Answer

Hello,

I tested a crawler against the same S3 folder structure you described.

Include path: s3://bucket/basefolder/

Exclude pattern: **/logfolder/**
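If you manage the crawler from code rather than the console, the same include path and exclusion go into the S3 target's Exclusions list. Below is a minimal sketch using boto3's Glue client. The crawler name, IAM role, and database name are hypothetical placeholders, and the client is passed in as an argument so the functions can be defined without AWS credentials.

```python
from __future__ import annotations

from typing import Any, Iterable


def build_s3_targets(include_path: str, exclusions: Iterable[str]) -> dict:
    """Build the Targets argument for Glue create_crawler / update_crawler."""
    return {
        "S3Targets": [
            {
                "Path": include_path,
                # Glob patterns, evaluated relative to Path.
                "Exclusions": list(exclusions),
            }
        ]
    }


def create_crawler_excluding_logs(glue_client: Any, name: str, role: str, database: str) -> dict:
    """Create a crawler over basefolder that skips every nested logfolder.

    glue_client is assumed to be boto3.client("glue"); name, role and
    database are placeholders to replace with your own values.
    """
    return glue_client.create_crawler(
        Name=name,
        Role=role,
        DatabaseName=database,
        Targets=build_s3_targets("s3://bucket/basefolder/", ["**/logfolder/**"]),
    )


# Example call (needs AWS credentials and an IAM role the crawler can assume):
# import boto3
# create_crawler_excluding_logs(boto3.client("glue"), "basefolder-crawler",
#                               "MyGlueCrawlerRole", "my_database")
```

To change an existing crawler instead of creating a new one, the same Targets dictionary can be passed to update_crawler(Name=..., Targets=...).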

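With the exclude pattern above, the crawler ignores all files under any folder named 'logfolder'. Exclude patterns are matched relative to the include path, and a single * does not cross folder boundaries, so patterns such as logfolder/** or *.json only match paths directly under basefolder. The leading **/ lets the pattern match a logfolder inside any subfolder. For reference: https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html#crawler-data-stores-exclude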
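To see why only the **/ form matched, here is a small Python approximation of the wildcard rules described on that documentation page: ** crosses folder boundaries, while * and ? stay within one path component, and patterns are matched against the key relative to the include path. This is an illustrative sketch, not Glue's actual matcher. Bracket expressions and {a,b} groups are not handled, and the helper names glob_to_regex and is_excluded are made up for this example.

```python
from __future__ import annotations

import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a subset of Glue exclude-pattern syntax into a compiled regex.

    **  -> any characters, including '/' (crosses folder boundaries)
    *   -> any characters except '/' (stays inside one name component)
    ?   -> exactly one character except '/'
    Everything else is matched literally.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_excluded(key: str, include_path: str, patterns: list[str]) -> bool:
    """Return True if any pattern fully matches the key relative to the include path."""
    relative = key[len(include_path):] if key.startswith(include_path) else key
    return any(glob_to_regex(p).fullmatch(relative) for p in patterns)


INCLUDE = "s3://bucket/basefolder/"
KEYS = [
    INCLUDE + "subfolder1/logfolder/log1.json",
    INCLUDE + "subfolder1/file1.parquet",
    INCLUDE + "subfolder2/logfolder/log2.json",
    INCLUDE + "subfolder2/file2.parquet",
    INCLUDE + "subfolder2/file3.parquet",
]

if __name__ == "__main__":
    # Try each pattern from the question, plus the one from the answer.
    for pattern in ["logfolder/**", "logfolder**", "logfolder/*", "*.json", "**/logfolder/**"]:
        hits = [k[len(INCLUDE):] for k in KEYS if is_excluded(k, INCLUDE, [pattern])]
        print(f"{pattern:<18} excludes {hits}")
```

Under these rules, none of the four original patterns excludes anything, because each one only matches paths directly under basefolder, while **/logfolder/** excludes both log files. A pattern such as **/*.json would also exclude them.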

AWS
answered 2 years ago
  • Thanks, that works. I was following the same link, but I think the information there is a little misleading. Now I know I also need to put **/ before the folder name.
