Hello all,
I have an S3 bucket with the following path: s3://a/b/c/products
Inside the products folder I have one folder for each version (each version is a database snapshot of the products table, taken weekly by a workflow):
- /version_0
  - _temporary
  - 0_$folder$
  - part-00000-c5... ...c000.snappy.parquet
- /version_1
  - _temporary
  - 0_$folder$
  - part-00000-29... ...c000.snappy.parquet
I have created a crawler (the Include Path is set to the same path mentioned above, s3://a/b/c/products) with the intention of merging all the versions into one table. The schema and the structure of the different partitions are always the same. I have tried different Table Levels (4, 5 and 6) in the "Grouping Behaviour for S3 Data" section of the Crawler Settings, but the crawler always creates multiple tables (one table for each version).
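To make clear how I am counting table levels, here is a small Python sketch. It assumes, as I understand the Glue documentation, that the bucket itself is level 1 and each folder below it adds one level. The helper name `s3_path_level` is my own and not part of any AWS library.

```python
from urllib.parse import urlparse


def s3_path_level(s3_uri: str) -> int:
    """Return the absolute level of an S3 path, counting the bucket as level 1.

    Assumption: this mirrors how the crawler's table level is counted.
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    # Ignore empty segments caused by leading or trailing slashes.
    key_parts = [part for part in parsed.path.split("/") if part]
    return 1 + len(key_parts)


print(s3_path_level("s3://a/b/c/products"))            # 4
print(s3_path_level("s3://a/b/c/products/version_0"))  # 5
```

Under that assumption, the products folder is at level 4 and each version folder is at level 5.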
The _temporary folder seems to be generated automatically by the workflow. I don't know whether I need to add it to the exclude path for this to work.
What are the correct include path, exclude path and table level to create only ONE table that merges all versions?
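For reference, this is roughly the configuration I am trying to express, sketched in Python. It only builds the request as a dictionary and does not call AWS. The crawler name, role ARN and database name are placeholders, and the two exclude patterns are my guess at how to skip the _temporary folders and the `_$folder$` marker files; I have not confirmed that they solve the problem.

```python
import json


def build_crawler_kwargs(name, role_arn, database, include_path, table_level):
    """Build keyword arguments for a Glue crawler with one S3 target.

    All argument values are supplied by the caller; nothing here contacts AWS.
    """
    configuration = {
        "Version": 1.0,
        # Table level: the absolute folder level at which the table is created.
        "Grouping": {"TableLevelConfiguration": table_level},
    }
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            "S3Targets": [
                {
                    "Path": include_path,
                    # Assumed glob patterns, relative to the include path.
                    "Exclusions": ["**/_temporary/**", "**_$folder$"],
                }
            ]
        },
        # The crawler configuration is passed as a JSON string.
        "Configuration": json.dumps(configuration),
    }


kwargs = build_crawler_kwargs(
    name="products-crawler",                              # placeholder
    role_arn="arn:aws:iam::123456789012:role/GlueRole",   # placeholder
    database="my_database",                               # placeholder
    include_path="s3://a/b/c/products",
    table_level=4,
)
print(json.dumps(kwargs, indent=2))
```

If this is the right shape, the dictionary could be passed to boto3 as `boto3.client("glue").create_crawler(**kwargs)`.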
I have already read all the general documentation links about this issue. Could you please provide a concrete solution?