My AWS Glue crawler doesn't add new partitions to the table.
Short description
AWS Glue considers a difference in the name, sequence, or number of partitions in the Amazon Simple Storage Service (Amazon S3) path as a change in the partition schema or structure. Crawlers scan the source data files under a new partition and compare the following attributes of the source files with those of the existing table:
- File format
- Compression type
- Schema
- Structure of Amazon S3 partitions
If any of these partition attributes differ from the table attributes, then AWS Glue skips the partition and doesn't update the metadata.
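The partition structure comparison can be illustrated with a minimal sketch. It assumes Hive-style prefixes (key=value folders) and a table whose partition keys are year, month, and day. Those key names and the helper functions are hypothetical and aren't part of AWS Glue; the sketch only approximates the check on name, sequence, and number of partitions.

```python
def partition_keys_from_prefix(prefix: str) -> list:
    """Return the Hive-style partition key names found in an S3 prefix, in order."""
    segments = prefix.strip("/").split("/")
    return [seg.split("=", 1)[0] for seg in segments if "=" in seg]


def matches_table(prefix: str, table_keys: list) -> bool:
    """True when the prefix has the same key names, sequence, and count as the table."""
    return partition_keys_from_prefix(prefix) == table_keys


# Assumption: the existing table is partitioned by these keys.
table_keys = ["year", "month", "day"]

good = "doc-example-path/doc-example-table/year=2021/month=01/day=04/"
bad = "doc-example-path/doc-example-table/year=2021/month=01/sday=05/"

print(matches_table(good, table_keys))  # True
print(matches_table(bad, table_keys))   # False: "sday" is not a table partition key
```

Run a check like this over the prefixes in the source location before the crawler runs to find folders that are likely to be skipped.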
Resolution
Troubleshoot the issue
To identify the issue from the crawler logs, complete the following steps:
- Open the AWS Glue console.
- In the navigation pane, choose Crawlers.
- Select the crawler, and then choose Logs to view the logs on the Amazon CloudWatch console.
- Review the logs to check if the crawler skipped the new partition.
For example, suppose that the log includes entries that look similar to the following:
Folder partition keys do not match table partition keys, skipped folder: doc-example-bucket/doc-example-path/doc-example-table/year=2021/month=01/sday=05/
This entry suggests that the partition structure for the Amazon S3 location doesn't match the partition keys defined for the table. This difference might happen when the partition structure isn't consistent across the table source location.
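If the log is long, then it can help to pull out only the skipped folders. The following is a minimal sketch that assumes that you already have the log messages as a list of strings and that they use the wording shown in the preceding example. The function name and sample list are hypothetical.

```python
import re

# Matches the message wording shown in the example log entry.
SKIP_RE = re.compile(
    r"Folder partition keys do not match table partition keys, "
    r"skipped folder:\s*(\S+)"
)


def skipped_folders(messages):
    """Return the folder paths named in 'skipped folder' log messages."""
    found = []
    for message in messages:
        match = SKIP_RE.search(message)
        if match:
            found.append(match.group(1))
    return found


sample = [
    "Folder partition keys do not match table partition keys, skipped folder: "
    "doc-example-bucket/doc-example-path/doc-example-table/year=2021/month=01/sday=05/",
    "INFO : Created table doc-example-table in database doxtest_db",
]

for folder in skipped_folders(sample):
    print(folder)
```

You can retrieve the messages with the CloudWatch Logs FilterLogEvents API. The log group and stream that hold your crawler's output depend on your setup and aren't shown here.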
If the AWS Glue crawler creates multiple tables, then the log entries look similar to the following:
INFO : Created table doc-example-table in database doxtest_db
If you see similar logs, then compare the schema and partition structure of the location of these tables with those of the original table.
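One way to make this comparison is to reduce each table definition to the fields that matter and then list the differences. The following sketch assumes that each table is a dictionary in the shape that the AWS Glue GetTable API returns under "Table". The sample definitions, column names, and helper functions are hypothetical.

```python
def table_layout(table: dict) -> dict:
    """Reduce a Glue table definition to the fields worth comparing."""
    descriptor = table.get("StorageDescriptor", {})
    return {
        "partition_keys": [k["Name"] for k in table.get("PartitionKeys", [])],
        "columns": [(c["Name"], c.get("Type")) for c in descriptor.get("Columns", [])],
        "input_format": descriptor.get("InputFormat"),
    }


def layout_differences(original: dict, other: dict) -> dict:
    """Return {field: (original_value, other_value)} for each field that differs."""
    a, b = table_layout(original), table_layout(other)
    return {field: (a[field], b[field]) for field in a if a[field] != b[field]}


def make_table(name, partition_keys):
    """Build a hypothetical table definition for the demonstration."""
    return {
        "Name": name,
        "PartitionKeys": [{"Name": k, "Type": "string"} for k in partition_keys],
        "StorageDescriptor": {
            "Columns": [{"Name": "id", "Type": "bigint"}],
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
        },
    }


original = make_table("doc-example-table", ["year", "month", "day"])
extra = make_table("doc-example-table-extra", ["year", "month", "sday"])

print(layout_differences(original, extra))
```

In this sample, only the partition keys differ, which points to an inconsistent folder structure instead of a schema or format mismatch.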
Resolve the issue
Update the partition structure
If the issue is caused by an inconsistent partition structure, then manually or programmatically rename the Amazon S3 prefixes so that every partition uses the same key names, in the same sequence and number.
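For a programmatic rename, a reasonable first step is to compute which object keys must change before you touch any data. The following sketch assumes Hive-style prefixes and that the key sday should be day. That mapping, the object names, and the helper functions are hypothetical.

```python
def corrected_key(key: str, wrong: str, right: str) -> str:
    """Rename one partition key in an S3 object key and keep its value."""
    fixed = []
    for segment in key.split("/"):
        if segment.startswith(wrong + "="):
            value = segment.split("=", 1)[1]
            segment = right + "=" + value
        fixed.append(segment)
    return "/".join(fixed)


def rename_plan(keys, wrong, right):
    """Map each affected object key to its corrected key. Unaffected keys are omitted."""
    plan = {}
    for key in keys:
        new_key = corrected_key(key, wrong, right)
        if new_key != key:
            plan[key] = new_key
    return plan


keys = [
    "doc-example-path/doc-example-table/year=2021/month=01/day=04/part-0000.csv",
    "doc-example-path/doc-example-table/year=2021/month=01/sday=05/part-0000.csv",
]

for old, new in rename_plan(keys, wrong="sday", right="day").items():
    print(old, "->", new)
```

Amazon S3 has no rename operation. To apply the plan, copy each object to its new key and then delete the original object. Review the plan before you run the copy.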
Exclude specific files
AWS Glue might skip a partition for one of the following reasons:
- Mismatched file formats
- Mismatched compression formats
- Schema issues
If AWS Glue skips a partition for one of these reasons and the data isn't needed in the table, then exclude those files from the crawl, for example with an exclude pattern on the crawler's Amazon S3 data store.
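One option is to add an exclude pattern to the crawler's Amazon S3 target. The following is a minimal sketch that builds the Targets structure that the AWS Glue UpdateCrawler API accepts. The s3:// path, the pattern, and the function names are assumptions for illustration. Exclude patterns are evaluated relative to the include path.

```python
def s3_target_with_exclusions(path: str, patterns: list) -> dict:
    """Build a crawler Targets value with one S3 target and its exclude patterns."""
    return {"S3Targets": [{"Path": path, "Exclusions": list(patterns)}]}


targets = s3_target_with_exclusions(
    "s3://doc-example-bucket/doc-example-path/doc-example-table/",  # assumed include path
    ["year=2021/month=01/sday=05/**"],  # hypothetical pattern for the unwanted folder
)
print(targets)


def apply_to_crawler(crawler_name: str, targets: dict) -> None:
    """Send the targets to AWS Glue. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the sketch above runs without the AWS SDK

    boto3.client("glue").update_crawler(Name=crawler_name, Targets=targets)
```

The Targets value that you pass replaces the crawler's existing targets, so include every data store that the crawler must continue to scan.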
Combine compatible schemas
If your data has similar schemas in different input files, then combine compatible schemas when you create the crawler. On the Configure the crawler's output page, under Grouping behavior for S3 data (optional), select Create a single schema for each S3 path. When you turn on this setting and the data is compatible, the crawler ignores the similarity of specific schemas when it evaluates the S3 objects in the specified path. For more information, see Creating a single schema for each Amazon S3 include path.
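If you manage the crawler through the API instead of the console, then the same grouping behavior is set in the crawler's Configuration string. The following sketch builds that JSON with the CombineCompatibleSchemas table grouping policy. The function name is hypothetical.

```python
import json


def combine_schemas_configuration() -> str:
    """Return a crawler Configuration string that combines compatible schemas."""
    configuration = {
        "Version": 1.0,
        "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"},
    }
    return json.dumps(configuration)


print(combine_schemas_configuration())
```

Pass the resulting string as the Configuration parameter when you create or update the crawler.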
Prevent multiple tables
If the crawler is creating multiple tables, then see How do I prevent the creation of multiple tables during an AWS Glue crawler run?