Why doesn’t my AWS Glue crawler add new partitions to the table?

My AWS Glue crawler doesn't add new partitions to the table.

Short description

AWS Glue considers a difference in the name, sequence, or number of partitions in the Amazon Simple Storage Service (Amazon S3) path as a change in the partition schema or structure. Crawlers scan the source data files under a new partition and compare the following attributes of the source files with those of the existing table:

  • File format
  • Compression type
  • Schema
  • Structure of Amazon S3 partitions

If any of these partition attributes differ from the table attributes, then AWS Glue skips the partition and doesn't update the metadata.
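The name, sequence, and number checks described above can be sketched in a few lines of Python. This is only an illustration of the comparison, not the crawler's actual implementation. It assumes Hive-style `key=value` folders, and the table keys `year`, `month`, `day` are hypothetical values chosen for the example.

```python
def partition_keys_from_path(path):
    """Return the partition key names (key=value segments) in path order."""
    segments = path.strip("/").split("/")
    return [seg.split("=", 1)[0] for seg in segments if "=" in seg]


def partition_mismatch(path, table_keys):
    """Return None if the keys match, else which attribute differs:
    'number', 'sequence', or 'name'."""
    found = partition_keys_from_path(path)
    expected = list(table_keys)
    if found == expected:
        return None
    if len(found) != len(expected):
        return "number"
    if sorted(found) == sorted(expected):
        return "sequence"  # same key names, different order
    return "name"


# Hypothetical table partition keys (an assumption for this example).
TABLE_KEYS = ["year", "month", "day"]
print(partition_mismatch("doc-example-table/year=2021/month=01/sday=05/", TABLE_KEYS))
```

A path that returns anything other than `None` here is the kind of folder that the crawler would treat as a change in partition structure.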

Resolution

Troubleshoot the issue

To identify the issue in the crawler logs, complete the following steps:

  1. Open the AWS Glue console.
  2. In the navigation pane, choose Crawlers.
  3. Select the crawler, and then choose Logs to view the logs on the Amazon CloudWatch console.
  4. Review the logs to check if the crawler skipped the new partition.

For example, suppose that the log includes entries that look similar to the following:

Folder partition keys do not match table partition keys, skipped folder: doc-example-bucket/doc-example-path/doc-example-table/year=2021/month=01/sday=05/

This entry suggests that the partition structure for the Amazon S3 location doesn't match the partition keys defined for the table. This difference might happen when the partition structure isn't consistent across the table source location.
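When the logs are long, it can help to pull out only the skipped folders. The following is a minimal sketch that assumes the log messages are already available as a list of strings and that skipped partitions are reported with the "skipped folder:" wording shown in the example entry. The helper name `skipped_folders` is hypothetical.

```python
SKIP_MARKER = "skipped folder:"


def skipped_folders(messages):
    """Return the folder paths named in 'skipped folder' log messages."""
    folders = []
    for message in messages:
        # partition() returns an empty marker when the text is not found.
        _, marker, rest = message.partition(SKIP_MARKER)
        if marker:
            folders.append(rest.strip())
    return folders


sample = [
    "Folder partition keys do not match table partition keys, skipped folder: "
    "doc-example-bucket/doc-example-path/doc-example-table/year=2021/month=01/sday=05/",
    "INFO : Created table doc-example-table in database doxtest_db",
]
for folder in skipped_folders(sample):
    print(folder)
```

The messages could be fetched with the CloudWatch Logs `filter_log_events` API. Crawler logs are typically written to the `/aws-glue/crawlers` log group, with a log stream named after the crawler, but confirm the names in your account.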

If the AWS Glue crawler creates multiple tables, then the log entries look similar to the following:

INFO : Created table doc-example-table in database doxtest_db

If you see similar logs, then compare the schema and partition structure of the location of these tables with those of the original table.

Resolve the issue

Update the partition structure

If the issue is caused by an inconsistent partition structure, then manually or programmatically rename the Amazon S3 prefixes so that every partition follows the same structure.
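A programmatic rename usually starts by computing the corrected object key. The sketch below rewrites partition key names in a key according to a mapping. The mapping `sday` to `day` and the file name `part-0000.parquet` are hypothetical, chosen only to illustrate the idea.

```python
def corrected_key(key, renames):
    """Return the S3 key with partition key names replaced per `renames`."""
    segments = []
    for seg in key.split("/"):
        name, sep, value = seg.partition("=")
        # Only touch key=value segments whose key name is in the mapping.
        if sep and name in renames:
            seg = f"{renames[name]}={value}"
        segments.append(seg)
    return "/".join(segments)


old_key = "doc-example-path/doc-example-table/year=2021/month=01/sday=05/part-0000.parquet"
print(corrected_key(old_key, {"sday": "day"}))
```

Amazon S3 has no rename operation, so applying the new key means copying each object to the corrected key and then deleting the original, for example with the boto3 `copy_object` and `delete_object` calls.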

Exclude specific files

AWS Glue might skip a partition for one of the following reasons:

  • Mismatched file formats
  • Mismatched compression types
  • Mismatched schemas

If AWS Glue skips a partition for one of these reasons and the data isn't needed in the table, then exclude those files from the crawl. For example, add an exclude pattern to the crawler's Amazon S3 data store.
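One way to leave unneeded data out of the table is a crawler exclude pattern on the Amazon S3 target. The sketch below only builds the `Targets` structure that the AWS Glue `update_crawler` API accepts. The include path and the pattern are hypothetical, and exclude patterns are glob patterns evaluated relative to the include path, so adjust them to your layout.

```python
def s3_target_with_exclusions(path, exclusions):
    """Build a crawler Targets value with exclude patterns for one S3 path."""
    return {"S3Targets": [{"Path": path, "Exclusions": list(exclusions)}]}


targets = s3_target_with_exclusions(
    "s3://doc-example-bucket/doc-example-path/doc-example-table/",
    ["year=2021/month=01/sday=05/**"],  # hypothetical pattern for the skipped folder
)
print(targets)
```

With boto3, the result could be passed as `glue.update_crawler(Name="<crawler name>", Targets=targets)`. The excluded objects are then ignored on the next crawler run.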

Combine compatible schemas

If your input files have similar but not identical schemas, then combine compatible schemas when you create the crawler. On the Configure the crawler's output page, under Grouping behavior for S3 data (optional), select Create a single schema for each S3 path. When you turn on this setting and the data is compatible, the crawler ignores the similarity of specific schemas when it evaluates S3 objects in the specified path. For more information, see Creating a single schema for each Amazon S3 include path.
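The same grouping behavior can also be set through the crawler API instead of the console. As a minimal sketch, the function below builds the JSON string for the crawler's `Configuration` parameter with the `CombineCompatibleSchemas` table grouping policy. The function name is hypothetical.

```python
import json


def combine_schemas_configuration():
    """Return the crawler Configuration JSON that combines compatible schemas."""
    return json.dumps(
        {
            "Version": 1.0,
            "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"},
        }
    )


print(combine_schemas_configuration())
```

With boto3, the string could be passed as the `Configuration` argument of `create_crawler` or `update_crawler`.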

Prevent multiple tables

If the crawler is creating multiple tables, then see How do I prevent the creation of multiple tables during an AWS Glue crawler run?

AWS OFFICIAL. Updated a year ago