ERROR: Internal Service Exception of Glue Crawler


Region: us-east-1 Run ID: df56f586-c453-47d3-ae19-4ae551054221

I set up the crawler to crawl Delta tables in our S3 storage. Why does it fail?

asked 7 months ago · 264 views
1 Answer

Hello, here are some general reasons why the 'Internal Service Exception' error can be thrown when running a crawler against Delta tables:

  • Crawling Delta files that have the column mapping feature enabled, or that use other features not supported by the crawler's Delta Lake version. Column mapping requires Delta Lake 1.2.0 [1], but the Glue crawler runs on Glue version 3.0, which supports Delta Lake 1.0 only, so the Delta standalone reader the crawler uses is not compatible with the table's Delta protocol version. The column mapping feature is currently not supported by the Glue crawler.

  • When a crawler runs, it might encounter changes in your data store that result in a schema or partition different from a previous crawl. If you are configuring the crawler on the console, you can choose the following actions to prevent the table schema from changing when the crawler runs:

      • Ignore the change and don't update the table in the Data Catalog.
    
      • Update all new and existing partitions with metadata from the table.
    

Please refer to this link [2] for information on how to prevent the crawler from changing an existing schema.
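The two console options above map to the crawler's schema change policy and partition-output configuration in the AWS Glue API. As a minimal sketch (the crawler name is hypothetical, and the request is only built here, not sent):

```python
import json

# Hypothetical crawler name; the parameter shapes follow the AWS Glue
# update_crawler API, but this sketch only builds the request locally.
params = {
    "Name": "my-delta-crawler",  # hypothetical
    # "Ignore the change and don't update the table in the Data Catalog"
    "SchemaChangePolicy": {
        "UpdateBehavior": "LOG",
        "DeleteBehavior": "LOG",
    },
    # "Update all new and existing partitions with metadata from the table"
    "Configuration": json.dumps({
        "Version": 1.0,
        "CrawlerOutput": {
            "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}
        },
    }),
}

# With AWS credentials configured, you would apply it with boto3:
# import boto3
# boto3.client("glue").update_crawler(**params)
print(params["SchemaChangePolicy"]["UpdateBehavior"])  # -> LOG
```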

  • Data format issues. Please ensure that your folder structure is consistent, and that your data source (for example, Amazon S3) does not contain removed or empty data. I also suggest checking for inconsistencies between the table's partition definition in the Data Catalog and the Hive partition structure in Amazon S3.
  • Column names can cause an internal error from the crawler. Please ensure that column names do not exceed 255 characters and do not contain special characters.
  • Please double-check that the S3 path does not contain special characters or encrypted data.
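The column-name and path checks above are easy to automate before a crawl. A rough sketch (the helper names and the "safe character" sets are my own assumptions, not an official Glue rule set):

```python
import re

def validate_columns(columns):
    """Flag column names longer than 255 characters or containing
    special characters (assumed safe set: letters, digits, underscore)."""
    bad = []
    for name in columns:
        if len(name) > 255 or not re.fullmatch(r"[A-Za-z0-9_]+", name):
            bad.append(name)
    return bad

def validate_s3_path(path):
    """Basic sanity check: s3:// prefix and no characters that commonly
    trip up the crawler (spaces, brackets, and so on)."""
    return bool(re.fullmatch(r"s3://[A-Za-z0-9._/\-]+", path))

print(validate_columns(["order_id", "customer name!"]))  # -> ['customer name!']
print(validate_s3_path("s3://my-bucket/delta/orders/"))  # -> True
```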

For more tips on resolving 'Internal Service Exception', please refer to [3].

To learn how to create crawlers to crawl Delta Lake tables, please refer to [4].
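To verify whether the column mapping or protocol version issue described above applies to your table, you can inspect the actions recorded in the table's `_delta_log` commit files. A sketch under stated assumptions (the sample commit lines are made up; a real check would read the JSON files under `<table path>/_delta_log/`):

```python
import json

def check_delta_compat(commit_lines, max_reader_version=1):
    """Scan the actions in a _delta_log commit for features that a
    reader supporting only Delta Lake 1.0 (reader version 1) cannot use."""
    problems = []
    for line in commit_lines:
        action = json.loads(line)
        protocol = action.get("protocol")
        if protocol and protocol.get("minReaderVersion", 1) > max_reader_version:
            problems.append(f"minReaderVersion={protocol['minReaderVersion']}")
        meta = action.get("metaData")
        if meta:
            mode = meta.get("configuration", {}).get("delta.columnMapping.mode", "none")
            if mode != "none":
                problems.append(f"column mapping mode={mode}")
    return problems

# Sample commit that enabled column mapping (illustrative data only)
sample = [
    '{"protocol":{"minReaderVersion":2,"minWriterVersion":5}}',
    '{"metaData":{"id":"abc","configuration":{"delta.columnMapping.mode":"name"}}}',
]
print(check_delta_compat(sample))
# -> ['minReaderVersion=2', 'column mapping mode=name']
```

A non-empty result suggests the table uses features beyond what the crawler's Delta reader supports.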

If you are still encountering the issue and want to determine the exact cause, please raise a case with AWS Premium Support. This will allow us to review the crawler configuration from our end and further troubleshoot the failed run.

References:

[1] How does Delta Lake manage feature compatibility? - https://docs.delta.io/latest/versioning.html

[2] Setting crawler configuration options - How to prevent the crawler from changing an existing schema - https://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html#crawler-schema-changes-prevent

[3] https://repost.aws/knowledge-center/glue-crawler-internal-service-exception

[4] https://aws.amazon.com/blogs/big-data/crawl-delta-lake-tables-using-aws-glue-crawlers/

AWS Support Engineer · answered 7 months ago
