HIVE_INVALID_METADATA error on Athena

I previously had CSV files with a set of columns and data, and I replaced one column with two new columns. When I run the crawler and try to query the table in Athena, it throws this error:

HIVE_INVALID_METADATA: Hive metadata for table cog_processor is invalid: Table descriptor contains duplicate columns. This query ran against the "default" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: f564f213-18d9-46d9-a61c-7bc54af8bb5e

I have checked that there are no duplicate columns in any of the CSV files. How can I fix this?

Edit: I have made changes to other directories containing CSV files with the same columns. Those are crawled correctly.

John
asked 4 months ago · 255 views
1 Answer

Hi John, the error "HIVE_INVALID_METADATA: Hive metadata for table cog_processor is invalid: Table descriptor contains duplicate columns" means that the table definition stored for "cog_processor" in the AWS Glue Data Catalog contains duplicate column names, so Athena cannot read the table descriptor.

Even though your CSV files may not have duplicate column names, the issue can be caused by how the AWS Glue crawler interpreted the schema (for example, a partition column that shares a name with a data column) or by a mismatch between the schema and the data in the CSV files. To resolve the issue, here are a few steps you can take:

  1. Verify the Schema:

    • Check the schema of the "cog_processor" table in the AWS Glue Data Catalog.
    • Ensure that there are no duplicate column names, including between data columns and partition keys (a scripted version of this check appears after this list).
    • If there are duplicates, rename or remove the offending columns by editing the table schema in the AWS Glue console, or re-run the crawler with an appropriate configuration.
  2. Update the Crawler Configuration:

    • Open the configuration of the AWS Glue crawler that created the "cog_processor" table.
    • In the data source settings, ensure that the correct S3 path or prefix is specified.
    • In the output settings, review the schema change policy and choose "Update the table definition in the data catalog" or "Add new columns only", depending on your requirements.
    • Run the Crawler again to update the table metadata.
  3. Recreate the Table:

    • If the above steps do not resolve the issue, delete the "cog_processor" table from the AWS Glue Data Catalog (see the second sketch after this list).
    • Run the AWS Glue crawler again to recreate the table with the correct schema.
  4. Check for Partitions:

    • If your table is partitioned, make sure that no partition column has the same name as a data column; this is a common cause of the "duplicate columns" error. The first sketch after this list prints partition keys alongside data columns so a collision is easy to spot.
    • You may need to adjust the partitioning configuration, or remove partitioning, if it is causing the conflict.
  5. Verify Data Consistency:

    • Double-check that the data in your CSV files matches the schema detected by the AWS Glue crawler.
    • Since you replaced one column with two new ones, make sure no files with the old column layout remain in the same S3 location; mixing old and new layouts under one prefix can confuse schema detection.
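If you would rather script the check from steps 1 and 4, here is a minimal sketch using boto3. It assumes your AWS credentials and region are already configured; the database and table names are taken from your error message and may need adjusting.

```python
# Minimal sketch for steps 1 and 4: read the table definition from the Glue
# Data Catalog and flag duplicate column names, including collisions between
# data columns and partition keys. Adjust names/credentials for your setup.
from collections import Counter

import boto3

DATABASE = "default"
TABLE = "cog_processor"

glue = boto3.client("glue")
table = glue.get_table(DatabaseName=DATABASE, Name=TABLE)["Table"]

data_columns = [c["Name"] for c in table["StorageDescriptor"]["Columns"]]
partition_keys = [p["Name"] for p in table.get("PartitionKeys", [])]

# Hive treats column names case-insensitively, so compare lowercased names.
counts = Counter(name.lower() for name in data_columns + partition_keys)
duplicates = sorted(name for name, n in counts.items() if n > 1)

print("Data columns:  ", data_columns)
print("Partition keys:", partition_keys)
print("Duplicates:    ", duplicates or "none")
```

Any name that shows up in the "Duplicates" line is what Athena is complaining about.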
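For step 3, here is a minimal sketch of the delete-and-recrawl approach; the crawler name below is an assumption and must be replaced with the name of your crawler.

```python
# Minimal sketch for step 3: drop the broken table from the Data Catalog and
# re-run the crawler so it rebuilds the schema from scratch.
import boto3

glue = boto3.client("glue")

glue.delete_table(DatabaseName="default", Name="cog_processor")
glue.start_crawler(Name="cog-processor-crawler")  # placeholder crawler name
```

Deleting the table only removes the catalog entry; the CSV files in S3 are not touched, and the crawler will register the table again on its next run.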

If the issue persists after trying these steps, you can reach out to AWS Support for further assistance and provide the Query ID mentioned in the error message.

AWS
answered 3 months ago
