Issues loading data from S3 into Athena


HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://dt-altdata-datafeeds-prod-p-use2/tldp/v1/historical/transaction_brand_tagging/published_date=2024-06-02/part-00013-db8b1341-7e3b-470c-9495-96e7b8d5d3ad-c000.snappy.parquet (offset=33554432, length=33554432): class org.apache.parquet.io.GroupColumnIO cannot be cast to class org.apache.parquet.io.PrimitiveColumnIO (org.apache.parquet.io.GroupColumnIO and org.apache.parquet.io.PrimitiveColumnIO are in unnamed module of loader io.trino.server.PluginClassLoader @1cfbbc2e) This query ran against the "tldp2" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: fa875b84-b939-42c5-bae7-c632c6456a20

  • I'm trying to load data from S3 into Athena with Glue, and it gives me this error. Googling turned up two possible causes that I don't think apply here: too many small files, and a mismatched data type for a column.

    The bucket holds 100 Snappy-compressed Parquet files of about 135 MB each, so I don't think it's the first one.

    I also tried declaring every column as a string (there are 9 columns in total, including a couple of dates and one int), and that didn't help either.

    Thank you!!

MRyan
asked 4 months ago (411 views)
1 Answer

Hello,

I understand you are getting "HIVE_CANNOT_OPEN_SPLIT" when querying your Parquet files in S3. According to Athena's documentation:

This error is caused by a Parquet schema mismatch. A column that has a non-primitive type (for example, array) has been declared as a primitive type (for example, string) in AWS Glue. To troubleshoot this issue, check the data schema in the files and compare it with the schema declared in AWS Glue.

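You can use parquet-tools to inspect the schema of a Parquet file, update the table schema in Glue to match it, and then run the query again. Note that declaring every column as string does not fix this particular error, because string is still a primitive type: any column that is nested in the files (array, map, or struct) has to be declared with the matching complex type in Glue.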
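If you'd rather script the comparison than eyeball parquet-tools output, here is a minimal sketch. It assumes you have downloaded one of the files locally and have pyarrow and boto3 installed; the helper names are hypothetical, and it only checks top-level columns (a mismatch inside a struct field would not be caught). It flags any column whose "nested vs. primitive" shape differs between the Parquet footer and the Glue table definition.

```python
# Sketch: compare top-level Parquet column shapes against Glue column types.
# Helper names are hypothetical; pyarrow and boto3 are imported lazily so the
# pure comparison logic works without them.

# Glue/Hive complex type prefixes; anything else (string, int, date, ...)
# is treated as a primitive type.
COMPLEX_PREFIXES = ("array<", "map<", "struct<", "uniontype<")


def is_complex_glue_type(glue_type: str) -> bool:
    """Return True if a Glue column type string is a complex (non-primitive) type."""
    return glue_type.strip().lower().startswith(COMPLEX_PREFIXES)


def find_mismatches(parquet_nested: dict, glue_columns: dict) -> list:
    """Return columns whose Parquet shape disagrees with the Glue declaration.

    parquet_nested: column name -> True if the Parquet field is nested
    glue_columns:   column name -> Glue type string, e.g. "string", "array<string>"
    A mismatch in either direction is reported. Names are compared
    case-insensitively because Glue stores column names in lower case.
    """
    glue_lower = {name.lower(): typ for name, typ in glue_columns.items()}
    mismatches = []
    for name, nested in parquet_nested.items():
        glue_type = glue_lower.get(name.lower())
        if glue_type is None:
            continue  # column not declared in Glue, so it cannot cause this cast error
        if nested != is_complex_glue_type(glue_type):
            mismatches.append(name)
    return sorted(mismatches)


def parquet_nested_fields(path: str) -> dict:
    """Read the schema from a local Parquet file's footer (requires pyarrow)."""
    import pyarrow.parquet as pq
    import pyarrow.types as pat

    schema = pq.read_schema(path)
    return {field.name: pat.is_nested(field.type) for field in schema}


def glue_column_types(database: str, table: str) -> dict:
    """Fetch non-partition column types from the Glue Data Catalog (requires boto3)."""
    import boto3

    resp = boto3.client("glue").get_table(DatabaseName=database, Name=table)
    columns = resp["Table"]["StorageDescriptor"]["Columns"]
    return {col["Name"]: col["Type"] for col in columns}
```

Example usage, where the database name comes from the error message and the table name is a guess based on the S3 path:

```python
print(find_mismatches(
    parquet_nested_fields("part-00013-db8b1341-7e3b-470c-9495-96e7b8d5d3ad-c000.snappy.parquet"),
    glue_column_types("tldp2", "transaction_brand_tagging"),  # table name assumed
))
```

Any column it prints needs its Glue type changed (for example from `string` to `array<string>`) to match what is actually in the files.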

AWS
Ankur_J
answered 4 months ago
