class org.apache.parquet.io.GroupColumnIO cannot be cast to class org.apache.parquet.io.PrimitiveColumnIO

I'm using Athena on an S3 bucket that has been crawled, and queries fail with this error:

class org.apache.parquet.io.GroupColumnIO cannot be cast to class org.apache.parquet.io.PrimitiveColumnIO

I've narrowed it down to the column that causes it, but that column is classified as array<int> both in the Parquet file and in the Glue database table.

I'm streaming Auth0 logs to an event bus, converting each event into a Parquet file, and saving it in S3. I then crawl the S3 bucket and use Athena to analyse the data.

Query Id: b010574d-f708-4855-abde-66dbd90cb7d7

Python script for turning the event into a Parquet file:

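import awswrangler
import pandas as pd

s3_path = "s3://{}/{}/{}.parquet".format(s3_bucket_name, event_actual_date, event_id)
# Flatten nested dicts into columns joined by "_"; list values stay as lists.
df = pd.DataFrame(pd.json_normalize(event, sep="_"))
awswrangler.s3.to_parquet(df, dataset=False, path=s3_path, index=False)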
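One possible way to investigate, offered as a sketch rather than a confirmed diagnosis: the error message suggests the reader expected a primitive column but found a nested (group) one. Because json_normalize flattens nested dicts but leaves list values untouched, a list field whose elements are ints in some events and dicts (or empty) in others could end up with a different physical type from one file to the next. The helper below uses only the standard library to report which element types appear in each list-valued field. The names `flatten` and `list_element_types`, and the sample events, are hypothetical and not part of the original script; `flatten` only approximates what json_normalize does.

```python
from collections import defaultdict


def flatten(event, sep="_", prefix=""):
    """Flatten nested dicts into one level; lists are left untouched."""
    flat = {}
    for key, value in event.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, sep=sep, prefix=name))
        else:
            flat[name] = value
    return flat


def list_element_types(events, sep="_"):
    """Map each list-valued column to the set of element type names seen."""
    seen = defaultdict(set)
    for event in events:
        for column, value in flatten(event, sep=sep).items():
            if isinstance(value, list):
                if not value:
                    # Empty lists carry no element type information.
                    seen[column].add("<empty>")
                for item in value:
                    seen[column].add(type(item).__name__)
    return dict(seen)


# Hypothetical sample events, only to illustrate the report format.
sample_events = [
    {"data": {"scopes": [1, 2, 3]}},
    {"data": {"scopes": [{"id": 1}]}},
    {"data": {"scopes": []}},
]

report = list_element_types(sample_events)
for column, types in report.items():
    print(column, sorted(types))
```

A column that reports more than one element type (or a mix of typed and empty lists) would be a candidate for a per-file schema mismatch with the array<int> declared in the Glue table.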

Marc
asked 9 months ago, 75 views
No Answers
