I'm using Athena on an S3 bucket that has been crawled, and I get the error:
class org.apache.parquet.io.GroupColumnIO cannot be cast to class org.apache.parquet.io.PrimitiveColumnIO
I've narrowed it down to the column that causes it, but that column is typed as array<Int> both in the Parquet file and in the Glue database table.
I'm streaming Auth0 logs to an event bus, turning each event into a Parquet file and saving it in S3. I then crawl the S3 bucket and use Athena to analyse the data.
Query Id: b010574d-f708-4855-abde-66dbd90cb7d7
Python script for turning an event into a Parquet file:
import awswrangler
import pandas as pd

s3_path = "s3://{}/{}/{}.parquet".format(s3_bucket_name, event_actual_date, event_id)
df = pd.DataFrame(pd.json_normalize(event, sep="_"))
awswrangler.s3.to_parquet(df, dataset=False, path=s3_path, index=False)
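Since the script above writes one Parquet file per event, each file's schema is presumably inferred from that single event. One thing worth checking, as a guess rather than a confirmed cause, is whether the problem column has the same shape in every event. The sketch below uses only the standard library; `flatten`, `describe`, `column_types` and `mixed_columns` are hypothetical helpers, `flatten` only approximates what `pd.json_normalize(event, sep="_")` does with nested dicts, and the sample events are made up for illustration.

```python
from collections import defaultdict


def flatten(obj, sep="_", prefix=""):
    """Flatten nested dicts into one level, joining keys with `sep`.

    Rough approximation of pd.json_normalize(obj, sep="_"):
    nested dicts are expanded, lists are kept as plain values.
    """
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, sep=sep, prefix=name))
        else:
            flat[name] = value
    return flat


def describe(value):
    """Return a short label for a value's shape, e.g. 'int' or 'list[int]'."""
    if isinstance(value, list):
        inner = sorted({type(v).__name__ for v in value}) or ["empty"]
        return "list[" + "|".join(inner) + "]"
    return type(value).__name__


def column_types(events):
    """Collect every shape label seen for each flattened column."""
    seen = defaultdict(set)
    for event in events:
        for name, value in flatten(event).items():
            seen[name].add(describe(value))
    return dict(seen)


def mixed_columns(events):
    """Keep only the columns whose shape differs between events."""
    return {name: kinds for name, kinds in column_types(events).items() if len(kinds) > 1}


# Made-up events: the same field is a list in one event and a scalar in the other.
sample_events = [
    {"id": "a", "data": {"codes": [1, 2]}},
    {"id": "b", "data": {"codes": 3}},
]

print(mixed_columns(sample_events))
```

Running this over a batch of real events before they are written would list any column that is a list in some events and a scalar, an empty list, or a list of objects in others. Such a column could end up with a different Parquet type from file to file, even though the crawler records a single type for the table.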