1 Answer
Hello,
I understand you are seeing "HIVE_CANNOT_OPEN_SPLIT" when querying your Parquet files in S3. I checked the error, and according to Athena's documentation:
This error is caused by a Parquet schema mismatch. A column that has a non-primitive type (for example, array) has been declared as a primitive type (for example, string) in AWS Glue. To troubleshoot this issue, check the data schema in the files and compare it with the schema declared in AWS Glue.
You can use parquet-tools to inspect the schema of a Parquet file, update the schema in Glue to match, and then try the query again.
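If it helps, here is a minimal sketch of the comparison step in Python. It assumes you have already collected both schemas as plain column-to-type mappings (for example, the file side from parquet-tools output or `pyarrow.parquet.read_schema`, and the Glue side from the table's column list). The function name and the sample columns are hypothetical and only there for illustration; they are not taken from your table.

```python
# Type-name prefixes that Athena/Glue treat as non-primitive (complex) types.
COMPLEX_PREFIXES = ("array", "map", "struct")


def is_complex(type_name):
    """Return True if the type name looks like array<...>, map<...> or struct<...>."""
    return type_name.strip().lower().startswith(COMPLEX_PREFIXES)


def find_schema_mismatches(file_schema, glue_schema):
    """Compare two {column: type} mappings and list likely problem columns.

    Only flags the case described in the documentation quote above:
    a column that is complex on one side and primitive on the other,
    plus columns missing from the Glue definition.
    """
    # Glue stores column names in lowercase, so compare case-insensitively.
    glue_lookup = {name.lower(): type_name for name, type_name in glue_schema.items()}
    problems = []
    for column, file_type in file_schema.items():
        glue_type = glue_lookup.get(column.lower())
        if glue_type is None:
            problems.append(f"{column}: present in file, missing in Glue")
        elif is_complex(file_type) != is_complex(glue_type):
            problems.append(f"{column}: file has {file_type}, Glue declares {glue_type}")
    return problems


# Hypothetical schemas, for illustration only.
file_schema = {"id": "bigint", "tags": "array<string>", "created": "date"}
glue_schema = {"id": "bigint", "tags": "string", "created": "string"}

mismatches = find_schema_mismatches(file_schema, glue_schema)
for line in mismatches:
    print(line)
```

Any column this reports is a candidate for the mismatch the documentation describes. Note that the sketch deliberately ignores primitive-to-primitive differences (such as date versus string), so those would still need a manual check.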
answered 4 months ago
I'm trying to load data from S3 into Athena with Glue, and it's giving me this error. Googling turned up the following potential causes, neither of which I think applies: too many small files; mismatched date type for a column.
The bucket holds 100 different Snappy-compressed Parquet files of about 135 MB each, so I don't think it is option #1.
I tried making every column a string (there were a couple of dates and one int, 9 columns in total), and that didn't help either.
Thank you!!