Athena query failing

HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://vsgdev/amahal797/year_p=2022/month_p=11/day_p=28/file.parquet (offset=603979776, length=33554432): Multiple entries with same key: [rpdname] optional binary =min: , max: d11_walledgarden_v6.cm, num_nulls not defined and [rpdname] optional binary =min: , max: SNTCDSFW03, num_nulls not defined

The database table was populated from a Parquet file in S3.

Any idea what this error means?

asked a year ago, 326 views
1 Answer
This error usually means that the Parquet file Athena is trying to read contains duplicate column entries. Judging by the error message you shared, file.parquet in your S3 location appears to define the column 'rpdname' more than once, and that is most likely why the split cannot be opened.

To mitigate this, I suggest inspecting the schema of file.parquet for duplicates, removing or renaming one of the 'rpdname' fields, and then rerunning the query. In other words, I think the issue is with the underlying data rather than with the query, so check and clean the data before you run the query again.
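As a rough illustration of the check described above, here is a minimal Python sketch that looks for column names that collide. It compares names case-insensitively, on the assumption that the duplicates may differ only in letter case (Athena treats column names as case-insensitive); that is a guess, not something the error message confirms. The helper names `find_duplicate_columns` and `duplicate_columns_in_file` and the sample column list are hypothetical, and the file helper assumes pyarrow is installed and that the Parquet file has been copied to a local path.

```python
from collections import defaultdict


def find_duplicate_columns(names):
    """Group column names case-insensitively and return only the groups
    that contain more than one entry, as {lowercased_name: [spellings]}."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


def duplicate_columns_in_file(path):
    """Read only the schema of a local Parquet file and report colliding
    top-level column names. Assumes pyarrow is installed (hypothetical helper)."""
    import pyarrow.parquet as pq  # imported here so the rest runs without pyarrow

    return find_duplicate_columns(pq.read_schema(path).names)


if __name__ == "__main__":
    # Hypothetical column list, for illustration only.
    sample = ["id", "rpdname", "RPDName", "ts"]
    print(find_duplicate_columns(sample))
```

If the result is non-empty, the listed spellings are the columns to rename or drop when rewriting the file. Note that this only looks at top-level column names, so duplicates inside nested structures would need a separate check.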

Chaitu, AWS Support Engineer
answered a year ago
