Athena query failing


HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://vsgdev/amahal797/year_p=2022/month_p=11/day_p=28/file.parquet (offset=603979776, length=33554432): Multiple entries with same key: [rpdname] optional binary =min: , max: d11_walledgarden_v6.cm, num_nulls not defined and [rpdname] optional binary =min: , max: SNTCDSFW03, num_nulls not defined

The database table was populated from Parquet files in S3.

Any idea what this error means?

Asked 1 year ago, viewed 340 times
1 answer

This error usually means that the Parquet file Athena was trying to read contains duplicate column names. Judging by the error message you shared, file.parquet in your S3 location appears to contain two columns named 'rpdname' (the message lists two [rpdname] entries with different statistics), which is most likely what causes the failure.

To fix this, inspect the schema of file.parquet for the duplicate, rename or remove one of the 'rpdname' fields, and then rerun the query. In other words, the problem appears to be in the underlying data rather than in the query, so check and clean the data before you run the query again.
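
One way to check the schema is sketched below. It is a minimal example, not a definitive diagnosis: it assumes you have downloaded the file locally, that the third-party pyarrow package is installed, and that the collision may come from column names that differ only in letter case (Athena handles column names in lowercase). The function names and the sample column list are hypothetical and only for illustration.

```python
from collections import defaultdict


def find_colliding_columns(names):
    """Group column names that become identical once lowercased."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Keep only the lowercased names that more than one column maps to.
    return {key: originals for key, originals in groups.items() if len(originals) > 1}


def colliding_columns_in_file(path):
    """Read only the schema of a local Parquet file and report collisions."""
    # Imported lazily because pyarrow is a third-party dependency.
    import pyarrow.parquet as pq

    return find_colliding_columns(pq.read_schema(path).names)


if __name__ == "__main__":
    # Hypothetical column list, not taken from the real file.
    print(find_colliding_columns(["id", "rpdname", "RpdName"]))
```

Calling colliding_columns_in_file("file.parquet") on a local copy returns a dictionary of the colliding names. An empty dictionary means no top-level columns collide, in which case the duplicate may be in a nested field or elsewhere in the file metadata.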

Chaitu, AWS Support Engineer
Answered 1 year ago
