Athena query failing


HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://vsgdev/amahal797/year_p=2022/month_p=11/day_p=28/file.parquet (offset=603979776, length=33554432): Multiple entries with same key: [rpdname] optional binary =min: , max: d11_walledgarden_v6.cm, num_nulls not defined and [rpdname] optional binary =min: , max: SNTCDSFW03, num_nulls not defined

The database table was populated from a Parquet file in S3.

Any idea what this error means?

asked a year ago · 340 views
1 answer

This error usually occurs when the file Athena is trying to read contains duplicate fields. Judging by the error message you shared, file.parquet in your S3 location contains two columns named 'rpdname', which is most likely what causes the failure.

To fix this, check the contents of file.parquet for duplicates, remove or rename one of the 'rpdname' fields, and then rerun the query. The problem appears to be in the underlying data rather than in the query, so check and clean the data before you run the query again.
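One way to check for the duplicate is to list the column names stored in the Parquet file and group them while ignoring case. The following is a minimal sketch, not a definitive diagnostic. It assumes that pyarrow is installed and that file.parquet has been downloaded to a local path. The helper names `find_duplicate_columns` and `parquet_column_names` are hypothetical and chosen for illustration. Comparing names case-insensitively is also an assumption: it covers the case where two columns differ only in capitalization and so collide once the names are lowercased.

```python
from collections import defaultdict


def find_duplicate_columns(names):
    """Group column names case-insensitively and return only the groups
    that contain more than one entry.

    Returns a dict mapping the lowercased name to the list of original
    spellings, in the order they appear in `names`.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return {key: found for key, found in groups.items() if len(found) > 1}


def parquet_column_names(path):
    """Read the column names from a local Parquet file's schema.

    pyarrow is imported here rather than at module level so that
    find_duplicate_columns can be used without it.
    """
    import pyarrow.parquet as pq

    return pq.ParquetFile(path).schema.names


if __name__ == "__main__":
    import sys

    # Usage: python check_columns.py /path/to/local/copy/of/file.parquet
    duplicates = find_duplicate_columns(parquet_column_names(sys.argv[1]))
    for key, spellings in duplicates.items():
        print(f"{key}: appears {len(spellings)} times as {spellings}")
```

If the script reports 'rpdname' more than once, the file needs to be rewritten with one of those columns dropped or renamed before Athena can read it.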

Chaitu (AWS, Support Engineer)
answered a year ago
