ETL job failing with weird error


My ETL job is failing with the error below, which I found in the log. I'm not sure what is causing it. I'd highly appreciate any advice.

Language: Python 3, Glue: 3

An error occurred while calling o93.parquet. java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary

Mark
Asked 7 months ago, 229 views
1 Answer
Accepted Answer

Hello,

It looks like you are getting an UnsupportedOperationException when reading the Parquet data. As far as I am aware, there are two likely causes: either one or more of the underlying Parquet files are corrupted, or the schema/data type of a column is being interpreted incorrectly. If your S3 data source is partitioned, try reading different partitions separately and see whether the error occurs only when reading a particular partition. If the files are not corrupted, check whether any column is stored with a different data type than expected (for example, a type that differs between files or from the table schema); a mismatch like that can also throw this kind of exception. See this Jira: https://issues.apache.org/jira/browse/SPARK-24828
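
The two checks above could be sketched roughly as follows. This is only a minimal sketch, not a definitive diagnostic: the function names `scan_partitions` and `find_type_conflicts` and the S3 path are made up for illustration, and it assumes a Spark 3.x session where the `noop` write format is available to force a full scan without producing output.

```python
from collections import defaultdict


def scan_partitions(spark, partition_paths):
    """Read each partition on its own and force a full scan.

    Returns (schemas, failures):
      schemas  maps path -> {column name: Spark type string}
      failures maps path -> error message for partitions that failed
    A path can appear in both if its schema was readable but the scan failed.
    """
    schemas, failures = {}, {}
    for path in partition_paths:
        try:
            df = spark.read.parquet(path)
            # Schema as inferred from this partition's own Parquet files.
            schemas[path] = dict(df.dtypes)
            # Force all columns to be read without writing any output.
            # Assumption: the "noop" sink of Spark 3.x is available.
            df.write.format("noop").mode("overwrite").save()
        except Exception as exc:  # broad on purpose: this is a diagnostic
            failures[path] = str(exc)
    return schemas, failures


def find_type_conflicts(schemas):
    """Return columns whose type differs between partitions.

    Result maps column -> {type string: [paths that use this type]}.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for path, schema in schemas.items():
        for column, type_name in schema.items():
            seen[column][type_name].append(path)
    return {col: dict(types) for col, types in seen.items() if len(types) > 1}


# Example use inside a job (the path is a placeholder):
#   schemas, failures = scan_partitions(
#       spark, ["s3://example-bucket/table/dt=2023-01-01/"])
#   print(failures)                      # partitions that could not be read
#   print(find_type_conflicts(schemas))  # columns with inconsistent types
```

A partition that shows up in `failures` is a candidate for corrupted files; a column that shows up in the conflict report points at the schema mismatch case.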

AWS
Support Engineer
Answered 7 months ago
  • Thank you! I got the issue when querying one particular partition. I'm not sure of the root cause, but I recreated that partition and the issue is resolved.
