Hello,
This looks like a Spark version compatibility issue. Please check whether the job works on Spark 2, and review the execution log for more insight into the failure. If it is caused by changes in Spark 3.1 (Glue 3.0+), particularly in how timestamps in Parquet files are handled, add the configuration below to correct the timestamp handling in the newer version. Set it in Job parameters under advanced properties:
key: --conf
value: spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED
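If you generate jobs programmatically, a small sketch of how the individual settings above combine into that single `--conf` job-parameter value (the helper name is illustrative, not part of any Glue API):

```python
# The three rebase settings recommended above, as a dict.
REBASE_CONFS = {
    "spark.sql.legacy.parquet.int96RebaseModeInRead": "CORRECTED",
    "spark.sql.legacy.parquet.int96RebaseModeInWrite": "CORRECTED",
    "spark.sql.legacy.parquet.datetimeRebaseModeInRead": "CORRECTED",
}

def glue_conf_value(confs):
    # Glue takes one "--conf" key whose value holds the first
    # key=value pair bare and each subsequent pair prefixed with
    # "--conf", all joined into a single string.
    pairs = [f"{key}={value}" for key, value in confs.items()]
    return " --conf ".join(pairs)

print(glue_conf_value(REBASE_CONFS))
```

Running this prints exactly the value string shown above, which you can paste into the job parameter.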
References:
We recommend setting the above parameters when reading or writing Parquet data that contains timestamp columns. Setting them resolves the calendar incompatibility that arises during the Spark 2 to Spark 3 upgrade, for both AWS Glue DynamicFrames and Spark DataFrames. Use the CORRECTED option to read the datetime values as-is, and the LEGACY option to rebase the datetime values according to the calendar difference while reading.
More details here
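To see why the rebase matters: Spark 2 stored old timestamps using the hybrid Julian calendar, while Spark 3 uses the proleptic Gregorian calendar, so the same stored day number can decode to different dates. A rough illustration using the standard Julian Day Number (JDN) formulas (this is not Spark's code, just a sketch of the calendar mismatch):

```python
# Convert a date in the Julian calendar to its Julian Day Number.
def julian_to_jdn(year, month, day):
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

# Convert a date in the (proleptic) Gregorian calendar to its JDN.
def gregorian_to_jdn(year, month, day):
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)

# The same physical day carries different labels in the two
# calendars: Julian 1582-10-04 and Gregorian 1582-10-14 share one
# JDN, so reinterpreting stored values without a rebase shifts
# old dates by several days.
assert julian_to_jdn(1582, 10, 4) == gregorian_to_jdn(1582, 10, 14)
print(julian_to_jdn(1582, 10, 4))  # 2299160
```

CORRECTED reads the stored value as already being in the new calendar; LEGACY performs a rebase like the one sketched above so values written by Spark 2 keep their original meaning.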