After switching to Athena version 3, we ran into a serialization error on the timestamp field when querying our Parquet data files.
With Athena version 2, the same Parquet data is read without errors. Before opening this issue, we also verified that our data follows all the recommendations in the Athena 3 guide.
Error: SERIALIZATION_ERROR: Could not serialize column 'timestamp' of type 'timestamp(3)' at position 1:1
The documentation describes the problem and suggests the following solution:
Precision mismatch in Timestamp columns causes serialization error
Error message: SERIALIZATION_ERROR: Could not serialize column 'COLUMNZ' of type 'timestamp(3)' at position X:Y
Cause: Athena engine version 3 checks to make sure that the precision of timestamps in the data is the same as the precision specified for the column data type in the table specification. Currently, this precision is always 3. If the data has a precision greater than this, queries fail with the error noted.
Suggested solution: Check your data to make sure that your timestamps have millisecond precision.
In our Parquet data, the timestamp field holds Unix time in milliseconds, so it already has millisecond precision:
"timestamp": 1688479202968
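As a quick sanity check on the value itself, here is a minimal Python sketch. The helper names `guess_epoch_unit` and `millis_to_utc` are hypothetical, and the digit-count rule is only a heuristic that holds for present-day dates. It inspects the number alone and says nothing about how the column is annotated inside the Parquet file.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def guess_epoch_unit(value: int) -> str:
    """Hypothetical helper: guess the unit of an epoch value from its digit count.

    Heuristic only; it assumes the value represents a present-day date.
    """
    digits = len(str(abs(value)))
    if digits <= 10:
        return "seconds"
    if digits <= 13:
        return "milliseconds"
    if digits <= 16:
        return "microseconds"
    return "nanoseconds"


def millis_to_utc(value_ms: int) -> datetime:
    """Hypothetical helper: convert epoch milliseconds to a UTC datetime.

    Integer timedelta arithmetic avoids float rounding of the fraction.
    """
    return EPOCH + timedelta(milliseconds=value_ms)


sample = 1688479202968  # the "timestamp" value from our data
print(guess_epoch_unit(sample), millis_to_utc(sample).isoformat(timespec="milliseconds"))
# prints: milliseconds 2023-07-04T14:00:02.968+00:00
```

The value decodes to a plausible date with exactly three fractional digits, which is consistent with millisecond precision at the value level.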
An example record from our Parquet data and the DDL statement used to create the Athena table:
{
"id": "23020733",
"timestamp": 1688479202968,
"receiveTimestamp": 1688479203118
...
}
CREATE EXTERNAL TABLE `sample_table`(
`id` string,
`timestamp` timestamp,
`receivetimestamp` timestamp
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
's3://...'
It is critical for us that Athena queries keep working without changes to the existing queries or the Parquet files.
For now, we have switched all our services back to Athena version 2 and continue to run on it.
What is the correct way to declare this field in the Athena table? If this is a bug on the AWS side, how can I report it to them?
Thanks, Abhishek! The thing is, their release notes say that this was fixed and that Athena 3 is at parity with Athena 2, but it still doesn't work for us. We also tested the Parquet file with a different encoding, and it works:
But the problem cannot be in the Parquet files, as Athena 2 processes them without issues. I have already opened a support case in the AWS portal and hope they will answer, but since we are not entitled to technical support, I don't know whether this will be resolved.