Athena version 3 - Timestamp column causes serialization error for parquet data format

We ran into a serialization error on a timestamp field after switching to Athena engine version 3 for our Parquet data files. With Athena engine version 2, the same Parquet data is read without errors. Before opening this issue, we verified that our data follows all the recommendations in the Athena engine version 3 guide.

Error: SERIALIZATION_ERROR: Could not serialize column 'timestamp' of type 'timestamp(3)' at position 1:1

Below is the suggested solution to the problem from the documentation:

Precision mismatch in Timestamp columns causes serialization error

Error message: SERIALIZATION_ERROR: Could not serialize column 'COLUMNZ' of type 'timestamp(3)' at position X:Y

Cause: Athena engine version 3 checks to make sure that the precision of timestamps in the data is the same as the precision specified for the column data type in the table specification. Currently, this precision is always 3. If the data has a precision greater than this, queries fail with the error noted.

Suggested solution: Check your data to make sure that your timestamps have millisecond precision.

In our parquet data, we use a timestamp field with Unix time format, which has millisecond precision: "timestamp": 1688479202968
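As a quick sanity check on the value itself, here is a minimal Python sketch that guesses the unit of an epoch integer from its digit count and converts the sample value to a UTC datetime. The helper names `guess_epoch_unit` and `millis_to_utc` are made up for illustration, and the digit-count rule is only a heuristic for present-day dates. It checks the number only; it says nothing about how the column is annotated inside the Parquet file (for example its physical or logical type), which may be what Athena actually validates.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def guess_epoch_unit(value: int) -> str:
    """Guess the unit of a Unix epoch integer from its magnitude.

    Heuristic only: assumes the value represents a present-day date.
    """
    digits = len(str(abs(value)))
    if digits <= 10:
        return "seconds"
    if digits <= 13:
        return "milliseconds"
    if digits <= 16:
        return "microseconds"
    return "nanoseconds"


def millis_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime.

    Integer timedelta arithmetic avoids float rounding of the fraction.
    """
    return EPOCH + timedelta(milliseconds=ms)


sample = 1688479202968  # the "timestamp" value from our data
print(guess_epoch_unit(sample))
print(millis_to_utc(sample).isoformat(timespec="milliseconds"))
```

For the sample value this reports milliseconds and a date in July 2023, which matches what we expect from our data.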

Example of our parquet data and DDL statement for creating Athena table:

{
  "id": "23020733",
  "timestamp": 1688479202968,
  "receiveTimestamp": 1688479203118
   ...
}

CREATE EXTERNAL TABLE `sample_table`(
  `id` string, 
  `timestamp` timestamp, 
  `receivetimestamp` timestamp 
   )
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://...'

It is critical for us that Athena queries keep working without changes to our existing queries or to the Parquet files.

For now, we have switched Athena back to engine version 2 for all our services and continue to use it.

What is the correct definition of this field in the Athena table? If this is a bug on the AWS side, how can I report it to them?

asked 9 months ago, 847 views
1 Answer
This appears to be expected behavior. Please refer to the Athena Version History and search for "Precision"; you'll see all the changes related to timestamp precision, which should give you an idea of how the behavior differs between version 2 and version 3.

However, if you still think this is a bug, the first point of contact would be AWS Support. I'd suggest logging a case under the "Technical" category describing the situation; they will help you and provide additional context and guidance. Note that you can only log such a case if your support plan allows it; the Basic Support plan doesn't include that ability.

Hope this helps.

Comment here if you have additional questions, happy to help.

Abhishek

AWS EXPERT
answered 9 months ago
  • Thanks Abhishek! The thing is, their release notes say that this was fixed and that Athena 3 is at parity with Athena 2, but it still doesn't work for us. We also tested Parquet files written with a different set of encodings, and one of them works:

    -- correct
    "encodings" : [ "RLE", "PLAIN", "BIT_PACKED" ],
    
    -- incorrect
    "encodings" : [ "RLE", "PLAIN" ], 
    

    But the problem cannot be in the Parquet files, as Athena 2 processes them without issues. I have already opened a support case in the AWS portal and hope they will answer, but since we are not entitled to technical support, I don't know whether this will be resolved.
