DMS parquet with time column getting error "hour must be in 0..23" when reading with pandas/pyarrow/fastparquet/polars/duckdb


I have a table on PostgreSQL 12.6 and use AWS DMS 3.5.2 to extract the data to Parquet files on S3.

Everything works well except for reading columns of the TIME data type:

When I read the file with pandas/pyarrow/polars/duckdb I get the error "hour must be in 0..23", and with fastparquet I get an incorrect time, "18:53:10.144000", when the correct time is "17:00:00".

I tried changing the read schema to int32 instead of time; that raised no error, but the integer value it returns is -1400809856.
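A quick sanity check on that integer (assuming the column is annotated as TIME in milliseconds, which its int32 physical type suggests): wrapping the raw value modulo one day reproduces exactly the time fastparquet prints, while 17:00:00 should encode as 61,200,000 ms, so the stored value itself looks wrong:

```python
from datetime import timedelta

MS_PER_DAY = 24 * 60 * 60 * 1000  # 86_400_000

raw = -1400809856            # int32 actually read from the DMS parquet file
expected = 17 * 3600 * 1000  # 17:00:00 as milliseconds since midnight

# Wrapping the raw value into the valid range for one day reproduces
# fastparquet's output exactly:
print(timedelta(milliseconds=raw % MS_PER_DAY))  # 18:53:10.144000
print(expected)                                  # 61200000
```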

DMS s3 target endpoint:

{
    "DataFormat": "parquet",
    "ParquetVersion": "parquet-2-0",
    "EnableStatistics": true,
    "ParquetTimestampInMillisecond": true
}

I tried different combinations of options (parquet-2-0 and parquet-1-0, compression NONE and GZIP, ParquetTimestampInMillisecond true and false), but nothing helped.

The table definition:

create table mytable (myid int, mytime time)

The table data:

myid|mytime|
1|17:00:00|
2|17:00:00|

Python code:

import pyarrow as pa
import pyarrow.parquet as pq

filename = "LOAD00000001.parquet"
table = pq.read_table(filename)
print(table["mytime"][0])        # the decoded time value
print(table["mytime"][0].value)  # the underlying raw integer

Python code output:

18:53:10.144000
-1400809856

So my question is: how do you read TIME columns from the Parquet files that DMS generates?

Asked 4 months ago · 104 views
No Answers
