DMS parquet with time column getting error "hour must be in 0..23" when reading with pandas/pyarrow/fastparquet/polars/duckdb

0

I have a table on Postgres version 12.6 and use the DMS version 3.5.2 to extract the data to parquet files on s3.

Everything is working well except for the problem of reading TIME (data type) columns:

When I read with pandas/pyarrow/polars/duckdb I get the error "hour must be in 0..23" and with fastparquet I get an incorrect time "18:53:10.144000" when the correct time is "17:00:00".

I tried to change the reading schema to int32 instead of time, and I didn't get any error, but the integer value is "-1400809856".

DMS s3 target endpoint:

{
    "DataFormat": "parquet",
    "ParquetVersion": "parquet-2-0",
    "EnableStatistics": true,
    "ParquetTimestampInMillisecond": true
}

I tried with different options (parquet-2-0, parquet-1-0, compression NONE and GZIP, ParquetTimestampInMillisecond True and False) but nothing worked.

The table definition:

create table mytable (myid int, mytime time)

The table data:

myid|mytime|
1|17:00:00|
2|17:00:00|

Python code:

import pyarrow as pa
import pyarrow.parquet as pq
filename = "LOAD00000001.parquet"
df = pq.read_table(filename)
print(df["mytime"][0])
print(df["mytime"][0].value)

Python code output:

18:53:10.144000
-1400809856

So, my question is, how do you read time columns from the parquet generated from DMS?

gefragt vor 5 Monaten106 Aufrufe
Keine Antworten

Du bist nicht angemeldet. Anmelden um eine Antwort zu veröffentlichen.

Eine gute Antwort beantwortet die Frage klar, gibt konstruktives Feedback und fördert die berufliche Weiterentwicklung des Fragenstellers.

Richtlinien für die Beantwortung von Fragen