DMS parquet with time column getting error "hour must be in 0..23" when reading with pandas/pyarrow/fastparquet/polars/duckdb


I have a table on Postgres version 12.6 and use the DMS version 3.5.2 to extract the data to parquet files on s3.

Everything is working well except for the problem of reading TIME (data type) columns:

When I read the file with pandas/pyarrow/polars/duckdb I get the error "hour must be in 0..23", and with fastparquet I get an incorrect time, "18:53:10.144000", when the correct time is "17:00:00".

I tried changing the read schema to int32 instead of time, which avoided the error, but the integer value is "-1400809856", which is negative and therefore cannot be a valid count of any unit since midnight.

DMS s3 target endpoint:

{
    "DataFormat": "parquet",
    "ParquetVersion": "parquet-2-0",
    "EnableStatistics": true,
    "ParquetTimestampInMillisecond": true
}

I tried different options (parquet-2-0, parquet-1-0, compression NONE and GZIP, ParquetTimestampInMillisecond true and false), but nothing worked.

The table definition:

create table mytable (myid int, mytime time)

The table data:

myid|mytime|
1|17:00:00|
2|17:00:00|

Python code:

import pyarrow.parquet as pq

filename = "LOAD00000001.parquet"
table = pq.read_table(filename)
print(table["mytime"][0])        # decoded value
print(table["mytime"][0].value)  # raw integer stored in the file

Python code output:

18:53:10.144000
-1400809856

So my question is: how do you read TIME columns from the parquet files generated by DMS?
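A possible workaround, assuming you can change the source side of the migration: have DMS extract the column as text (for example by migrating a view that casts mytime to varchar) and parse it back after reading. A sketch with pandas, where the view name and the stand-in DataFrame are hypothetical:

```python
import pandas as pd

# Hypothetical source-side cast, e.g.:
#   CREATE VIEW mytable_v AS SELECT myid, mytime::varchar AS mytime FROM mytable;
# The parquet produced from the view then carries plain strings like "17:00:00".
df = pd.DataFrame({"myid": [1, 2], "mytime": ["17:00:00", "17:00:00"]})

# Parse the strings back into datetime.time objects
df["mytime"] = pd.to_datetime(df["mytime"], format="%H:%M:%S").dt.time
print(df["mytime"][0])  # 17:00:00
```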

Asked 5 months ago · 106 views
No answers
