DMS parquet with TIME column: "hour must be in 0..23" error when reading with pandas/pyarrow/fastparquet/polars/duckdb


I have a table in PostgreSQL 12.6 and use AWS DMS 3.5.2 to extract the data to parquet files on S3.

Everything works well except for reading columns of the TIME data type:

When I read the file with pandas/pyarrow/polars/duckdb I get the error "hour must be in 0..23", and with fastparquet I get an incorrect time of "18:53:10.144000" when the correct value is "17:00:00".
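For reference, this is roughly how I read the file with the other libraries (the file name is the DMS load file used further down); every one of these fails for me in the same way:

import pandas as pd
import polars as pl
import duckdb

filename = "LOAD00000001.parquet"

# Each of these ends in "ValueError: hour must be in 0..23"
# (the exact traceback differs per library).
pd.read_parquet(filename)
pl.read_parquet(filename)
duckdb.sql(f"SELECT * FROM '{filename}'").fetchall()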

I also tried changing the read schema to int32 instead of time; that didn't raise an error, but the integer value is "-1400809856".
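This is roughly how I overrode the read schema (the int32 override is my own guess at the stored physical type; adjust to int64 if that is what your file contains):

import pyarrow as pa
import pyarrow.parquet as pq

filename = "LOAD00000001.parquet"

# Show the physical and logical types DMS actually wrote for each column.
print(pq.ParquetFile(filename).schema)

# Re-read with mytime forced to a plain integer so the raw stored value is visible.
override = pa.schema([("myid", pa.int32()), ("mytime", pa.int32())])
table = pq.read_table(filename, schema=override)
print(table["mytime"][0])  # -> -1400809856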

DMS S3 target endpoint settings:

{
    "DataFormat": "parquet",
    "ParquetVersion": "parquet-2-0",
    "EnableStatistics": true,
    "ParquetTimestampInMillisecond": true
}

I tried different options (parquet-2-0, parquet-1-0, compression NONE and GZIP, ParquetTimestampInMillisecond true and false), but nothing worked.
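For completeness, this is the kind of call I used to switch between those combinations (the endpoint ARN below is a placeholder):

import boto3

dms = boto3.client("dms")

# Placeholder ARN - replace with the real S3 target endpoint ARN.
endpoint_arn = "arn:aws:dms:eu-west-1:123456789012:endpoint:EXAMPLE"

# One of the combinations tried; I also varied ParquetVersion ("parquet-1-0"),
# CompressionType ("gzip") and ParquetTimestampInMillisecond (False).
dms.modify_endpoint(
    EndpointArn=endpoint_arn,
    S3Settings={
        "DataFormat": "parquet",
        "ParquetVersion": "parquet-2-0",
        "EnableStatistics": True,
        "ParquetTimestampInMillisecond": True,
        "CompressionType": "none",
    },
)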

The table definition:

create table mytable (myid int, mytime time)

The table data:

myid|mytime|
1|17:00:00|
2|17:00:00|
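If anyone wants to reproduce this, the source table can be created and populated like so (connection details are placeholders):

import psycopg2

# Placeholder connection details - point these at the Postgres 12.6 source.
conn = psycopg2.connect(host="localhost", dbname="mydb", user="postgres", password="...")

with conn, conn.cursor() as cur:
    cur.execute("create table mytable (myid int, mytime time)")
    cur.execute("insert into mytable values (1, '17:00:00'), (2, '17:00:00')")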

Python code:

import pyarrow as pa
import pyarrow.parquet as pq

filename = "LOAD00000001.parquet"
table = pq.read_table(filename)

# First row of the time column: the decoded time and the raw stored integer.
print(table["mytime"][0])
print(table["mytime"][0].value)

Python code output:

18:53:10.144000
-1400809856
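
For comparison, 17:00:00 should correspond to one of these raw values under the standard parquet TIME encodings, and neither matches what pyarrow returns:

# Raw values the parquet TIME logical types should hold for 17:00:00.
seconds_after_midnight = 17 * 3600          # 61200
print(seconds_after_midnight * 1_000)       # TIME millis (int32): 61200000
print(seconds_after_midnight * 1_000_000)   # TIME micros (int64): 61200000000
print(-1400809856)                          # raw value actually read from the file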

So my question is: how do you read TIME columns from the parquet files generated by DMS?

Asked 5 months ago · 106 views
No answers
