Migrating Postgres Databases to S3 Data Lake: Boolean Column Issue

0

We are in the process of migrating several Postgres databases from RDS to an S3 data lake. During this migration, we've encountered a peculiar issue with boolean columns in the Change Data Capture (CDC).

Issue Observed:

  • When using AWS DMS, all boolean columns in the CDC are consistently marked as true, regardless of their actual value in the source database.
  • Initially, DMS migrated all boolean values as varchar(5). To address this, we enabled the "MapBooleanAsBoolean" option on the source endpoint, expecting to see accurate boolean values in the data lake.
  • For one database, this adjustment worked flawlessly. However, for all other databases, every boolean value in the CDC remains as true.

Additional Observations:

  • Disabling the "MapBooleanAsBoolean" option led to the following results:
    • For the database where there wasn't an issue, boolean values appeared as "true" and "false" strings.
    • For databases with the boolean issue, values appeared as "t" and "f" strings.

We can manage these string conversions in our ETL processes or modify the parquet files, but the root cause of this boolean discrepancy remains elusive.

Troubleshooting Done:

  • We've checked configurations of our DMS task, replication instance, and both source and target endpoints; they are identical across all databases.
  • Documentation only states that booleans are converted to strings. Interestingly, it indicates that the "MapBooleanAsBoolean" option is only for Redshift targets. However, this option works (at least for one database) when our target is S3.

Key Question: Is there a specific setting or aspect within Postgres that might trigger this behavior?

Environment Details:

  • DMS replication instance: 3.4.7
  • All databases: Postgres 12.14

Any insights or guidance would be greatly appreciated!

yatmee
질문됨 7달 전132회 조회
답변 없음

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인