ICEBERG - Data Loss During OPTIMIZE Command Execution on Tables with a High Partition Count

We use the AWS Glue Data Catalog with Athena v3 and Iceberg for our data solution. We create tables with a CREATE TABLE query in Athena, setting the 'table_type' property to 'ICEBERG'. Our data is partitioned hourly (I tested other partition types as well). We hit a serious issue when a table has a large number of partitions, more than 100 or so: when we run the OPTIMIZE command, some of the data is unexpectedly removed from the table, which causes data loss and inconsistencies. When I partition the data by BUCKET, the problem does not occur (I think because bucketing gives a fixed number of partitions).

Example of the OPTIMIZE command:

OPTIMIZE test.test_iceberg REWRITE DATA USING BIN_PACK
WHERE date(created_at) = date '2023-03-08';

The summary fields of the latest table snapshots:

{changed-partition-count=24, added-data-files=55, total-equality-deletes=0, added-records=1837854, trino_query_id=20230516_210052_00022_9g8in, total-position-deletes=0, added-files-size=1363665641, total-delete-files=0, total-files-size=7699737218, total-records=10619723, total-data-files=317}

{added-data-files=311, total-equality-deletes=0, added-records=8765775, trino_query_id=20230516_210135_00090_dixmr, deleted-data-files=266, deleted-records=8765775, total-records=10619723, removed-files-size=6391102826, changed-partition-count=100, total-position-deletes=0, added-files-size=6327068186, total-delete-files=0, total-files-size=7635702578, total-data-files=362}

{removed-files-size=6301213670, changed-partition-count=100, total-equality-deletes=0, trino_query_id=20230516_210205_00012_c64jn, deleted-data-files=314, total-position-deletes=0, total-delete-files=0, deleted-records=8731597, total-files-size=1334488908, total-records=1888126, total-data-files=48}

{added-data-files=2, total-equality-deletes=0, added-records=83350, trino_query_id=20230516_210213_00121_7xbbh, deleted-data-files=6, deleted-records=83350, total-records=1888126, removed-files-size=59119635, changed-partition-count=2, total-position-deletes=0, added-files-size=58846159, total-delete-files=0, total-files-size=1334215432, total-data-files=44}
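
To make the summaries above easier to compare, here is a minimal Python sketch that parses them and reports the net record change per snapshot. It is only an illustration: the helper names (`parse_summary`, `net_record_changes`) are hypothetical, the strings are pasted from this post in the order listed (the running totals are consistent with oldest first), and it assumes a missing `added-records` or `deleted-records` field means zero.

```python
import re

# Snapshot summaries pasted from the post, in the order listed there.
SUMMARIES = [
    "{changed-partition-count=24, added-data-files=55, total-equality-deletes=0, added-records=1837854, trino_query_id=20230516_210052_00022_9g8in, total-position-deletes=0, added-files-size=1363665641, total-delete-files=0, total-files-size=7699737218, total-records=10619723, total-data-files=317}",
    "{added-data-files=311, total-equality-deletes=0, added-records=8765775, trino_query_id=20230516_210135_00090_dixmr, deleted-data-files=266, deleted-records=8765775, total-records=10619723, removed-files-size=6391102826, changed-partition-count=100, total-position-deletes=0, added-files-size=6327068186, total-delete-files=0, total-files-size=7635702578, total-data-files=362}",
    "{removed-files-size=6301213670, changed-partition-count=100, total-equality-deletes=0, trino_query_id=20230516_210205_00012_c64jn, deleted-data-files=314, total-position-deletes=0, total-delete-files=0, deleted-records=8731597, total-files-size=1334488908, total-records=1888126, total-data-files=48}",
    "{added-data-files=2, total-equality-deletes=0, added-records=83350, trino_query_id=20230516_210213_00121_7xbbh, deleted-data-files=6, deleted-records=83350, total-records=1888126, removed-files-size=59119635, changed-partition-count=2, total-position-deletes=0, added-files-size=58846159, total-delete-files=0, total-files-size=1334215432, total-data-files=44}",
]


def parse_summary(text):
    """Turn a '{key=value, key=value}' string into a dict; numeric values become int."""
    fields = {}
    for key, value in re.findall(r"([\w-]+)=([^,}]+)", text):
        fields[key] = int(value) if value.isdigit() else value
    return fields


def net_record_changes(summaries):
    """Return (query_id, added, deleted, net, total_records) for each snapshot."""
    rows = []
    for text in summaries:
        s = parse_summary(text)
        added = s.get("added-records", 0)      # assumption: absent field means 0
        deleted = s.get("deleted-records", 0)  # assumption: absent field means 0
        rows.append((s["trino_query_id"], added, deleted, added - deleted, s["total-records"]))
    return rows


for query_id, added, deleted, net, total in net_record_changes(SUMMARIES):
    flag = "  <-- net loss" if net < 0 else ""
    print(f"{query_id}: added={added} deleted={deleted} net={net} total={total}{flag}")
```

With these inputs, only the snapshot from query 20230516_210205_00012_c64jn shows a net loss: it deletes 8731597 records across 314 data files and adds none, taking total-records from 10619723 to 1888126. The snapshots before and after it add exactly as many records as they delete.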

Has anyone run into this problem, or can anyone help?

Asked 1 year ago, 605 views
1 answer

Hi,

If you have reproduced this several times (and know the steps to reproduce it), please open a support ticket; that is the best course of action.

Bests!

AWS
Answered 1 year ago
  • Then how will others in a similar predicament know when it's solved?
