Iceberg - Data Loss During OPTIMIZE Command Execution on Tables with Many Partitions


We are using the AWS Glue Catalog with Athena v3 and Iceberg for our data solution. We create tables with a CREATE TABLE query in Athena, setting the 'table_type' parameter to 'ICEBERG'. Our data is partitioned by hour (I tested other partition types as well). We are running into a serious issue when a table has a large number of partitions (more than 100, for example): after executing the OPTIMIZE command, some of the data is unexpectedly removed from the table. This is not expected behavior, and it causes data inconsistencies and data loss. When I partition the data by BUCKET, the problem does not occur (I think because bucketing yields a fixed number of partitions).

Example of the OPTIMIZE command:

OPTIMIZE test.test_iceberg REWRITE DATA USING BIN_PACK
WHERE date(created_at) = date '2023-03-08';

The summary field of the latest table snapshot records:

{changed-partition-count=24, added-data-files=55, total-equality-deletes=0, added-records=1837854, trino_query_id=20230516_210052_00022_9g8in, total-position-deletes=0, added-files-size=1363665641, total-delete-files=0, total-files-size=7699737218, total-records=10619723, total-data-files=317}

{added-data-files=311, total-equality-deletes=0, added-records=8765775, trino_query_id=20230516_210135_00090_dixmr, deleted-data-files=266, deleted-records=8765775, total-records=10619723, removed-files-size=6391102826, changed-partition-count=100, total-position-deletes=0, added-files-size=6327068186, total-delete-files=0, total-files-size=7635702578, total-data-files=362}

{removed-files-size=6301213670, changed-partition-count=100, total-equality-deletes=0, trino_query_id=20230516_210205_00012_c64jn, deleted-data-files=314, total-position-deletes=0, total-delete-files=0, deleted-records=8731597, total-files-size=1334488908, total-records=1888126, total-data-files=48}

{added-data-files=2, total-equality-deletes=0, added-records=83350, trino_query_id=20230516_210213_00121_7xbbh, deleted-data-files=6, deleted-records=83350, total-records=1888126, removed-files-size=59119635, changed-partition-count=2, total-position-deletes=0, added-files-size=58846159, total-delete-files=0, total-files-size=1334215432, total-data-files=44}
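To make the change between these snapshots easier to see, here is a minimal Python sketch that parses the summary strings above and flags any snapshot that deletes more records than it adds. The helper names (parse_summary, net_record_change, find_net_deletions) are hypothetical, the summaries are trimmed to the fields the check uses, and the sketch only reads the summaries; it does not establish which statement produced each snapshot.

```python
# Summaries copied from the snapshots above, in the order listed, trimmed to the
# fields this check uses. Field names are the ones that appear in the summaries.
SUMMARIES = [
    "{added-records=1837854, trino_query_id=20230516_210052_00022_9g8in, total-records=10619723}",
    "{added-records=8765775, trino_query_id=20230516_210135_00090_dixmr, deleted-records=8765775, total-records=10619723}",
    "{trino_query_id=20230516_210205_00012_c64jn, deleted-records=8731597, total-records=1888126}",
    "{added-records=83350, trino_query_id=20230516_210213_00121_7xbbh, deleted-records=83350, total-records=1888126}",
]


def parse_summary(text):
    """Turn '{key=value, key=value}' into a dict, converting numeric values to int."""
    fields = {}
    for pair in text.strip().strip("{}").split(", "):
        key, value = pair.split("=", 1)
        fields[key] = int(value) if value.isdigit() else value
    return fields


def net_record_change(summary):
    """Records added minus records deleted; a missing field counts as zero."""
    return summary.get("added-records", 0) - summary.get("deleted-records", 0)


def find_net_deletions(summaries):
    """Return the parsed summaries whose snapshot removed more records than it added."""
    parsed = [parse_summary(s) for s in summaries]
    return [s for s in parsed if net_record_change(s) < 0]


if __name__ == "__main__":
    for s in find_net_deletions(SUMMARIES):
        print(s["trino_query_id"], net_record_change(s), s["total-records"])
```

With the four summaries above, only the snapshot from query 20230516_210205_00012_c64jn is flagged: it deletes 8731597 records and adds none, which accounts for total-records dropping from 10619723 to 1888126. The other two rewrite snapshots add exactly as many records as they delete.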

Has anyone run into this problem, or can anyone help?

Asked 1 year ago, viewed 605 times
1 Answer

Hi,

If you have reproduced this several times (and know the steps to reproduce it), the best course of action is to open a support ticket.

Best regards!

AWS
Answered 1 year ago
  • Then how will others in a similar predicament know when it's solved?
