ICEBERG - Data Loss During OPTIMIZE Command Execution on Tables with Many Partitions

We are using the AWS Glue Catalog together with Athena engine v3 and Iceberg for our data solution. We create tables with a CREATE TABLE query in Athena, setting the 'table_type' property to 'ICEBERG'. Our data is partitioned by hour (I have tested other partition types as well). However, we run into a serious issue when a table has a large number of partitions, roughly more than 100. When we run the OPTIMIZE command, some of the data is unexpectedly removed from the table. This is not expected behavior, and it is causing data inconsistency and loss. When I partition the data with BUCKET instead, the problem does not occur (I think because bucketing produces a fixed number of partitions).
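For anyone trying to reproduce this, a table of the kind described above might be created roughly as follows. This is a minimal sketch, not the poster's actual DDL: only the table name `test.test_iceberg` and the `created_at` column come from the post, while the other columns, the S3 location, and the bucket count are hypothetical placeholders.

```sql
-- Hypothetical reproduction DDL for Athena engine v3. Only the table name and
-- created_at come from the post; id, payload and the S3 path are placeholders.
CREATE TABLE test.test_iceberg (
  id         bigint,
  created_at timestamp,
  payload    string
)
-- Hourly partitioning: one partition per hour of created_at, so the number of
-- partitions keeps growing (the case in which data loss was observed).
PARTITIONED BY (hour(created_at))
LOCATION 's3://example-bucket/test_iceberg/'
TBLPROPERTIES ('table_type' = 'ICEBERG');

-- Variant that the poster reports does NOT lose data: a fixed number of
-- partitions via the bucket transform (16 is an arbitrary example value).
-- PARTITIONED BY (bucket(16, id))
```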

Example of the OPTIMIZE command:

OPTIMIZE test.test_iceberg REWRITE DATA USING BIN_PACK
WHERE date(created_at) = date '2023-03-08';
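One way to confirm the loss directly, rather than only from snapshot metadata, is to count the rows matching the OPTIMIZE predicate before and after the command ran, using Athena's Iceberg time-travel syntax. This is a sketch under the assumption that the timestamp literal is replaced with a moment just before OPTIMIZE was executed; the value shown is a placeholder, not taken from the post. Because the snapshot summaries report `changed-partition-count=100`, which is more than the 24 hourly partitions in a single day, it may also be worth running the same comparison without the WHERE clause to check the whole table.

```sql
-- Rows matching the OPTIMIZE predicate as of a point in time before OPTIMIZE ran.
-- PLACEHOLDER timestamp: replace it with a time just before the OPTIMIZE query.
SELECT count(*) AS rows_before
FROM test.test_iceberg
  FOR TIMESTAMP AS OF TIMESTAMP '2023-05-16 21:00:00 UTC'
WHERE date(created_at) = date '2023-03-08';

-- The same predicate evaluated against the current snapshot.
SELECT count(*) AS rows_after
FROM test.test_iceberg
WHERE date(created_at) = date '2023-03-08';
```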

The summary field of the latest table snapshots:

{changed-partition-count=24, added-data-files=55, total-equality-deletes=0, added-records=1837854, trino_query_id=20230516_210052_00022_9g8in, total-position-deletes=0, added-files-size=1363665641, total-delete-files=0, total-files-size=7699737218, total-records=10619723, total-data-files=317}

{added-data-files=311, total-equality-deletes=0, added-records=8765775, trino_query_id=20230516_210135_00090_dixmr, deleted-data-files=266, deleted-records=8765775, total-records=10619723, removed-files-size=6391102826, changed-partition-count=100, total-position-deletes=0, added-files-size=6327068186, total-delete-files=0, total-files-size=7635702578, total-data-files=362}

{removed-files-size=6301213670, changed-partition-count=100, total-equality-deletes=0, trino_query_id=20230516_210205_00012_c64jn, deleted-data-files=314, total-position-deletes=0, total-delete-files=0, deleted-records=8731597, total-files-size=1334488908, total-records=1888126, total-data-files=48}

{added-data-files=2, total-equality-deletes=0, added-records=83350, trino_query_id=20230516_210213_00121_7xbbh, deleted-data-files=6, deleted-records=83350, total-records=1888126, removed-files-size=59119635, changed-partition-count=2, total-position-deletes=0, added-files-size=58846159, total-delete-files=0, total-files-size=1334215432, total-data-files=44}
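These summaries can be checked for a telltale pattern. For a pure compaction such as `OPTIMIZE ... REWRITE DATA USING BIN_PACK`, deleted-records should equal added-records, as in the second and fourth summaries above. The third summary instead reports deleted-records=8731597 with no added records, and total-records drops from 10619723 to 1888126 (10619723 - 8731597 = 1888126), meaning those rows were removed and not written back. A sketch that flags such commits via the table's `$snapshots` metadata table follows; it uses `element_at` so that summary keys absent from a snapshot read as NULL instead of raising an error. Legitimate DELETE statements would also be flagged, so check the `operation` column of each result.

```sql
-- Flag commits that removed more records than they wrote. For a pure
-- compaction (OPTIMIZE ... BIN_PACK) the two counts should be equal.
SELECT
  committed_at,
  snapshot_id,
  operation,
  COALESCE(CAST(element_at(summary, 'added-records') AS bigint), 0)   AS added_records,
  COALESCE(CAST(element_at(summary, 'deleted-records') AS bigint), 0) AS deleted_records,
  CAST(element_at(summary, 'total-records') AS bigint)                AS total_records
FROM "test"."test_iceberg$snapshots"
WHERE COALESCE(CAST(element_at(summary, 'deleted-records') AS bigint), 0)
    > COALESCE(CAST(element_at(summary, 'added-records') AS bigint), 0)
ORDER BY committed_at;
```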

Has anyone run into this problem, or can anyone help?

Asked 1 year ago, 605 views
1 Answer

Hi,

If you can reproduce this reliably (and know the steps to reproduce it), please open a support ticket; that is the best way to get it investigated.

Best regards!

AWS
Answered 1 year ago
  • Then how will others in a similar predicament know when it's solved?
