Iceberg: data loss when running the OPTIMIZE command on tables with many partitions

We use the AWS Glue Catalog with Athena v3 and Iceberg for our data solution. We create tables with a CREATE TABLE query in Athena, setting the 'table_type' parameter to 'ICEBERG'. Our data is partitioned hourly (I tested other partition types as well). We hit a serious issue when a table has a large number of partitions, roughly 100 or more: after running the OPTIMIZE command, some of the data is unexpectedly removed from the table. This is not the expected behavior of a compaction, and it leaves us with inconsistent and missing data. When I partition the data by BUCKET, the problem does not occur (I think because the number of partitions is then fixed).

Example of the OPTIMIZE command:

OPTIMIZE test.test_iceberg REWRITE DATA USING BIN_PACK
WHERE date(created_at) = date '2023-03-08';

The summary fields of the latest table snapshots:

{changed-partition-count=24, added-data-files=55, total-equality-deletes=0, added-records=1837854, trino_query_id=20230516_210052_00022_9g8in, total-position-deletes=0, added-files-size=1363665641, total-delete-files=0, total-files-size=7699737218, total-records=10619723, total-data-files=317}

{added-data-files=311, total-equality-deletes=0, added-records=8765775, trino_query_id=20230516_210135_00090_dixmr, deleted-data-files=266, deleted-records=8765775, total-records=10619723, removed-files-size=6391102826, changed-partition-count=100, total-position-deletes=0, added-files-size=6327068186, total-delete-files=0, total-files-size=7635702578, total-data-files=362}

{removed-files-size=6301213670, changed-partition-count=100, total-equality-deletes=0, trino_query_id=20230516_210205_00012_c64jn, deleted-data-files=314, total-position-deletes=0, total-delete-files=0, deleted-records=8731597, total-files-size=1334488908, total-records=1888126, total-data-files=48}

{added-data-files=2, total-equality-deletes=0, added-records=83350, trino_query_id=20230516_210213_00121_7xbbh, deleted-data-files=6, deleted-records=83350, total-records=1888126, removed-files-size=59119635, changed-partition-count=2, total-position-deletes=0, added-files-size=58846159, total-delete-files=0, total-files-size=1334215432, total-data-files=44}
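
To make the loss easier to spot, here is a minimal Python sketch that parses summaries like the ones above and flags any snapshot that removes more records than it adds. The function names (`parse_summary`, `find_lossy_snapshots`) are hypothetical helpers, not part of Athena or Iceberg, and the summaries are trimmed to the record-count fields copied from the post. It assumes no intentional DELETE ran between these snapshots, so a compaction should leave the record count unchanged.

```python
def parse_summary(text):
    """Parse a '{key=value, key=value}' snapshot summary string into a dict."""
    fields = {}
    for pair in text.strip().strip("{}").split(","):
        key, _, value = pair.strip().partition("=")
        # Numeric fields become ints; anything else (e.g. trino_query_id) stays a string.
        fields[key] = int(value) if value.isdigit() else value
    return fields


def find_lossy_snapshots(summaries):
    """Return (index, net_change) for snapshots that delete more records than they add."""
    lossy = []
    for index, text in enumerate(summaries):
        summary = parse_summary(text)
        net = summary.get("added-records", 0) - summary.get("deleted-records", 0)
        if net < 0:
            lossy.append((index, net))
    return lossy


# Record-count fields copied from the four snapshot summaries in the post.
SUMMARIES = [
    "{added-records=1837854, total-records=10619723}",
    "{added-records=8765775, deleted-records=8765775, total-records=10619723}",
    "{deleted-records=8731597, total-records=1888126}",
    "{added-records=83350, deleted-records=83350, total-records=1888126}",
]

if __name__ == "__main__":
    for index, net in find_lossy_snapshots(SUMMARIES):
        print(f"snapshot {index}: net record change {net}")
```

On these four summaries, only the third snapshot (index 2) is flagged: it deletes 8731597 records and adds none, which matches the drop in total-records from 10619723 to 1888126. The second and fourth snapshots add exactly as many records as they delete, which is what a rewrite of data files should look like.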

Has anyone had this problem or can help me?

asked a year ago, 605 views
1 Answer

Hi,

If you have reproduced this several times (and know the steps to reproduce it), please open a support ticket; that is the best way to get it investigated.

Best!

AWS
answered a year ago
  • Then how will others in a similar predicament know when it's solved?
