ICEBERG - Data Loss during OPTIMIZE Command Execution on Tables with Many Partitions

We use the AWS Glue Catalog with Athena v3 and Iceberg for our data solution. We create tables with a CREATE TABLE query in Athena, setting the ‘table_type’ parameter to ‘ICEBERG’. Our data is partitioned on an hourly basis (I tested other partition types as well).

We are encountering a significant issue when a table has a large number of partitions (more than 100). After executing the OPTIMIZE command, some of the data is unexpectedly removed from the table, which causes data inconsistencies and data loss. When I partition the data by BUCKET, the problem does not occur (I think because the number of partitions is then fixed).

Example of the OPTIMIZE command:

OPTIMIZE test.test_iceberg REWRITE DATA USING BIN_PACK
WHERE date(created_at) = date '2023-03-08';

The summary fields of the latest table snapshot records:

{changed-partition-count=24, added-data-files=55, total-equality-deletes=0, added-records=1837854, trino_query_id=20230516_210052_00022_9g8in, total-position-deletes=0, added-files-size=1363665641, total-delete-files=0, total-files-size=7699737218, total-records=10619723, total-data-files=317}

{added-data-files=311, total-equality-deletes=0, added-records=8765775, trino_query_id=20230516_210135_00090_dixmr, deleted-data-files=266, deleted-records=8765775, total-records=10619723, removed-files-size=6391102826, changed-partition-count=100, total-position-deletes=0, added-files-size=6327068186, total-delete-files=0, total-files-size=7635702578, total-data-files=362}

{removed-files-size=6301213670, changed-partition-count=100, total-equality-deletes=0, trino_query_id=20230516_210205_00012_c64jn, deleted-data-files=314, total-position-deletes=0, total-delete-files=0, deleted-records=8731597, total-files-size=1334488908, total-records=1888126, total-data-files=48}

{added-data-files=2, total-equality-deletes=0, added-records=83350, trino_query_id=20230516_210213_00121_7xbbh, deleted-data-files=6, deleted-records=83350, total-records=1888126, removed-files-size=59119635, changed-partition-count=2, total-position-deletes=0, added-files-size=58846159, total-delete-files=0, total-files-size=1334215432, total-data-files=44}
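
To make the loss easier to spot, here is a minimal Python sketch that parses the four summaries above and reports any snapshot that deletes more records than it adds. The helper names (parse_summary, net_record_change, find_lossy_snapshots) are my own and not part of Athena or Iceberg, and the summaries are trimmed to the fields the check uses. I am also assuming the snapshots are listed in the order they were committed.

```python
import re

# Snapshot summaries from the post, trimmed to the fields used below.
SUMMARIES = [
    "{added-records=1837854, trino_query_id=20230516_210052_00022_9g8in, "
    "total-records=10619723, total-data-files=317}",
    "{added-records=8765775, trino_query_id=20230516_210135_00090_dixmr, "
    "deleted-records=8765775, total-records=10619723, total-data-files=362}",
    "{trino_query_id=20230516_210205_00012_c64jn, deleted-records=8731597, "
    "total-records=1888126, total-data-files=48}",
    "{added-records=83350, trino_query_id=20230516_210213_00121_7xbbh, "
    "deleted-records=83350, total-records=1888126, total-data-files=44}",
]


def parse_summary(text):
    """Parse a '{key=value, key=value}' summary string into a dict of strings."""
    return dict(re.findall(r"([\w-]+)=([^,}]+)", text))


def net_record_change(summary):
    """Records added minus records deleted; a missing field counts as 0."""
    added = int(summary.get("added-records", 0))
    deleted = int(summary.get("deleted-records", 0))
    return added - deleted


def find_lossy_snapshots(summaries):
    """Return (query id, net change) for snapshots that remove more than they add."""
    lossy = []
    for text in summaries:
        summary = parse_summary(text)
        net = net_record_change(summary)
        if net < 0:
            lossy.append((summary.get("trino_query_id"), net))
    return lossy


for query_id, net in find_lossy_snapshots(SUMMARIES):
    print(f"{query_id}: net record change {net}")
```

A rewrite that only compacts files should add exactly as many records as it deletes, which is what the second and fourth snapshots do. The third snapshot (query 20230516_210205_00012_c64jn) deletes 8731597 records and adds none, and total-records drops from 10619723 to 1888126 at that point.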

Has anyone had this problem or can help me?

Asked a year ago, 605 views
1 answer

Hi,

If you have reproduced this several times (and know the steps to reproduce it), please open a support ticket, as that is the best course of action.

Bests!

AWS
Answered a year ago
  • Then how will others in a similar predicament know when it's solved?
