Athena Iceberg does not delete orphan files

I have an Athena Iceberg table with 2 partitions.

Each hour I update it with MERGE and DELETE commands.

SELECT count(*) FROM "my_table$files"

now returns 16, while the data folder contains 158 files.

Neither

VACUUM my_table

nor

OPTIMIZE my_table REWRITE DATA USING BIN_PACK

removes the unnecessary files.

The table has the following TBLPROPERTIES:

TBLPROPERTIES (
  'table_type'='iceberg',
  'vacuum_max_snapshot_age_seconds'='60',
  'write_compression'='ZSTD',
  'format'='parquet',
  'vacuum_max_metadata_files_to_keep'='2',
  'optimize_rewrite_delete_file_threshold'='2',
  'optimize_rewrite_data_file_threshold'='2'
)

It is that aggressive because I do not need any history of changes. I'm interested in the latest state only.

The number of files keeps growing and never decreases, despite the fact that the number of rows in the table is almost constant.

What am I doing wrong, and how can I stop the file inflation?

  • BTW, when I manually delete from the data directory anything that is not listed in the files query result, I get the following error on random SELECT queries:

    ICEBERG_CANNOT_OPEN_SPLIT: Error opening Iceberg split s3://my_bucket/data_lake/my_table/data/lOlxRw/20240801_100037_00009_4atvz-e01ed1f7-ec42-4841-a13a-461c597951f4.parquet (offset=0, length=16462): io.trino.hdfs.s3.TrinoS3FileSystem$UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist.
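
The manual comparison described above (objects in the data folder versus paths returned by the files query) can be sketched roughly as follows. This is a minimal sketch, assuming the S3 object keys and the `file_path` values from "my_table$files" have already been exported into Python lists. The helper names `s3_key` and `unreferenced_keys` and the sample file names are hypothetical. It only reports candidates and deletes nothing, because the error above suggests that the files query alone does not show everything the table still needs.

```python
from urllib.parse import urlparse


def s3_key(path):
    """Return the object key of an s3:// URI, or the input itself if it is already a key."""
    if path.startswith("s3://"):
        # urlparse puts the bucket in netloc and the key (with a leading slash) in path.
        return urlparse(path).path.lstrip("/")
    return path.lstrip("/")


def unreferenced_keys(listed_keys, referenced_paths):
    """Return the listed keys that do not appear among the referenced paths, sorted."""
    referenced = {s3_key(p) for p in referenced_paths}
    return sorted(k for k in listed_keys if s3_key(k) not in referenced)


# Hypothetical sample data: two objects in the data folder, one of them
# returned by the files query.
listed = [
    "data_lake/my_table/data/a.parquet",
    "data_lake/my_table/data/b.parquet",
]
referenced = ["s3://my_bucket/data_lake/my_table/data/a.parquet"]

candidates = unreferenced_keys(listed, referenced)
print(candidates)
```

The result is a list of candidates to investigate, not a list of files that are safe to remove.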
    
Smotrov
asked 2 months ago, 436 views
1 Answer

Thanks for sharing

answered 2 months ago
