I have an Athena Iceberg table.
The table has 2 partitions.
Each hour I update it with MERGE and DELETE commands.
SELECT count(*) FROM "my_table$files"
now returns 16. Meanwhile, the data folder contains 158 files.
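For context, the same kind of check can be extended to the other Iceberg metadata tables. This is only a sketch: it assumes the Athena engine version in use exposes the `$snapshots`, `$manifests`, and `$history` metadata tables alongside `$files`. As far as I understand, `$files` reflects only the current snapshot, so files referenced solely by older snapshots that are still retained would sit in the data folder without appearing in that count.

```sql
-- Number of snapshots the table metadata still retains.
-- More than one would mean older snapshots can still reference extra files.
SELECT count(*) FROM "my_table$snapshots";

-- Manifests referenced by the current snapshot.
SELECT * FROM "my_table$manifests";

-- Snapshot lineage, to see whether hourly MERGE/DELETE commits are accumulating.
SELECT * FROM "my_table$history";
```

Comparing these results before and after a VACUUM run may show whether snapshots are actually being expired.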
Neither
VACUUM my_table
nor
OPTIMIZE my_table REWRITE DATA USING BIN_PACK
helps to remove the unnecessary files.
The table has the following TBLPROPERTIES:
TBLPROPERTIES (
'table_type'='iceberg',
'vacuum_max_snapshot_age_seconds'='60',
'write_compression'='ZSTD',
'format'='parquet',
'vacuum_max_metadata_files_to_keep'='2',
'optimize_rewrite_delete_file_threshold'='2',
'optimize_rewrite_data_file_threshold'='2'
)
The settings are that aggressive because I do not need any history of changes. I'm interested in the latest state only.
The number of files keeps growing and never decreases, despite the fact that the number of rows in the table is almost constant.
What am I doing wrong, and how can I stop the file inflation?
BTW, when I manually delete from the data directory anything that is not listed in the $files query result, I get the following error on any random SELECT.