Athena Iceberg creates 100,000 files where just a few dozen were expected


I have an Iceberg table defined like this:

	CREATE TABLE IF NOT EXISTS staging (
	  id STRING,
	  staging_timestamp BIGINT,
	  ... blah blah blah ...
	)
	PARTITIONED BY (bucket(24, id))
	LOCATION 's3://%s/%s/staging/'
	TBLPROPERTIES (
	  'table_type' = 'ICEBERG',
	  'optimize_rewrite_data_file_threshold' = '1',
	  'vacuum_max_snapshot_age_seconds' = '3600'
	);

I expected the number of files in S3 to stay at around 24, especially after running OPTIMIZE and VACUUM. However, after a few days I found 100,000 files in S3. VACUUM kept timing out, and OPTIMIZE did not seem to remove any files.
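For reference, here is a minimal sketch of the maintenance statements mentioned above, followed by a metadata query that counts the data files the current snapshot actually references. The table name `staging` comes from the DDL. The statements use Athena's documented syntax for Iceberg tables, but treating them as the exact commands that were run is an assumption.

```sql
-- Compact small data files; BIN_PACK is the rewrite strategy Athena supports.
OPTIMIZE staging REWRITE DATA USING BIN_PACK;

-- Expire old snapshots and remove files that are no longer referenced.
VACUUM staging;

-- Count the data files referenced by the current snapshot (Iceberg metadata table).
SELECT count(*) AS data_files,
       sum(file_size_in_bytes) AS total_bytes
FROM "staging$files";
```

If that count is small while the S3 prefix holds 100,000 objects, the remainder would presumably be files from older snapshots, metadata files, or orphans rather than live data.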

What am I doing wrong?

AlexR
asked 2 months ago · 171 views
No answers
