I want to resolve the error I get when I try to use an Apache Iceberg table with AWS Glue.
Short description
You might get the following error message when you use an Apache Iceberg table with AWS Glue:
"Caused by: org.apache.iceberg.exceptions.CommitStateUnknownException: Number of TABLE_VERSION resources exceeds the account limit."
This error occurs because AWS Glue creates a new table version each time you change the table schema or any other table property.
When an API call updates a value in the Iceberg table, the value of the metadata_location parameter changes in the backend, and that change creates a new version of the table. When the accumulated table versions exceed the account limit, the commit fails with this error. To resolve the error, delete earlier versions of the table.
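To see which tables are accumulating versions, you can count the stored versions for a table before you delete anything. The following is a minimal sketch, not an official AWS script. It assumes that you pass in a boto3 Glue client, and the function name count_table_versions is hypothetical.

```python
def count_table_versions(glue_client, database_name, table_name, catalog_id=None):
    """Count the versions that the Data Catalog stores for one table.

    glue_client is assumed to be a boto3 Glue client, for example
    boto3.client('glue'). The function name is illustrative only.
    """
    kwargs = {'DatabaseName': database_name, 'TableName': table_name}
    if catalog_id is not None:
        # CatalogId is optional; Glue defaults to the caller's account
        kwargs['CatalogId'] = catalog_id

    # GetTableVersions returns results in pages, so walk every page
    paginator = glue_client.get_paginator('get_table_versions')
    total = 0
    for page in paginator.paginate(**kwargs):
        total += len(page.get('TableVersions', []))
    return total
```

For example, call count_table_versions(boto3.client('glue'), 'your_database_name', 'your_table_name') and compare the result across tables to find the ones that commit most often.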
Resolution
To clear earlier versions of the table, complete the following steps:
- Use the GetTableVersions API to identify the earlier versions of a table.
- Run the DeleteTableVersion or the BatchDeleteTableVersion API to delete the identified table versions.
-or-
Use the following example script to delete earlier versions of all tables in a specific database.
Note: Include the CatalogId and DatabaseName parameters to identify the catalog and the database that contain the table versions.
import boto3

client = boto3.client('glue')

# Set the required variables
CatalogId = 'your_account_number'
DatabaseName = 'your_database_name'

# Get the list of tables in the database (paginated, so large databases are fully covered)
t_list = []
for page in client.get_paginator('get_tables').paginate(CatalogId=CatalogId, DatabaseName=DatabaseName):
    t_list.extend(table['Name'] for table in page['TableList'])

def get_version_ids(TableName):
    # Return every version ID for the table, newest first.
    # Glue version IDs are numeric strings, so sort them as integers.
    pages = client.get_paginator('get_table_versions').paginate(
        CatalogId=CatalogId, DatabaseName=DatabaseName, TableName=TableName)
    ids = [version['VersionId'] for page in pages for version in page['TableVersions']]
    return sorted(ids, key=int, reverse=True)

for TableName in t_list:
    print("Deletion started for " + TableName)
    # Get the list of all available versions for the table
    v_list = get_version_ids(TableName)
    print(v_list)
    # Delete all older versions and keep the latest one.
    # BatchDeleteTableVersion accepts at most 100 version IDs per call.
    old_versions = v_list[1:]
    for i in range(0, len(old_versions), 100):
        response = client.batch_delete_table_version(
            CatalogId=CatalogId, DatabaseName=DatabaseName,
            TableName=TableName, VersionIds=old_versions[i:i + 100])
        # Report any versions that Glue could not delete
        for error in response.get('Errors', []):
            print(error)
    # Validate the list of remaining versions
    print(get_version_ids(TableName))
    print("Deletion ended for " + TableName)
Note: Replace your_account_number with your AWS account number. Replace your_database_name with the name of your database.
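If you want to clean up a single table rather than a whole database, the manual steps in the Resolution section (GetTableVersions followed by BatchDeleteTableVersion) can be sketched as one function. This is an illustrative sketch under stated assumptions, not an official AWS script: the function name prune_table_versions and the keep parameter are hypothetical, the Glue client is passed in by the caller, and version IDs are assumed to be numeric strings where the highest number is the newest version.

```python
def prune_table_versions(glue_client, database_name, table_name, keep=1, catalog_id=None):
    """Delete all but the `keep` most recent versions of one table.

    glue_client is assumed to be a boto3 Glue client.
    Returns the list of version IDs that were submitted for deletion.
    """
    if keep < 1:
        raise ValueError('keep must be at least 1')

    common = {'DatabaseName': database_name, 'TableName': table_name}
    if catalog_id is not None:
        common['CatalogId'] = catalog_id

    # Step 1: GetTableVersions, following pagination to collect every ID
    paginator = glue_client.get_paginator('get_table_versions')
    version_ids = [
        version['VersionId']
        for page in paginator.paginate(**common)
        for version in page.get('TableVersions', [])
    ]

    # Assumption: IDs are numeric strings, so the highest value is the newest
    version_ids.sort(key=int, reverse=True)
    to_delete = version_ids[keep:]

    # Step 2: BatchDeleteTableVersion accepts at most 100 IDs per call
    for start in range(0, len(to_delete), 100):
        batch = to_delete[start:start + 100]
        response = glue_client.batch_delete_table_version(VersionIds=batch, **common)
        for error in response.get('Errors', []):
            print('Failed to delete version', error.get('VersionId'), error.get('ErrorDetail'))

    return to_delete
```

For example, prune_table_versions(boto3.client('glue'), 'your_database_name', 'your_table_name', keep=5) would keep the five most recent versions. Keeping a few recent versions instead of only the latest is a design choice that leaves some history in the Data Catalog while still staying under the limit.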