How do I manage the number of table versions in AWS Glue for Apache Iceberg tables?


I want to resolve the error I get when I try to use an Apache Iceberg table with AWS Glue.

Short description

You might get the following error message when you use an Apache Iceberg table with AWS Glue:

"Caused by: org.apache.iceberg.exceptions.CommitStateUnknownException: Number of TABLE_VERSION resources exceeds the account limit."

This error occurs because AWS Glue creates a new table version each time you change the table schema or any other table property.

When an API call updates a value in the Iceberg table, the call changes the value of the metadata_location parameter in the backend, and the new value creates a new version of the table. Table versions accumulate over time. When the number of versions exceeds the account limit, the update fails with this error. To resolve the error, delete earlier versions of the table.
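To see how versions accumulate, you can list each version of a table together with the metadata_location value that it recorded. The following is a minimal sketch, not an official AWS script. It assumes that `client` is a boto3 AWS Glue client (for example, `boto3.client("glue")`), and the function name `list_version_metadata` is hypothetical.

```python
def list_version_metadata(client, database_name, table_name, catalog_id=None):
    """Return (VersionId, metadata_location) pairs for one table.

    `client` is assumed to be a boto3 Glue client. The function name and
    parameters are illustrative and not part of any AWS API.
    """
    kwargs = {"DatabaseName": database_name, "TableName": table_name}
    if catalog_id:
        # CatalogId is optional; AWS Glue defaults to the caller's account
        kwargs["CatalogId"] = catalog_id

    # Use the paginator so that tables with many versions are fully listed
    paginator = client.get_paginator("get_table_versions")
    result = []
    for page in paginator.paginate(**kwargs):
        for version in page["TableVersions"]:
            parameters = version.get("Table", {}).get("Parameters", {})
            # metadata_location is None if the table isn't an Iceberg table
            result.append((version["VersionId"], parameters.get("metadata_location")))
    return result
```

The length of the returned list is the number of versions that the table currently holds, which helps you identify the tables that contribute most to the limit.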

Resolution

To clear earlier versions of the table, complete the following steps:

  1. Use the GetTableVersions API to identify the earlier table versions for a given table.

  2. Run the DeleteTableVersion or the BatchDeleteTableVersion API to delete the identified table versions.
    -or-
    Use the following example script to delete earlier versions of all tables in a specific database.
    Note: Include the CatalogId and DatabaseName parameters to identify the catalog and the database that contain the table versions.

    import boto3

    client = boto3.client('glue')

    # Set the required variables
    CatalogId = 'your_account_number'
    DatabaseName = 'your_database_name'


    def get_version_ids(TableName):
        """Return all version IDs for a table, newest first."""
        paginator = client.get_paginator('get_table_versions')
        pages = paginator.paginate(CatalogId=CatalogId, DatabaseName=DatabaseName, TableName=TableName)
        v_list = [version['VersionId'] for page in pages for version in page['TableVersions']]
        # Version IDs are assumed to be numeric strings, so sort them as integers
        return sorted(v_list, key=int, reverse=True)


    # Get the list of tables in the database
    table_pages = client.get_paginator('get_tables').paginate(CatalogId=CatalogId, DatabaseName=DatabaseName)
    t_list = [table['Name'] for page in table_pages for table in page['TableList']]

    for TableName in t_list:
        print("Deletion started for " + TableName)

        # Get the list of all available versions for the table
        v_list = get_version_ids(TableName)
        print(v_list)

        # Delete all versions except the latest, at most 100 version IDs per call
        old_versions = v_list[1:]
        for i in range(0, len(old_versions), 100):
            client.batch_delete_table_version(CatalogId=CatalogId, DatabaseName=DatabaseName, TableName=TableName, VersionIds=old_versions[i:i + 100])

        # Validate the list of remaining versions
        print(get_version_ids(TableName))

        print("Deletion ended for " + TableName)

    Note: Replace your_account_number with your AWS account number. Replace your_database_name with the name of your database.
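If you want to clean up a single table and keep a few recent versions for reference, the two steps above can be sketched as one helper. This is a minimal sketch under stated assumptions, not an official AWS script: `client` is assumed to be a boto3 AWS Glue client, the function name `prune_table_versions` and its `keep` and `dry_run` parameters are hypothetical, version IDs are assumed to be numeric strings, and the batch size of 100 is based on the documented BatchDeleteTableVersion limit, which you should confirm against the current API reference.

```python
def prune_table_versions(client, database_name, table_name, keep=5,
                         catalog_id=None, dry_run=True):
    """Delete all but the `keep` newest versions of one table.

    Returns the version IDs that were deleted, or that would be deleted
    when dry_run is True. Name and parameters are illustrative only.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")

    kwargs = {"DatabaseName": database_name, "TableName": table_name}
    if catalog_id:
        kwargs["CatalogId"] = catalog_id

    # Step 1: identify all versions with GetTableVersions (paginated)
    paginator = client.get_paginator("get_table_versions")
    version_ids = [
        version["VersionId"]
        for page in paginator.paginate(**kwargs)
        for version in page["TableVersions"]
    ]

    # Assumption: version IDs are numeric strings; sort newest first
    version_ids.sort(key=int, reverse=True)
    to_delete = version_ids[keep:]

    # Step 2: delete the earlier versions in batches (assumed limit: 100 IDs)
    if not dry_run:
        for start in range(0, len(to_delete), 100):
            client.batch_delete_table_version(
                VersionIds=to_delete[start:start + 100], **kwargs
            )
    return to_delete
```

Run the helper with `dry_run=True` first to review the version IDs that it selects, and then run it again with `dry_run=False` to delete them.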

AWS OFFICIAL
Updated 2 months ago