Athena Iceberg table Commit error with Lambda Service


I'm trying to update an Iceberg table with the Athena client in AWS Lambda, but I'm getting a commit error. About 100 queries run at a time, and they are generated within 1 to 2 seconds of each other.

The error is as follows: ICEBERG_COMMIT_ERROR: Failed to commit Iceberg update to the table: . If a data manifest file was generated at 's3://bucket_name/path/manifest.csv', you may need to manually clean the data from locations specified in the manifest. Athena will not delete data in your account.

Any idea what the issue is?

asked 23 days ago, 148 views
1 Answer
Accepted Answer

Hello.

I haven't seen the code and query you are using, so I don't know the details, but if you delete 's3://bucket_name/path/manifest.csv' as the error message suggests, does the query run?

If you can share the query and code you are running, could you please do so?

EXPERT
answered 23 days ago
EXPERT
reviewed 22 days ago
EXPERT
reviewed 23 days ago
  • Hello.

    When I checked for the "manifest.csv" file after this error, the file did not exist. I also confirmed that the query succeeded when I re-ran it.

    It is difficult to share all of the code, but the part that runs the query is as follows:

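    import time

    import boto3

    session = boto3.Session()
    athena_client = session.client('athena')

    def run_query(query_string, database_name):
        # Submit the query to Athena.
        response = athena_client.start_query_execution(
            QueryString=query_string,
            QueryExecutionContext={'Database': database_name},
            ResultConfiguration={'OutputLocation': 's3://bucket_name/path/'},
            WorkGroup='group',
        )
        query_execution_id = response['QueryExecutionId']

        # The status check was not part of the shared code; this polling
        # loop is an assumed minimal version.
        while True:
            execution = athena_client.get_query_execution(QueryExecutionId=query_execution_id)
            status = execution['QueryExecution']['Status']['State']
            if status in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
                break
            time.sleep(1)

        if status == 'SUCCEEDED':
            print(f"[{query_execution_id}] Query SUCCEEDED!")
            results = athena_client.get_query_results(QueryExecutionId=query_execution_id)
            return results['ResultSet']['Rows']
        else:
            print(f"[{query_execution_id}] Query failed!")
            return None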
    

    I think the parallel execution is the problem. Is there any way to fix it?

  • Thank you for sharing the code. I'm not sure if it will lead to a direct solution, but how about limiting the number of concurrent executions of the Lambda function to 1, as described in the GitHub issue below? https://github.com/aws/aws-sdk-pandas/issues/2651#issuecomment-1955081562
    Also, can you run the queries with a smaller number of parallel executions, such as 10 instead of 100?

  • Thank you for your reply. I'll try applying the concurrency limit to this issue.
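
For anyone trying the "smaller number of parallel executions" suggestion above, here is a minimal sketch of capping how many queries are in flight at once from a single process. It is only an illustration, not the asker's actual code: `run_queries_limited` and `fake_run_query` are hypothetical names, and `run_query` stands for whatever function submits one query to Athena and waits for it to finish.

```python
from concurrent.futures import ThreadPoolExecutor


def run_queries_limited(queries, run_query, max_parallel=10):
    """Run each query through run_query with at most max_parallel in flight.

    run_query is a hypothetical callable (for example, a wrapper around
    start_query_execution plus status polling) that takes a query string
    and returns its result. Results come back in the same order as queries.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() preserves input order and never uses more than
        # max_workers threads at the same time.
        return list(pool.map(run_query, queries))


if __name__ == "__main__":
    # Stand-in for the real Athena call so the sketch runs without AWS access.
    def fake_run_query(query):
        return f"done: {query}"

    # max_parallel=1 runs the queries strictly one after another.
    print(run_queries_limited(["q1", "q2", "q3"], fake_run_query, max_parallel=1))
```

Note that this only bounds parallelism inside one process. If several Lambda invocations run at the same time, each one has its own pool, so the function-level concurrency limit mentioned above would still be needed to keep commits to the same table from overlapping.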
