Apache Iceberg version
None
Query engine
Athena (engine v3)
Hello everyone, today my team ran into a very strange bug using Iceberg via Athena. I'll describe the steps we used to reproduce the error below:
- We create an Iceberg table with an "id" column and 321 other columns with random string names - in the example below we use awswrangler to create the table, but the same happens when the table is created using Athena directly.
EXAMPLE CODE:

```python
import awswrangler as wr
import pandas as pd
import random, string

NUM_COLS = 322  # "id" plus 321 randomly named columns

def get_random_string(length):
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for _ in range(length))

columns = ['id'] + [get_random_string(5) for _ in range(NUM_COLS - 1)]
data = pd.DataFrame(data=[columns], columns=columns)

wr.athena.to_iceberg(
    data,
    workgroup="my-workgroup",
    database="my_database",
    table="iceberg_limits_322",
    table_location="s3://my_bucket/iceberg_limits",
)
```
- We then run the following query in Athena to insert a random value:
EXAMPLE QUERY:

```sql
MERGE INTO my_database.iceberg_limits_322 AS existing
USING (
    SELECT 'something' AS id
) AS new ON existing.id = new.id
WHEN NOT MATCHED
    THEN INSERT (id) VALUES (new.id)
WHEN MATCHED THEN DELETE
```
- which results in the error:

```
[ErrorCode: INTERNAL_ERROR_QUERY_ENGINE] Amazon Athena experienced an internal error while executing this query. Please contact AWS support for further assistance. You will not be charged for this query. We apologize for the inconvenience.
```
Notice that the error only occurs when multiple WHEN clauses are used in the MERGE INTO query! If only one WHEN clause is used (just to insert or just to delete records), everything works fine and the table can be used normally.
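For comparison, a single-clause variant like the following (same table and value as above) completes without error on the same 322-column table:

```sql
MERGE INTO my_database.iceberg_limits_322 AS existing
USING (
    SELECT 'something' AS id
) AS new ON existing.id = new.id
WHEN NOT MATCHED
    THEN INSERT (id) VALUES (new.id)
```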
We can reproduce this behaviour on multiple AWS accounts and with different tables/databases/S3 locations.
After trying with different numbers of columns, we consistently found that 321 is the maximum number of columns the table can have: everything works fine up to that threshold, and the error appears from 322 columns onwards.
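For reference, this is roughly how we swept the column counts to find the threshold. It is a sketch, not the exact script we ran: the helpers below are pure Python, and the actual awswrangler/Athena calls (which need an AWS account) are left as comments using the same hypothetical database/table names as above.

```python
import random
import string

def random_name(length=5):
    # Random lowercase column name, as in the reproduction code above.
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(length))

def make_columns(num_cols):
    # Build the column list: 'id' plus num_cols - 1 randomly named columns.
    return ['id'] + [random_name() for _ in range(num_cols - 1)]

def make_merge_sql(database, table):
    # The two-clause MERGE that triggers the internal error on wide tables.
    return (
        f"MERGE INTO {database}.{table} AS existing "
        "USING (SELECT 'something' AS id) AS new ON existing.id = new.id "
        "WHEN NOT MATCHED THEN INSERT (id) VALUES (new.id) "
        "WHEN MATCHED THEN DELETE"
    )

# Sweep widths around the suspected threshold.
for num_cols in (320, 321, 322, 323):
    columns = make_columns(num_cols)
    table = f"iceberg_limits_{num_cols}"
    # Create the table and run the MERGE here, e.g. (hypothetical calls):
    #   wr.athena.to_iceberg(pd.DataFrame([columns], columns=columns), ...)
    #   then execute make_merge_sql("my_database", table) in Athena.
    # In our tests, num_cols <= 321 succeeds and num_cols >= 322 fails.
```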