Create Iceberg table in Glue

0

When I create an Iceberg table in AWS Athena in analogy to the example at the bottom of https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-creating-tables.html#querying-iceberg-creating-tables-query-editor, everything works.

When I then call this query in an AWS Glue job via awsglue.context.GlueContext.spark_session.sql(<my query from Athena>), the table gets created, but cannot be properly accessed. Trying to get a preview of the created table in Athena results in the following error:

GENERIC_USER_ERROR: Detected Iceberg type table without metadata location. Please make sure an Iceberg-enabled compute engine such as Athena or EMR Spark is used to create the table, or the table is created by using the Iceberg open source AWS library iceberg-aws. Setting table_type parameter in Glue metastore to create an Iceberg table is not supported.

Is there a way to create an Iceberg table directly in my Glue job? I found https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html#emr-spark-glue-configure, but I am not so sure about creating a new cluster just to be able to create a table, as all other operations on this table, that I need, work in Glue after creating the table manually in Athena.

asked 9 months ago1262 views
2 Answers
0
Accepted Answer

I got it working as Athena query in Glue:

database = "my_database"
table = "my_table"
sql_query = "CREATE TABLE ..."
athena_client = boto3.client("athena")
response = athena_client.start_query_execution(
    QueryExecutionContext={"Database": database, "Catalog": table},
    QueryString=sql_query,
)
answered 9 months ago
AWS
SUPPORT ENGINEER
reviewed a month ago
0

Hello,

It appears the table DDL being used in the question is not natively supported in Glue, hence the error.

To create a iceberg table in Glue, you can try the below:

# Example: Create an Iceberg table from a DataFrame 
# and register the table to Glue Data Catalog

dataFrame.createOrReplaceTempView("tmp_<your_table_name>")

query = f"""
CREATE TABLE glue_catalog.<your_database_name>.<your_table_name>
USING iceberg
AS SELECT * FROM tmp_<your_table_name>
"""
spark.sql(query)

Reference: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-iceberg.html#aws-glue-programming-etl-format-iceberg-write

AWS
answered 9 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions