Create Iceberg table in Glue

When I create an Iceberg table in AWS Athena, following the example at the bottom of https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-creating-tables.html#querying-iceberg-creating-tables-query-editor, everything works.

When I then call this query in an AWS Glue job via awsglue.context.GlueContext.spark_session.sql(<my query from Athena>), the table gets created, but cannot be properly accessed. Trying to get a preview of the created table in Athena results in the following error:

GENERIC_USER_ERROR: Detected Iceberg type table without metadata location. Please make sure an Iceberg-enabled compute engine such as Athena or EMR Spark is used to create the table, or the table is created by using the Iceberg open source AWS library iceberg-aws. Setting table_type parameter in Glue metastore to create an Iceberg table is not supported.
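A minimal sketch for checking what the error message describes: a table whose Glue Data Catalog entry is marked as Iceberg but carries no metadata location. It assumes the entry stores `table_type` and `metadata_location` in its `Parameters` map, which is how Iceberg's Glue catalog integration registers tables; the function name `diagnose_iceberg_table` is hypothetical.

```python
def diagnose_iceberg_table(get_table_response):
    """Classify a Glue `get_table` response by its Iceberg-related parameters.

    Assumption: Iceberg tables registered in the Glue Data Catalog carry
    `table_type=ICEBERG` and a `metadata_location` entry in `Parameters`.
    """
    params = get_table_response.get("Table", {}).get("Parameters") or {}
    if params.get("table_type", "").upper() != "ICEBERG":
        return "not-iceberg"
    if not params.get("metadata_location"):
        # This is the state the Athena error message complains about.
        return "iceberg-without-metadata"
    return "ok"


# Usage inside a job with AWS credentials (not executed here):
#   import boto3
#   response = boto3.client("glue").get_table(
#       DatabaseName="my_database", Name="my_table"
#   )
#   print(diagnose_iceberg_table(response))
```

If this reports `iceberg-without-metadata`, the table was registered in the catalog without Iceberg writing its metadata files, which matches the error shown above.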

Is there a way to create an Iceberg table directly in my Glue job? I found https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html#emr-spark-glue-configure, but I am reluctant to create a new cluster just to create a table. Every other operation I need on this table already works in Glue once the table has been created manually in Athena.

Asked 1 year ago · 2926 views

2 Answers

Accepted Answer

I got it working by submitting the statement as an Athena query from within the Glue job:

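import boto3

database = "my_database"
catalog = "AwsDataCatalog"  # name of the data catalog (Athena's default), not the table name
sql_query = "CREATE TABLE ..."

athena_client = boto3.client("athena")
# If the workgroup has no query result location configured, also pass
# ResultConfiguration={"OutputLocation": "s3://..."}.
response = athena_client.start_query_execution(
    QueryExecutionContext={"Database": database, "Catalog": catalog},
    QueryString=sql_query,
)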
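`start_query_execution` returns as soon as the query is submitted, so a job that uses the new table right away may need to wait for the DDL to finish first. The following is a minimal sketch of such a wait, assuming the `athena_client` and `response` from the snippet above; the helper name `wait_for_athena_query` and its polling parameters are hypothetical, not part of the original answer.

```python
import time


def wait_for_athena_query(athena_client, query_execution_id,
                          poll_seconds=1.0, max_polls=300):
    """Poll Athena until the query leaves QUEUED/RUNNING.

    Returns "SUCCEEDED", raises RuntimeError on FAILED/CANCELLED,
    and TimeoutError if the query is still pending after max_polls checks.
    """
    for _ in range(max_polls):
        execution = athena_client.get_query_execution(
            QueryExecutionId=query_execution_id
        )
        status = execution["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            return state
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "no reason reported")
            raise RuntimeError(
                f"Athena query {query_execution_id} ended in {state}: {reason}"
            )
        time.sleep(poll_seconds)
    raise TimeoutError(
        f"Athena query {query_execution_id} still pending after {max_polls} polls"
    )


# Usage with the response from start_query_execution (not executed here):
#   wait_for_athena_query(athena_client, response["QueryExecutionId"])
```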
Answered 1 year ago
Reviewed by an Expert 7 months ago
Reviewed by an AWS Support Engineer 9 months ago
Hello,

It appears that the table DDL used in the question is not natively supported in Glue, hence the error.

To create an Iceberg table in Glue, you can try the following:

# Example: Create an Iceberg table from a DataFrame 
# and register the table to Glue Data Catalog

dataFrame.createOrReplaceTempView("tmp_<your_table_name>")

query = f"""
CREATE TABLE glue_catalog.<your_database_name>.<your_table_name>
USING iceberg
AS SELECT * FROM tmp_<your_table_name>
"""
spark.sql(query)

Reference: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-iceberg.html#aws-glue-programming-etl-format-iceberg-write
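The `glue_catalog` prefix in the query above only resolves if the Glue job has an Iceberg Spark catalog configured under that name. A minimal sketch of the settings involved, assuming the job is started with the `--datalake-formats` job parameter set to `iceberg` as the referenced guide describes; the helper names and the warehouse path are hypothetical placeholders.

```python
def iceberg_glue_catalog_conf(warehouse_path, catalog_name="glue_catalog"):
    """Spark settings that expose the Glue Data Catalog as an Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.warehouse": warehouse_path,
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


def as_glue_conf_parameter(conf):
    """Render the settings as one value for the Glue `--conf` job parameter."""
    return " --conf ".join(f"{key}={value}" for key, value in conf.items())


# Usage (the bucket path is a placeholder):
#   conf = iceberg_glue_catalog_conf("s3://<your_bucket>/<your_prefix>/")
#   print(as_glue_conf_parameter(conf))  # paste as the value of --conf
```

The same key/value pairs can alternatively be applied in the script when building the Spark session, if you prefer to keep the configuration with the code.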

AWS
Answered 1 year ago