
Create Iceberg table in Glue

When I create an Iceberg table in AWS Athena, following the example at the bottom of https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-creating-tables.html#querying-iceberg-creating-tables-query-editor, everything works.

When I then call this query in an AWS Glue job via awsglue.context.GlueContext.spark_session.sql(<my query from Athena>), the table gets created, but cannot be properly accessed. Trying to get a preview of the created table in Athena results in the following error:

GENERIC_USER_ERROR: Detected Iceberg type table without metadata location. Please make sure an Iceberg-enabled compute engine such as Athena or EMR Spark is used to create the table, or the table is created by using the Iceberg open source AWS library iceberg-aws. Setting table_type parameter in Glue metastore to create an Iceberg table is not supported.

Is there a way to create an Iceberg table directly in my Glue job? I found https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html#emr-spark-glue-configure, but I would rather not create a new cluster just to create a table, since all other operations I need on this table work in Glue once the table has been created manually in Athena.

Asked 1 year ago, 2964 views
2 Answers
Accepted Answer

I got it working by submitting the statement as an Athena query from within the Glue job:

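import boto3

database = "my_database"
table = "my_table"  # referenced inside the CREATE TABLE statement
sql_query = "CREATE TABLE ..."
athena_client = boto3.client("athena")
# "Catalog" takes the data catalog name, not the table name.
# "AwsDataCatalog" is the default catalog in Athena; adjust if yours differs.
# If the workgroup has no query result location configured,
# ResultConfiguration={"OutputLocation": ...} is also required.
response = athena_client.start_query_execution(
    QueryExecutionContext={"Database": database, "Catalog": "AwsDataCatalog"},
    QueryString=sql_query,
)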
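One thing to keep in mind with this approach: start_query_execution only submits the query and returns a QueryExecutionId, so the table may not exist yet when the next line of the Glue job runs. A minimal sketch of waiting for the query to finish is below. The function name wait_for_query, the polling interval, and the retry limit are illustrative assumptions and not part of the original answer; only get_query_execution and its response fields come from the Athena API.

```python
import time

# Terminal states reported by Athena in QueryExecution.Status.State.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(athena_client, query_execution_id, poll_seconds=2.0, max_polls=150):
    """Poll Athena until the query reaches a terminal state.

    Returns "SUCCEEDED", raises RuntimeError on FAILED/CANCELLED,
    and raises TimeoutError if max_polls is exhausted.
    (Hypothetical helper; name and defaults are assumptions.)
    """
    for _ in range(max_polls):
        result = athena_client.get_query_execution(QueryExecutionId=query_execution_id)
        status = result["QueryExecution"]["Status"]
        state = status["State"]
        if state in TERMINAL_STATES:
            if state != "SUCCEEDED":
                reason = status.get("StateChangeReason", "no reason given")
                raise RuntimeError(f"Athena query {state}: {reason}")
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"Athena query {query_execution_id} did not finish in time")


# Usage inside the Glue job, continuing from the snippet above:
#   wait_for_query(athena_client, response["QueryExecutionId"])
```

Raising on FAILED or CANCELLED makes the Glue job fail early instead of continuing against a table that was never created.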
Answered 1 year ago
Reviewed by an Expert 7 months ago
Reviewed by an AWS Support Engineer 9 months ago
Hello,

It appears the table DDL being used in the question is not natively supported in Glue, hence the error.

To create an Iceberg table in Glue, you can try the following:

# Example: Create an Iceberg table from a DataFrame 
# and register the table to Glue Data Catalog

dataFrame.createOrReplaceTempView("tmp_<your_table_name>")

query = f"""
CREATE TABLE glue_catalog.<your_database_name>.<your_table_name>
USING iceberg
AS SELECT * FROM tmp_<your_table_name>
"""
spark.sql(query)

Reference: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-iceberg.html#aws-glue-programming-etl-format-iceberg-write
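Note that the glue_catalog prefix in the statement above only resolves if the Spark session has an Iceberg catalog registered under that name, and the job is started with the --datalake-formats job parameter set to iceberg. The sketch below builds the Spark settings described on the referenced page and joins them into the single string that Glue accepts as the value of the --conf job parameter. The function names and the example warehouse path are hypothetical; the property keys and class names are the ones used by Iceberg's Spark and AWS integrations.

```python
def iceberg_glue_catalog_conf(warehouse_path, catalog_name="glue_catalog"):
    """Return the Spark settings that register an Iceberg catalog backed by
    the Glue Data Catalog. (Hypothetical helper; the name is an assumption.)"""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.warehouse": warehouse_path,
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


def as_glue_conf_parameter(settings):
    """Join the settings into one string for the value of the --conf job
    parameter: the first pair stands alone, later pairs are chained with
    ' --conf '."""
    return " --conf ".join(f"{key}={value}" for key, value in settings.items())


# Example with a placeholder bucket (assumption, replace with your own path):
settings = iceberg_glue_catalog_conf("s3://<your-warehouse-dir>/")
conf_value = as_glue_conf_parameter(settings)
```

The same dictionary can also be applied to a SparkConf in the job script before the SparkContext is created, which keeps the configuration next to the code that depends on it.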

AWS
Answered 1 year ago
