Create Iceberg table in Glue

0

When I create an Iceberg table in AWS Athena in analogy to the example at the bottom of https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-creating-tables.html#querying-iceberg-creating-tables-query-editor, everything works.

When I then call this query in an AWS Glue job via awsglue.context.GlueContext.spark_session.sql(<my query from Athena>), the table gets created, but cannot be properly accessed. Trying to get a preview of the created table in Athena results in the following error:

GENERIC_USER_ERROR: Detected Iceberg type table without metadata location. Please make sure an Iceberg-enabled compute engine such as Athena or EMR Spark is used to create the table, or the table is created by using the Iceberg open source AWS library iceberg-aws. Setting table_type parameter in Glue metastore to create an Iceberg table is not supported.

Is there a way to create an Iceberg table directly in my Glue job? I found https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html#emr-spark-glue-configure, but I am not so sure about creating a new cluster just to be able to create a table, as all other operations on this table, that I need, work in Glue after creating the table manually in Athena.

gefragt vor einem Jahr2971 Aufrufe
2 Antworten
1
Akzeptierte Antwort

I got it working as Athena query in Glue:

database = "my_database"
table = "my_table"
sql_query = "CREATE TABLE ..."
athena_client = boto3.client("athena")
response = athena_client.start_query_execution(
    QueryExecutionContext={"Database": database, "Catalog": table},
    QueryString=sql_query,
)
beantwortet vor einem Jahr
profile picture
EXPERTE
überprüft vor 7 Monaten
profile pictureAWS
SUPPORT-TECHNIKER
überprüft vor 9 Monaten
0

Hello,

It appears the table DDL being used in the question is not natively supported in Glue, hence the error.

To create a iceberg table in Glue, you can try the below:

# Example: Create an Iceberg table from a DataFrame 
# and register the table to Glue Data Catalog

dataFrame.createOrReplaceTempView("tmp_<your_table_name>")

query = f"""
CREATE TABLE glue_catalog.<your_database_name>.<your_table_name>
USING iceberg
AS SELECT * FROM tmp_<your_table_name>
"""
spark.sql(query)

Reference: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-iceberg.html#aws-glue-programming-etl-format-iceberg-write

AWS
beantwortet vor einem Jahr

Du bist nicht angemeldet. Anmelden um eine Antwort zu veröffentlichen.

Eine gute Antwort beantwortet die Frage klar, gibt konstruktives Feedback und fördert die berufliche Weiterentwicklung des Fragenstellers.

Richtlinien für die Beantwortung von Fragen