Create Iceberg table in Glue

When I create an Iceberg table in AWS Athena, analogous to the example at the bottom of https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-creating-tables.html#querying-iceberg-creating-tables-query-editor, everything works.

When I then run the same query in an AWS Glue job via awsglue.context.GlueContext.spark_session.sql(<my query from Athena>), the table is created but cannot be accessed properly. Previewing the created table in Athena fails with the following error:

GENERIC_USER_ERROR: Detected Iceberg type table without metadata location. Please make sure an Iceberg-enabled compute engine such as Athena or EMR Spark is used to create the table, or the table is created by using the Iceberg open source AWS library iceberg-aws. Setting table_type parameter in Glue metastore to create an Iceberg table is not supported.
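The error suggests that the Glue Data Catalog entry has its table_type parameter set to Iceberg but lacks a metadata_location parameter pointing at an Iceberg metadata file. A minimal sketch for checking this, assuming you fetch the table with boto3's Glue get_table call and pass in the returned "Table" dict; the helper name iceberg_metadata_problem is hypothetical.

```python
from typing import Optional


def iceberg_metadata_problem(table: dict) -> Optional[str]:
    """Return a description of what looks wrong with a Glue table entry, or None.

    `table` is assumed to be the "Table" dict returned by the Glue get_table
    API, e.g. boto3.client("glue").get_table(DatabaseName=..., Name=...)["Table"].
    (iceberg_metadata_problem is a hypothetical helper name.)
    """
    params = table.get("Parameters") or {}
    # Iceberg tables in the Glue Data Catalog carry table_type=ICEBERG
    # (compared case-insensitively here to be lenient)
    if params.get("table_type", "").upper() != "ICEBERG":
        return "not registered as an Iceberg table"
    # An engine that actually created the Iceberg table also records where
    # the current metadata file lives; this is what the error complains about
    if not params.get("metadata_location"):
        return "table_type is ICEBERG but metadata_location is missing"
    return None
```

A table created by Athena should return None here, while the table created through spark_session.sql in the question would be expected to report the missing metadata_location.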

Is there a way to create an Iceberg table directly in my Glue job? I found https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html#emr-spark-glue-configure, but I am reluctant to create a new cluster just to be able to create a table, since all the other operations I need on this table work in Glue once the table has been created manually in Athena.

Asked 9 months ago · 1358 views
2 Answers
Accepted Answer

I got it working by submitting the query to Athena from within the Glue job:

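```python
import boto3

database = "my_database"
sql_query = "CREATE TABLE ..."  # the Iceberg DDL that works in Athena

athena_client = boto3.client("athena")
response = athena_client.start_query_execution(
    QueryString=sql_query,
    # "Catalog" is the data catalog name (default: AwsDataCatalog), not the table name
    QueryExecutionContext={"Database": database, "Catalog": "AwsDataCatalog"},
    # Needed unless the workgroup defines a query result location (hypothetical bucket)
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
```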
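start_query_execution only submits the query and returns a QueryExecutionId; the DDL itself runs asynchronously. If later steps of the Glue job depend on the new table, it helps to wait for the query to finish. A minimal sketch, assuming the boto3 Athena client is passed in (so the helper can be tested without AWS); the name wait_for_query and the polling interval and timeout defaults are hypothetical choices.

```python
import time


def wait_for_query(athena_client, query_execution_id: str,
                   poll_seconds: float = 2.0, timeout_seconds: float = 300.0) -> str:
    """Poll Athena until the query finishes and return its final state.

    Raises RuntimeError if the query fails or is cancelled, and
    TimeoutError if it is still running after timeout_seconds.
    (wait_for_query is a hypothetical helper name.)
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        execution = athena_client.get_query_execution(
            QueryExecutionId=query_execution_id
        )["QueryExecution"]
        state = execution["Status"]["State"]
        if state == "SUCCEEDED":
            return state
        if state in ("FAILED", "CANCELLED"):
            reason = execution["Status"].get("StateChangeReason", "")
            raise RuntimeError(f"Athena query {state}: {reason}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"Athena query still {state} after {timeout_seconds}s")
        # QUEUED or RUNNING: wait before asking again
        time.sleep(poll_seconds)
```

Usage after the call above: wait_for_query(athena_client, response["QueryExecutionId"]).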
Answered 9 months ago
AWS
Support Engineer
Reviewed 2 months ago

Hello,

It appears that the table DDL used in the question is not natively supported by Spark in Glue, hence the error.

To create an Iceberg table in Glue, you can try the following:

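```python
# Example: Create an Iceberg table from a DataFrame
# and register the table in the Glue Data Catalog.
# Assumes the job has Iceberg enabled, defines a Spark catalog named
# "glue_catalog" backed by the Glue Data Catalog, and that `spark`
# is glueContext.spark_session.

dataFrame.createOrReplaceTempView("tmp_<your_table_name>")

query = """
CREATE TABLE glue_catalog.<your_database_name>.<your_table_name>
USING iceberg
AS SELECT * FROM tmp_<your_table_name>
"""
spark.sql(query)
```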

Reference: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-iceberg.html#aws-glue-programming-etl-format-iceberg-write
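The glue_catalog prefix in the query above only resolves if the job defines a Spark catalog with that name. The Glue documentation linked above describes enabling Iceberg with the --datalake-formats job parameter set to iceberg and passing the catalog settings through the --conf job parameter. A minimal sketch that builds those job arguments, assuming a Glue version with native Iceberg support and a hypothetical warehouse path; the helper name iceberg_conf_value is also hypothetical.

```python
def iceberg_conf_value(warehouse: str, catalog: str = "glue_catalog") -> str:
    """Build the value for the Glue job parameter whose key is "--conf".

    Job arguments are a key/value map, so "--conf" can only appear once;
    further settings are chained inside the value with " --conf ".
    (iceberg_conf_value is a hypothetical helper name.)
    """
    settings = {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        f"spark.sql.catalog.{catalog}.warehouse": warehouse,
        f"spark.sql.catalog.{catalog}.catalog-impl":
            "org.apache.iceberg.aws.glue.GlueCatalog",
        f"spark.sql.catalog.{catalog}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }
    return " --conf ".join(f"{key}={value}" for key, value in settings.items())


# Job parameters to set on the Glue job (console, CLI, or Glue API)
job_arguments = {
    "--datalake-formats": "iceberg",
    "--conf": iceberg_conf_value("s3://my-bucket/iceberg-warehouse/"),  # hypothetical path
}
```

With these settings, Spark writes the Iceberg metadata itself and registers metadata_location in the Glue Data Catalog, which is what Athena looks for when reading the table.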

AWS
Answered 9 months ago