Use GlueContext.getSink for writing Apache Iceberg tables to an S3 bucket and the Data Catalog


Is there a way to use GlueContext.getSink().writeFrame(...) to write Apache Iceberg tables? So far, the only variant I have found working is GlueContext.write_dynamic_frame.from_options(...), which is documented at the bottom of https://aws.amazon.com/de/blogs/big-data/implement-a-cdc-based-upsert-in-a-data-lake-using-apache-iceberg-and-aws-glue/. That variant does not seem to provide an option to update the Data Catalog at the same time.
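For context, the getSink() pattern the question refers to looks like the following for a natively supported format such as glueparquet. This is a minimal sketch; the bucket, database, and table names are placeholders, and `dynamic_frame` is assumed to be an existing DynamicFrame:

```python
# Sketch of GlueContext.getSink with Data Catalog updates enabled, as
# documented for natively supported formats such as glueparquet.
# Bucket, database, and table names are placeholders.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

sink = glue_context.getSink(
    connection_type="s3",
    path="s3://my-bucket/output/",        # placeholder output path
    enableUpdateCatalog=True,             # update the Data Catalog on write
    updateBehavior="UPDATE_IN_DATABASE",
    partitionKeys=[],
)
sink.setCatalogInfo(catalogDatabase="my_db", catalogTableName="my_table")
sink.setFormat("glueparquet")             # Iceberg is not a supported format here
sink.writeFrame(dynamic_frame)            # dynamic_frame: an existing DynamicFrame
```

This is the catalog-updating behavior the question is asking about for Iceberg output.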

Asked 2 years ago · 2,470 views
1 Answer

Hello,

As per the documentation, there are only two ways to update the schema automatically from an AWS Glue job: 1. getSink() and 2. from_catalog(), and your job needs to use the Iceberg connection or the Iceberg JARs.

getSink() does not support Marketplace connections. Reference

from_catalog() needs to read metadata such as the classification or connection from the existing Iceberg table. However, if you are creating Iceberg tables from Athena as shown here, this method does not work either. Reference

So the only way I can see is to use the from_options() method and Spark DataFrames to write to your Iceberg table.

Schema evolution for Iceberg tables is documented here, and for Athena here.

AWS
Support Engineer
Answered 2 years ago
