Use GlueContext.getSink for writing Apache Iceberg table to S3 bucket and Data Catalog

0

Is there a way to use GlueContext.getSink().writeFrame(...) to write Apache Iceberg tables? So far I only find the version GlueContext.write_dynamic_frame.from_options(...) working which is documented at the bottom of https://aws.amazon.com/de/blogs/big-data/implement-a-cdc-based-upsert-in-a-data-lake-using-apache-iceberg-and-aws-glue/. This version does not seem to provide an option to update the data catalog simultaneously.

質問済み 2年前2471ビュー
1回答
0

Hello,

As per the doc there are only two ways to update the schema 1.getSink() and 2.from_catalog() automatically from an AWS Glue Job and your job needs to use the Iceberg connection or Iceberg jars.

getSink() does not support market place connections. Reference

from_catalog() needs to read the metadata like classification or connection from the existing iceberg table. However, if you are creating iceberg tables from Athena as shown here . This method does not work as well. Reference

So, the only way I could see is to use from_options() method and use Spark dataframes to write to your Iceberg table.

Schema evolution for Iceberg tables are documented here and using Athena here

AWS
サポートエンジニア
回答済み 2年前

ログインしていません。 ログイン 回答を投稿する。

優れた回答とは、質問に明確に答え、建設的なフィードバックを提供し、質問者の専門分野におけるスキルの向上を促すものです。

質問に答えるためのガイドライン

関連するコンテンツ