Use GlueContext.getSink to write Apache Iceberg tables to an S3 bucket and the Data Catalog


Is there a way to use GlueContext.getSink().writeFrame(...) to write Apache Iceberg tables? So far, the only approach I have found that works is GlueContext.write_dynamic_frame.from_options(...), which is documented at the bottom of https://aws.amazon.com/de/blogs/big-data/implement-a-cdc-based-upsert-in-a-data-lake-using-apache-iceberg-and-aws-glue/. That approach does not seem to offer an option to update the Data Catalog at the same time.

asked 2 years ago · 2470 views
1 Answer

Hello,

According to the documentation, there are only two ways to update the schema automatically from an AWS Glue job: (1) getSink() and (2) from_catalog(). In addition, your job needs to use the Iceberg connection or the Iceberg JARs.

getSink() does not support AWS Marketplace connections. Reference

from_catalog() needs to read metadata such as the classification or connection from the existing Iceberg table. However, if you create Iceberg tables from Athena as shown here, this method does not work either. Reference

So the only way I can see is to use the from_options() method and Spark DataFrames to write to your Iceberg table.
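As a rough illustration of the Spark DataFrame route, here is a minimal sketch. It assumes the job runs on a Glue version with native Iceberg support (the `--datalake-formats iceberg` job parameter) and that the Spark settings below are passed to the job, for example through `--conf`. The catalog name `glue_catalog`, the bucket, and the helper names `iceberg_glue_conf` and `write_iceberg` are made up for the example and are not part of any AWS API. With `GlueCatalog` as the catalog implementation, Iceberg should register the table in the Data Catalog itself when it is created, which is the part that from_options() alone does not cover.

```python
def iceberg_glue_conf(catalog: str, warehouse: str) -> dict:
    """Spark settings that back an Iceberg catalog with the Glue Data Catalog.

    `catalog` is an arbitrary name you choose (hypothetical here);
    `warehouse` is the S3 prefix where table data and metadata are stored.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


def write_iceberg(dynamic_frame, table: str, create: bool = False) -> None:
    """Write a Glue DynamicFrame to an Iceberg table via the Spark DataFrame API.

    `table` is the full identifier, e.g. "glue_catalog.my_db.my_table"
    (hypothetical names). With create=True the table is created or replaced;
    otherwise rows are appended to an existing table.
    """
    df = dynamic_frame.toDF()                   # DynamicFrame -> Spark DataFrame
    writer = df.writeTo(table).using("iceberg")  # DataFrameWriterV2
    if create:
        writer.createOrReplace()
    else:
        writer.append()
```

In a job script this could be called as `write_iceberg(dyf, "glue_catalog.my_db.my_table", create=True)` on the first run and without `create` afterwards, where `dyf` is whatever DynamicFrame the job has produced.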

Schema evolution for Iceberg tables is documented here, and schema evolution using Athena here.

AWS
SUPPORT ENGINEER
answered 2 years ago
