Use GlueContext.getSink to write Apache Iceberg tables to an S3 bucket and the Data Catalog


Is there a way to use GlueContext.getSink().writeFrame(...) to write Apache Iceberg tables? So far, the only version I have found to work is GlueContext.write_dynamic_frame.from_options(...), which is documented at the bottom of https://aws.amazon.com/de/blogs/big-data/implement-a-cdc-based-upsert-in-a-data-lake-using-apache-iceberg-and-aws-glue/. That version does not seem to provide an option to update the Data Catalog at the same time.

asked 2 years ago, 2470 views
1 Answer

Hello,

According to the documentation, there are only two ways to update the schema automatically from an AWS Glue job: (1) getSink() and (2) from_catalog(). In addition, your job needs to use the Iceberg connection or the Iceberg JARs.

getSink() does not support AWS Marketplace connections. Reference

from_catalog() needs to read metadata such as the classification or the connection from the existing Iceberg table. However, if you create your Iceberg tables from Athena as shown here, this method does not work either. Reference

So the only way I can see is to use the from_options() method, or to use Spark DataFrames to write to your Iceberg table.
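To illustrate the Spark DataFrame route, here is a minimal sketch rather than a verified recipe. It assumes the Iceberg JARs are available to the job and that an Iceberg catalog backed by the Glue Data Catalog is configured for the Spark session. The helper names (iceberg_glue_catalog_conf, write_iceberg), the catalog name glue_catalog, and the bucket, database and table names are all hypothetical.

```python
def iceberg_glue_catalog_conf(catalog_name, warehouse_path):
    """Return Spark settings for an Iceberg catalog backed by the Glue Data Catalog.

    catalog_name and warehouse_path are placeholders chosen by the caller.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse_path,
    }


def write_iceberg(df, catalog_name, database, table):
    """Create or replace an Iceberg table from a Spark DataFrame.

    df is expected to be a pyspark.sql.DataFrame, for example the result of
    DynamicFrame.toDF(). Returns the fully qualified table identifier.
    """
    identifier = f"{catalog_name}.{database}.{table}"
    # DataFrameWriterV2: the table is created through the configured catalog.
    df.writeTo(identifier).using("iceberg").createOrReplace()
    return identifier


# Usage inside a Glue job (all names are hypothetical):
#   conf = iceberg_glue_catalog_conf("glue_catalog", "s3://my-bucket/warehouse/")
#   ... apply conf before the Spark session is created ...
#   write_iceberg(dynamic_frame.toDF(), "glue_catalog", "my_db", "my_table")
```

Because the write goes through a catalog whose implementation is GlueCatalog, the table metadata should end up in the Data Catalog as part of the same write. The settings have to be in place before the Spark session starts, for example through the job's --conf parameter. Use append() instead of createOrReplace() if the table already exists and should only receive new rows.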

Schema evolution for Iceberg tables is documented here, and schema evolution using Athena is documented here.

AWS
SUPPORT ENGINEER
answered 2 years ago
