Use GlueContext.getSink for writing Apache Iceberg table to S3 bucket and Data Catalog


Is there a way to use GlueContext.getSink().writeFrame(...) to write Apache Iceberg tables? So far, the only version I have found that works is GlueContext.write_dynamic_frame.from_options(...), which is documented at the bottom of https://aws.amazon.com/de/blogs/big-data/implement-a-cdc-based-upsert-in-a-data-lake-using-apache-iceberg-and-aws-glue/. This version does not seem to provide an option to update the Data Catalog at the same time.

asked 2 years ago, 2469 views
1 Answer

Hello,

As per the documentation, there are only two ways to update the schema automatically from an AWS Glue job: 1. getSink() and 2. from_catalog(). In addition, your job needs to use the Iceberg connection or the Iceberg JARs.

getSink() does not support Marketplace connections. Reference

from_catalog() needs to read metadata such as the classification or the connection from the existing Iceberg table. However, if you are creating Iceberg tables from Athena as shown here, this method does not work either. Reference

So the only way I can see is to use the from_options() method and Spark DataFrames to write to your Iceberg table.
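For illustration, here is a minimal sketch of the Spark DataFrame route, not a confirmed recipe for your job. It assumes a Glue job with Iceberg enabled (for example through the `--datalake-formats iceberg` job parameter) and a Spark catalog backed by Iceberg's `GlueCatalog`, in which case creating or appending to the table through Spark should also register it in the Data Catalog. The helper names `iceberg_catalog_conf` and `write_to_iceberg`, the catalog name `glue_catalog`, and the bucket and table names are all hypothetical.

```python
def iceberg_catalog_conf(catalog_name, warehouse_path):
    """Build the Spark settings for an Iceberg catalog backed by the Glue Data Catalog.

    catalog_name and warehouse_path are placeholders chosen by the caller.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.warehouse": warehouse_path,
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


def write_to_iceberg(dynamic_frame, table_identifier, create=False):
    """Write a Glue DynamicFrame to an Iceberg table via the Spark DataFrame API.

    table_identifier is '<catalog>.<database>.<table>'. With create=True the
    table is created or replaced; otherwise rows are appended to an existing table.
    """
    df = dynamic_frame.toDF()  # DynamicFrame -> Spark DataFrame
    writer = df.writeTo(table_identifier)  # DataFrameWriterV2 (Spark 3.x)
    if create:
        writer.using("iceberg").createOrReplace()
    else:
        writer.append()


# Hypothetical usage inside a Glue job. The settings generally have to be in
# place before the Spark session starts, e.g. passed as --conf job parameters:
#
#   conf = iceberg_catalog_conf("glue_catalog", "s3://my-bucket/warehouse/")
#   write_to_iceberg(my_dynamic_frame, "glue_catalog.my_db.my_table", create=True)
```

This bypasses getSink() entirely, so options such as enableUpdateCatalog do not apply; the catalog update is left to the Iceberg catalog implementation.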

Schema evolution for Iceberg tables is documented here, and schema evolution using Athena here.

AWS
SUPPORT ENGINEER
answered 2 years ago
