Guidance Needed: Migrating Data from Cosmos DB to Snowflake with CDC Using AWS Glue


We have data stored in Cosmos DB NoSQL and need to migrate it to Snowflake using AWS Glue with a Change Data Capture (CDC) approach.

Our objective is to use CDC to replicate every data change (creates, updates, and deletes) to Snowflake.

This is our initial plan, but we are seeking advice on a more cost-effective and efficient solution. Please provide guidance on best practices for implementing CDC with AWS Glue, or suggest alternative methods within AWS that might offer better cost management or efficiency for our ETL process.

1 Answer

Hello

This Microsoft documentation (third-party from an AWS perspective) shows how Azure Functions can connect to the Cosmos DB change feed and run whenever documents change: https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/change-feed-functions

You can then use Azure Functions to forward those change events into an AWS managed streaming service such as Amazon Managed Streaming for Apache Kafka (Amazon MSK) or Amazon Kinesis Data Streams, both of which Glue integrates with natively through Glue streaming jobs: https://docs.aws.amazon.com/glue/latest/dg/streaming-chapter.html
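As a rough sketch of the forwarding step, the function below turns a batch of change feed documents into entries for the Kinesis `PutRecords` API (a `Data` blob plus a `PartitionKey`). It assumes each document carries an `id` and the `_ts` system property, and it uses a hypothetical `isDeleted` soft-delete flag because the change feed's latest-version mode does not emit delete events. The `op`/`src_ts` envelope is likewise an illustrative choice, not a fixed format.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Cosmos DB system properties dropped from the payload (_ts is kept separately).
SYSTEM_FIELDS = {"_rid", "_self", "_etag", "_attachments", "_lsn", "_ts"}

def to_kinesis_records(docs: Iterable[Dict[str, Any]],
                       delete_flag: str = "isDeleted") -> List[Dict[str, Any]]:
    """Wrap each change feed document in a PutRecords entry."""
    records = []
    for doc in docs:
        event = {
            "id": doc["id"],
            "src_ts": doc.get("_ts"),  # last-modified time, epoch seconds
            # Hypothetical soft-delete flag: the latest-version change feed
            # has no delete events, so a flagged document is treated as a delete.
            "op": "D" if doc.get(delete_flag) else "U",
            "doc": {k: v for k, v in doc.items() if k not in SYSTEM_FIELDS},
        }
        records.append({
            "Data": json.dumps(event, separators=(",", ":")).encode("utf-8"),
            # Same id -> same shard, so changes to one document stay ordered.
            "PartitionKey": str(doc["id"]),
        })
    return records

def batches(records: List[Any], size: int = 500) -> Iterator[List[Any]]:
    """Split into chunks; PutRecords accepts at most 500 records per call."""
    for i in range(0, len(records), size):
        yield records[i:i + size]
```

Inside the Azure Function you would pass each batch to boto3's `kinesis.put_records(StreamName=..., Records=batch)` and retry the entries whose response carries an `ErrorCode`; that call is left out so the sketch runs without AWS credentials.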

Glue supports connections to Snowflake using Glue Connections: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect-snowflake-home.html
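One common way to apply those changes in Snowflake (an assumption, not something the Glue connection does for you) is to have the Glue streaming job append each micro-batch to a staging table and then run a `MERGE` into the target table. The sketch below only builds that SQL; the table names, the `src_ts` ordering column, and the `is_deleted` flag are hypothetical and must match whatever your job actually writes.

```python
from typing import Sequence

def build_merge_sql(target: str, staging: str, columns: Sequence[str],
                    key: str = "id", ts_col: str = "src_ts",
                    delete_col: str = "is_deleted") -> str:
    """Build a Snowflake MERGE applying the newest staged change per key."""
    # Every staged column except the key and the delete flag is copied over.
    data_cols = [c for c in columns if c not in (key, delete_col)]
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in data_cols)
    insert_cols = ", ".join([key] + data_cols)
    insert_vals = ", ".join(f"s.{c}" for c in [key] + data_cols)
    # Treat a NULL delete flag as "not deleted".
    is_del = f"COALESCE(s.{delete_col}, FALSE)"
    return (
        f"MERGE INTO {target} t\n"
        f"USING (SELECT * FROM {staging}\n"
        f"       QUALIFY ROW_NUMBER() OVER (PARTITION BY {key} ORDER BY {ts_col} DESC) = 1) s\n"
        f"ON t.{key} = s.{key}\n"
        f"WHEN MATCHED AND {is_del} THEN DELETE\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED AND NOT {is_del} THEN\n"
        f"  INSERT ({insert_cols}) VALUES ({insert_vals})"
    )
```

The `QUALIFY ROW_NUMBER()` filter keeps only the newest change per key, which also prevents Snowflake (with its default `ERROR_ON_NONDETERMINISTIC_MERGE` setting) from rejecting the merge when several staged rows match the same target row. Column names are interpolated unquoted, so they should come from your own schema, never from the data.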

On the AWS side, your costs will come mainly from the Glue resources you use and from your streaming configuration, such as data retention. With this approach, the main cost optimizations are:

Glue: Use native Spark functions, Spark SQL, or Glue's built-in transforms wherever possible, so that Spark's Catalyst optimizer can plan the job efficiently. If you need user-defined functions (UDFs), prefer Java UDFs over Python UDFs where possible, because Python UDFs require serializing rows between the JVM and a Python worker process.

Streaming: Kinesis will generally be cheaper than Amazon MSK, but the exact cost depends on your stream configuration (for example, capacity mode, number of shards, and data retention period).
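To see how stream configuration drives cost: in provisioned mode Kinesis bills per shard-hour, and each shard accepts up to 1 MB/s or 1,000 records/s of writes. The helper below estimates the shard count from your peak change rate; the rates and the headroom factor are assumptions you would replace with your own measurements. On-demand mode bills mainly by data volume instead, so this sizing applies only to provisioned streams.

```python
import math

# Per-shard write limits for Kinesis Data Streams in provisioned mode.
WRITE_MB_PER_SEC_PER_SHARD = 1.0
WRITE_RECORDS_PER_SEC_PER_SHARD = 1000

def shards_needed(records_per_sec: float, avg_record_kb: float,
                  headroom: float = 1.0) -> int:
    """Estimate provisioned shards for a peak write rate (rough sketch)."""
    mb_per_sec = records_per_sec * avg_record_kb / 1024  # KB -> MB
    by_bytes = mb_per_sec / WRITE_MB_PER_SEC_PER_SHARD
    by_count = records_per_sec / WRITE_RECORDS_PER_SEC_PER_SHARD
    # Whichever limit is reached first decides the count; never fewer than one.
    return max(1, math.ceil(max(by_bytes, by_count) * headroom))
```

For example, 500 changes/s of 4 KB documents need 2 shards because of data volume, while 3,000 changes/s of 0.5 KB documents need 3 shards because of record count.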

Thank you for considering AWS Glue for your ETL needs!

AWS
answered a month ago
