Glue incremental load

0

I am loading data from Amazon RDS (MySQL) into Redshift using AWS Glue ETL and the Data Catalog, but I can't figure out how to do incremental loading (upsert). Is there a way to apply a filter or parameter on a date column while reading from the source database, so that only yesterday's data is loaded, without using bookmarks?

2 Answers
0

Using bookmarks is the recommended way when using Glue. However, nothing stops you from passing a parameter to the Glue script [https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-get-resolved-options.html] and then generating the query dynamically based on that parameter if you want.
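As a rough illustration of the parameter approach, here is a minimal sketch. It assumes a job parameter named `--load_date` (format `YYYY-MM-DD`) and hypothetical table and column names (`orders`, `updated_at`); none of these come from the question. It builds the filter as a half-open date range and passes it to Spark's JDBC reader through the `query` option, so MySQL does the filtering rather than Glue.

```python
import re
from datetime import datetime, timedelta

# Only plain identifiers are allowed, since they are interpolated into SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_incremental_query(table, date_column, load_date):
    """Return a MySQL query selecting rows whose date_column falls on load_date."""
    for name in (table, date_column):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # strptime also validates the parameter format (raises ValueError if wrong).
    day = datetime.strptime(load_date, "%Y-%m-%d").date()
    next_day = day + timedelta(days=1)
    return (
        f"SELECT * FROM {table} "
        f"WHERE {date_column} >= '{day.isoformat()}' "
        f"AND {date_column} < '{next_day.isoformat()}'"
    )


def load_date_from_job_args(argv):
    """Read the (assumed) --load_date job parameter inside a Glue job."""
    # Imported lazily because awsglue only exists in the Glue runtime.
    from awsglue.utils import getResolvedOptions
    return getResolvedOptions(argv, ["load_date"])["load_date"]


def read_increment(spark, jdbc_url, user, password, query):
    """Read only the rows matched by query using Spark's JDBC source."""
    return (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("user", user)
        .option("password", password)
        .option("query", query)  # pushed down to MySQL as a subquery
        .load()
    )
```

Inside the job you would call `load_date_from_job_args(sys.argv)`, build the query, and hand the resulting DataFrame to whatever Redshift write step you already have. The half-open range (`>=` start, `<` next day) avoids missing or double-counting rows with timestamps at the day boundary.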

The other question is why you want to use Glue rather than a Redshift federated query, which might be an easier option?

AWS
Alex_T
answered 2 years ago
  • Because I use Redshift Serverless, which still has no query scheduler. Even if I used a federated query to refresh a materialized view, I would have to schedule it with Step Functions to be able to refresh and monitor the extraction workflow.

    The issue with using bookmarks is that I need only the data that was loaded yesterday. With a bookmark, running the job after 12 AM will by default also pick up some unneeded data, which might affect my extraction logic.
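One way to make the job independent of when it starts after midnight is to derive the window from the calendar day, not from the run time. The sketch below is only an illustration: the function name `yesterday_window` is hypothetical, and it assumes the job clock and the source timestamps share a time zone (convert first if they do not).

```python
from datetime import datetime, timedelta


def yesterday_window(now):
    """Return (start, end) of the half-open range covering the day before `now`."""
    today_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today_start - timedelta(days=1), today_start


# A run at 00:30 on 2 May and a run at 09:00 on 2 May give the same window.
start, end = yesterday_window(datetime(2023, 5, 2, 0, 30))
# start is 2023-05-01 00:00:00, end is 2023-05-02 00:00:00
```

Filtering the source with `>= start` and `< end` then loads exactly yesterday's rows, however late the job runs that day.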

0

One option is to use DMS to incrementally load Redshift using MySQL as a source. Here's a whitepaper on the topic: https://docs.aws.amazon.com/whitepapers/latest/optimizing-dms-with-amazon-redshift/optimizing-dms-with-amazon-redshift.html

AWS
EXPERT
Rajiv_G
answered 2 years ago
