AWS Glue load data from S3 to Redshift


Hi, I have multiple batch data sources (in JSON format) in S3 and I want to load them into Redshift via AWS Glue. My use case is to parse and join these data sources into a final table, then insert it into a table in Redshift.

My question is:

  1. If I create a workflow in AWS Glue and make it run once a day, can it continuously update the table in Redshift (for example, insert new rows)?
  2. I know that AWS Glue has a feature that can automatically detect schema changes in a data source, but what happens when it detects a change? Will it create a new table in Redshift? Will I be able to get a notification when the schema changes?
Asked 2 years ago, 996 views
1 answer

Hello,

Below are inline answers to the questions asked.

  1. If I create a workflow in AWS Glue and make it run once a day, can it continuously update the table in Redshift (for example, insert new rows)?

---> Yes. Whenever new objects are added to S3, the new rows can be loaded into the Redshift table continuously by using an AWS Lambda trigger, as described in the document below [1].

[1] https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/build-an-etl-service-pipeline-to-load-data-incrementally-from-amazon-s3-to-amazon-redshift-using-aws-glue.html#build-an-etl-service-pipeline-to-load-data-incrementally-from-amazon-s3-to-amazon-redshift-using-aws-glue-prerequisites-and-limitations
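A minimal sketch of such a trigger in Python, assuming an S3 event notification is configured to invoke the Lambda function. The job name `s3-to-redshift-etl` and the `--s3_input_paths` job argument are hypothetical; `start_job_run` is the boto3 Glue client call that starts a job run. The Glue client is injectable so the logic can be exercised without AWS.

```python
"""Hypothetical Lambda handler: start a Glue job when new objects land in S3.

Assumptions (not from the original answer): the Glue job is named
GLUE_JOB_NAME and reads its input from a custom "--s3_input_paths" argument.
"""
import json
from urllib.parse import unquote_plus

GLUE_JOB_NAME = "s3-to-redshift-etl"  # hypothetical job name


def new_object_paths(event):
    """Return s3:// URIs for every ObjectCreated record in an S3 event."""
    paths = []
    for record in event.get("Records", []):
        # S3 reports e.g. "ObjectCreated:Put"; ignore deletes and other events.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        paths.append(f"s3://{bucket}/{key}")
    return paths


def lambda_handler(event, context, glue_client=None):
    """Start one Glue job run covering all newly created objects."""
    paths = new_object_paths(event)
    if not paths:
        return {"started": False}
    if glue_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        glue_client = boto3.client("glue")
    # Glue job argument keys start with "--" and values must be strings.
    response = glue_client.start_job_run(
        JobName=GLUE_JOB_NAME,
        Arguments={"--s3_input_paths": json.dumps(paths)},
    )
    return {"started": True, "JobRunId": response["JobRunId"]}
```

Passing the new paths explicitly is one design choice; alternatively, a job with Glue job bookmarks enabled can read the whole prefix and skip files it has already processed.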

  2. I know that AWS Glue has a feature that can automatically detect schema changes in a data source, but what happens when it detects a change? Will it create a new table in Redshift? Will I be able to get a notification when the schema changes?

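---> When a Glue crawler runs over the dataset, it detects the schema change and, according to the crawler's schema change policy, updates the existing table definition or creates a new table in the AWS Glue Data Catalog; the crawler itself does not create tables in Redshift. ---> The crawler does not send a notification when the schema changes.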
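Since the crawler does not notify you, one option is to check the catalog table yourself after each crawl. A minimal sketch, assuming you keep the previous column list somewhere between runs (the `previous_columns` snapshot and all function names here are hypothetical). It uses the boto3 Glue `get_table` call and compares only the non-partition columns in `StorageDescriptor`.

```python
"""Hypothetical schema-change check for a crawled Glue Data Catalog table.

Assumption: you run this yourself (for example, after the crawler finishes);
the crawler does not send such a notification on its own.
"""


def column_map(table):
    """Map column name -> type from a Glue Data Catalog table definition.

    Partition keys (table["PartitionKeys"]) are deliberately not included.
    """
    columns = table.get("StorageDescriptor", {}).get("Columns", [])
    return {c["Name"]: c["Type"] for c in columns}


def diff_columns(old, new):
    """Compare two {name: type} dicts and list added, removed, retyped columns."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def check_for_schema_change(database, table_name, previous_columns, glue_client=None):
    """Return (diff, current_columns) for the current catalog table."""
    if glue_client is None:
        import boto3
        glue_client = boto3.client("glue")
    table = glue_client.get_table(DatabaseName=database, Name=table_name)["Table"]
    current = column_map(table)
    return diff_columns(previous_columns, current), current
```

If any list in the returned diff is non-empty, you could forward it through Amazon SNS or another alerting channel, then store `current` as the snapshot for the next comparison.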

AWS
Answered 2 years ago
