AWS Glue load data from S3 to Redshift


Hi, I have multiple batch data sources (JSON files) in S3 that I want to load into Redshift via AWS Glue. My use case is to parse and join these sources into a final table, then insert the result into a table in Redshift.

My question is:

  1. If I create a workflow in AWS Glue and schedule it to run once a day, can it continuously update the table in Redshift (for example, insert new rows)?
  2. I know that AWS Glue can automatically detect schema changes in a data source, but what happens when it detects a change? Will it create a new table in Redshift? Can I get a notification when the schema changes?
1 Answer

Hello,

Below are inline answers to your questions.

  1. If I create a workflow in AWS Glue and schedule it to run once a day, can it continuously update the table in Redshift (for example, insert new rows)?

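---> Yes. Each run of the workflow can append new rows to the Redshift table, and enabling job bookmarks on the Glue job lets each run process only the S3 objects that arrived since the previous run. If you want new rows to be inserted as soon as new objects land in S3 rather than once a day, you can instead start the Glue job from an AWS Lambda function triggered by S3 event notifications, as described in the document below [1].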

[1] https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/build-an-etl-service-pipeline-to-load-data-incrementally-from-amazon-s3-to-amazon-redshift-using-aws-glue.html#build-an-etl-service-pipeline-to-load-data-incrementally-from-amazon-s3-to-amazon-redshift-using-aws-glue-prerequisites-and-limitations
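As a minimal sketch of the Lambda-trigger approach (not the code from [1]), the handler below reads the bucket and key of each new object from the S3 event notification and starts one run of a Glue job, passing the object paths as a job argument. The job name `json-to-redshift-job` and the `--source_paths` argument are hypothetical; the Glue job script would have to read that argument itself, for example with `getResolvedOptions`.

```python
import urllib.parse

# Hypothetical Glue job name: replace with the name of your own job.
GLUE_JOB_NAME = "json-to-redshift-job"


def new_object_keys(event):
    """Return (bucket, key) pairs for every object in an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (a space becomes '+').
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context, glue_client=None):
    """Start one Glue job run covering all objects in this S3 event."""
    paths = [f"s3://{bucket}/{key}" for bucket, key in new_object_keys(event)]
    if not paths:
        return {"started": False}
    if glue_client is None:
        import boto3  # included in the Lambda Python runtime
        glue_client = boto3.client("glue")
    response = glue_client.start_job_run(
        JobName=GLUE_JOB_NAME,
        # Custom job arguments must start with "--"; "--source_paths" is a
        # hypothetical argument that the job script would parse itself.
        Arguments={"--source_paths": ",".join(paths)},
    )
    return {"started": True, "job_run_id": response["JobRunId"]}
```

Because every S3 event starts a separate job run, a burst of uploads can exceed the job's maximum concurrent runs (`start_job_run` then fails with `ConcurrentRunsExceededException`); keeping the once-a-day schedule with job bookmarks avoids that at the cost of latency.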

  2. I know that AWS Glue can automatically detect schema changes in a data source, but what happens when it detects a change? Will it create a new table in Redshift? Can I get a notification when the schema changes?

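---> When a Glue crawler runs over the data set, it detects the schema change and, depending on the crawler's schema change policy, either updates the table definition in the AWS Glue Data Catalog or only logs the change. The crawler does not create or alter tables in Redshift; any change to the Redshift target table has to be made by your ETL job or manually. ---> Glue does not send a schema-change notification by default, but you can build one, for example with an Amazon EventBridge rule on Glue Data Catalog table state change events that forwards to Amazon SNS.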
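Another way to get notified is a small check that runs after each crawl (for example in a Lambda function) and compares the two newest versions of the table in the Data Catalog, publishing to an SNS topic when columns were added, removed or retyped. This is a sketch under assumptions: it uses boto3's `get_table_versions` paginator and SNS `publish`, compares only the `StorageDescriptor` columns (not partition keys), and the database, table and topic names you pass in are your own.

```python
import json


def columns_of(table_version):
    """Map column name -> type for one item of GetTableVersions' "TableVersions" list."""
    columns = table_version["Table"]["StorageDescriptor"]["Columns"]
    return {col["Name"]: col["Type"] for col in columns}


def diff_schemas(old, new):
    """Compare two {name: type} maps and list added, removed and retyped columns."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(name for name in set(old) & set(new) if old[name] != new[name]),
    }


def latest_schema_change(versions):
    """Return the diff between the two newest versions, or None if nothing changed."""
    if len(versions) < 2:
        return None
    # VersionId is returned as a string; sort numerically so the newest is last.
    ordered = sorted(versions, key=lambda v: int(v["VersionId"]))
    diff = diff_schemas(columns_of(ordered[-2]), columns_of(ordered[-1]))
    return diff if any(diff.values()) else None


def check_and_notify(glue_client, sns_client, database, table, topic_arn):
    """Fetch all versions of a Data Catalog table and publish to SNS if the schema changed."""
    versions = []
    paginator = glue_client.get_paginator("get_table_versions")
    for page in paginator.paginate(DatabaseName=database, TableName=table):
        versions.extend(page["TableVersions"])
    diff = latest_schema_change(versions)
    if diff is not None:
        sns_client.publish(
            TopicArn=topic_arn,
            Subject=f"Schema change in {database}.{table}",
            Message=json.dumps(diff),
        )
    return diff
```

In a Lambda function you would call it as `check_and_notify(boto3.client("glue"), boto3.client("sns"), database, table, topic_arn)`; passing the clients in keeps the comparison logic testable without AWS access.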

AWS
Answered 2 years ago
