Best practices for bulk data loading in AWS Redshift - Glue or Copy

What are the pros and cons of using AWS Glue rather than Redshift's built-in commands (such as COPY and INSERT) for bulk data loading, in terms of cost, time, and adaptability? I would really appreciate some example use cases.

Topics

Analytics, Database

Tags

AWS Glue, Extract Transform & Load Data, Amazon Redshift

Language

English

Anushanga Wimalasena

Asked 10 months ago, 350 views

1 Answer


Accepted Answer

Hi, AWS Glue is an ETL service, and T is the key letter: if you need to transform the source data before loading it into Redshift, Glue will be highly useful.

For example, Glue provides many built-in simple and advanced transformations that you can integrate into your Glue-based ETL pipeline: see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-transforms.html

Also, you may want to measure the quality of your data before loading it, to ensure consistent quality. In that case, AWS Glue Data Quality may be very helpful: see https://aws.amazon.com/blogs/big-data/getting-started-with-aws-glue-data-quality-from-the-aws-glue-data-catalog/
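By contrast, when the source files need no transformation or quality checks, the COPY command mentioned in the question can load them straight from S3. The snippet below is only a minimal sketch of what such a statement looks like: the helper `build_copy_sql`, the table name, the bucket, and the IAM role ARN are all hypothetical placeholders, and the helper only builds the SQL text (it does not connect to Redshift). It assumes the files are already in a format COPY can read directly.

```python
import re

# Formats this sketch accepts; COPY supports others, which are left out here
# because they need extra options.
_SUPPORTED_FORMATS = {"PARQUET", "ORC", "CSV"}


def build_copy_sql(table, s3_uri, iam_role_arn, file_format="PARQUET"):
    """Return the text of a Redshift COPY statement that loads S3 files into a table.

    Hypothetical helper for illustration only: it builds the statement but
    does not execute it.
    """
    fmt = file_format.upper()
    if fmt not in _SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {file_format}")
    # Accept only plain `table` or `schema.table` identifiers.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?", table):
        raise ValueError(f"unexpected table name: {table}")
    # Reject quotes so the values cannot break out of the SQL string literals.
    if "'" in s3_uri or "'" in iam_role_arn:
        raise ValueError("single quotes are not allowed in the S3 URI or role ARN")
    return (
        f"COPY {table} "
        f"FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"FORMAT AS {fmt};"
    )


if __name__ == "__main__":
    # Placeholder values; replace with your own table, bucket, and role.
    print(build_copy_sql(
        "sales",
        "s3://example-bucket/sales/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ))
```

You would run the resulting statement through whatever SQL client you already use with Redshift. If the data needs reshaping first, that is where a Glue job in front of the load becomes worthwhile.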

Hope it helps,

Didier

Expert

Didier_Durand

Answered 10 months ago

Anushanga Wimalasena
10 months ago
Thank you!
