Using Glue for dimensional model ETL into Redshift

0

A customer is wondering if they can use Glue for their dimensional model ETL. Would it be able to populate the dimensions and facts and load it into Redshift, or would they need to create a staging table in Redshift and then populate their dimensions and facts via querying with surrogate keys?

I don't see why Glue wouldn't work for a dimensional model schema, but I'm having a really hard time finding sources and information about it.

AWS
질문됨 4년 전777회 조회
1개 답변
0
수락된 답변

Glue can definitely be used for loading dimensional data into Redshift. Approach will depend on what kind of dimension it is (SCD Type). And you can certainly generate surrogate ids in Glue. Example: I have used this in the past.

def customer_id(custid):
    x = int(str(hashlib.md5(custid.encode()).hexdigest()[:10]),16)
    x = int(x)
    return x

However, make sure you follow the logic consistently across different datasets to produce consistent surrogate ids.

Ideally, a staging table should be present and from staging to main table, you can govern the logic through redshift procedure or plain SQL depending upon the complexity.

AWS
답변함 4년 전
profile picture
전문가
검토됨 24일 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠