Using Glue for dimensional model ETL into Redshift


A customer wants to know whether they can use Glue for their dimensional model ETL. Can Glue populate the dimension and fact tables and load them into Redshift directly, or would they need to load a staging table in Redshift first and then populate the dimensions and facts with SQL queries that look up the surrogate keys?

I don't see why Glue wouldn't work for a dimensional model schema, but I'm having a really hard time finding sources and information about it.

AWS
Asked 4 years ago, 777 views
1 answer

Accepted answer

Glue can definitely be used to load dimensional data into Redshift. The approach depends on the kind of dimension (its SCD type). You can also generate surrogate IDs in Glue; here is an example I have used in the past:

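import hashlib

def customer_id(custid):
    # Deterministic surrogate key: hash the natural key with MD5 and keep the
    # first 10 hex digits (40 bits), which fits in a Redshift BIGINT.
    return int(hashlib.md5(custid.encode("utf-8")).hexdigest()[:10], 16)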

However, make sure you apply the same logic consistently across all datasets, so that a given natural key always produces the same surrogate ID.
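A minimal sketch of why that consistency matters, assuming the same MD5-prefix scheme plus a hypothetical normalization step (trim and uppercase) that every job shares. The dimension and the fact dataset each compute the key independently, so fact rows can reference the dimension without a lookup. The `surrogate_key` name and the sample rows are illustrative only.

```python
import hashlib

def surrogate_key(natural_key: str) -> int:
    # Hypothetical normalization: every job must use the same canonical form,
    # otherwise "C001" and " c001 " would hash to different keys.
    canonical = natural_key.strip().upper()
    # Same scheme as the answer: first 10 hex digits (40 bits) of the MD5 digest.
    return int(hashlib.md5(canonical.encode("utf-8")).hexdigest()[:10], 16)

# Hypothetical rows from two independently loaded source datasets.
customers = [{"custid": "C001", "name": "Alice"}, {"custid": "C002", "name": "Bob"}]
orders = [
    {"order_id": 1, "custid": "c002", "amount": 25.0},
    {"order_id": 2, "custid": " C001", "amount": 40.0},
]

# The dimension derives its surrogate key from the natural key...
dim_customer = [{"customer_sk": surrogate_key(c["custid"]), **c} for c in customers]
# ...and the fact table recomputes the same key without querying the dimension.
fact_orders = [
    {"order_id": o["order_id"], "customer_sk": surrogate_key(o["custid"]), "amount": o["amount"]}
    for o in orders
]

# Every fact row should point at an existing dimension row.
dim_keys = {d["customer_sk"] for d in dim_customer}
orphans = [f for f in fact_orders if f["customer_sk"] not in dim_keys]
print("fact rows without a matching dimension row:", len(orphans))
```

One design consideration: with only 40 bits, the birthday bound makes a collision likely once a dimension holds on the order of a million (about 2^20) distinct keys. Keeping more hex digits (up to 15, i.e. 60 bits, which still fits a signed BIGINT) lowers that risk.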

Ideally, load the data into a staging table first. From staging to the main dimension and fact tables, you can implement the logic in a Redshift stored procedure or in plain SQL, depending on its complexity.
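The staging-to-main step can be sketched in SQL. The sketch below uses Python's built-in `sqlite3` as a local stand-in for Redshift so it runs anywhere; all table and column names are hypothetical, and the customer dimension is assumed to be SCD Type 2 (expire the changed row, insert a new current version). A hash of the natural key alone cannot key a Type 2 dimension, since several versions share one natural key, so the sketch uses an auto-assigned key instead. In Redshift that would typically be an `IDENTITY` column, the expire step would more naturally be written as `UPDATE ... FROM stg_customer`, and the statements could be wrapped in a stored procedure or run from a Glue job (for example via the Redshift connection's `postactions` option after writing to staging).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE stg_customer (custid TEXT, city TEXT);
CREATE TABLE dim_customer (
    customer_sk INTEGER PRIMARY KEY,  -- auto-assigned; Redshift would use IDENTITY
    custid TEXT, city TEXT, valid_from TEXT, valid_to TEXT, is_current INTEGER);
CREATE TABLE stg_sales (custid TEXT, sale_date TEXT, amount REAL);
CREATE TABLE fact_sales (customer_sk INTEGER, sale_date TEXT, amount REAL);
""")

def load_dimension(load_date):
    # Step 1: expire the current row when the tracked attribute changed in staging.
    cur.execute("""
        UPDATE dim_customer SET valid_to = ?, is_current = 0
        WHERE is_current = 1 AND EXISTS (
            SELECT 1 FROM stg_customer s
            WHERE s.custid = dim_customer.custid AND s.city <> dim_customer.city)""",
        (load_date,))
    # Step 2: insert a current version for new keys and for keys expired in step 1.
    cur.execute("""
        INSERT INTO dim_customer (custid, city, valid_from, valid_to, is_current)
        SELECT s.custid, s.city, ?, '9999-12-31', 1 FROM stg_customer s
        WHERE NOT EXISTS (
            SELECT 1 FROM dim_customer d
            WHERE d.custid = s.custid AND d.is_current = 1)""",
        (load_date,))
    cur.execute("DELETE FROM stg_customer")

def load_facts():
    # Resolve each fact's surrogate key by joining staging to the current dimension row.
    cur.execute("""
        INSERT INTO fact_sales (customer_sk, sale_date, amount)
        SELECT d.customer_sk, s.sale_date, s.amount
        FROM stg_sales s JOIN dim_customer d
          ON d.custid = s.custid AND d.is_current = 1""")
    cur.execute("DELETE FROM stg_sales")

# Example run with hypothetical data: an initial load, then C001 moves city.
cur.executemany("INSERT INTO stg_customer VALUES (?, ?)",
                [("C001", "Seattle"), ("C002", "Austin")])
load_dimension("2024-01-01")
cur.executemany("INSERT INTO stg_customer VALUES (?, ?)", [("C001", "Denver")])
load_dimension("2024-02-01")
cur.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", [("C001", "2024-02-05", 40.0)])
load_facts()
conn.commit()
```

Joining facts to the current dimension row is a simplification; late-arriving facts would need to match on the `valid_from`/`valid_to` range instead.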

AWS
Answered 4 years ago

Expert
Reviewed 23 days ago
