Glue to automatically create target schema


Can someone please clarify whether Glue is supposed to automatically create tables in my target database before loading the data?

I haven't been able to find anything on the options:

  • Create tables in target
  • Use tables in the data catalog and update your data target

when setting up a job to ETL from, say, RDS into Redshift.

Since the data has already been catalogued, it would seem the schema should be created automatically, but I haven't been able to get that to work.

Thanks!

Asked 7 years ago, 2460 views
1 answer
Accepted answer
  • Glue does automatically create the table schema if it doesn't exist. However, any "string" type will be translated into a varchar(65535), which is not efficient for a columnar database.
  • If the table already exists, Glue will just add the records into the table.
  • If Glue can't map a field into the table (for example, you have a field "createdate" as a string type in your Python code and the table has "createdate" as a timestamp type in Redshift), Glue will automatically add a field "createdate_string" to the table and populate that field instead.
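
The mismatch described in the last bullet can usually be avoided by casting the column before the write. The following is a minimal sketch, assuming the data is held in a Glue DynamicFrame and cast with its apply_mapping method. The helper names build_mappings and cast_for_redshift and the example columns are hypothetical, and whether a given string parses as a timestamp depends on its format.

```python
def build_mappings(columns, overrides):
    """Build mapping 4-tuples: (source_name, source_type, target_name, target_type).

    columns:   dict of column name -> current type in the DynamicFrame
    overrides: dict of column name -> type expected by the Redshift table
    """
    return [
        (name, src_type, name, overrides.get(name, src_type))
        for name, src_type in columns.items()
    ]


def cast_for_redshift(frame, columns, overrides):
    # frame is assumed to be an awsglue DynamicFrame. apply_mapping keeps only
    # the columns listed in the mappings, so every column must be included.
    return frame.apply_mapping(build_mappings(columns, overrides))


# Hypothetical schema: createdate arrives as a string, the table wants a timestamp.
mappings = build_mappings(
    {"id": "long", "createdate": "string"},
    {"createdate": "timestamp"},
)
```

Only build_mappings runs outside a Glue job; cast_for_redshift needs a real DynamicFrame.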

When the Glue job calls glueContext.write_dynamic_frame.from_jdbc_conf(), it first stages the output dataset in the S3 location given by redshift_tmp_dir (for example, redshift_tmp_dir = "s3://jm-bank/tmp/"). Glue then runs a COPY command against Redshift, which you can see under the "Queries" option of your Redshift cluster.
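
If the default varchar(65535) columns are a concern, one option is to create the table yourself before the COPY runs. The sketch below assumes the "preactions" connection option (SQL executed before the load) and passes it through from_jdbc_conf. The helper names, the connection name, the database "dev", and the table public.accounts are all hypothetical placeholders.

```python
def redshift_options(database, table, create_sql=None):
    """Connection options for from_jdbc_conf; preactions run before the COPY."""
    options = {"database": database, "dbtable": table}
    if create_sql:
        options["preactions"] = create_sql
    return options


def write_to_redshift(glue_context, frame, connection_name, options, tmp_dir):
    # glue_context is assumed to be an awsglue.context.GlueContext and
    # frame a DynamicFrame; tmp_dir is the S3 staging location.
    return glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=frame,
        catalog_connection=connection_name,
        connection_options=options,
        redshift_tmp_dir=tmp_dir,
    )


# Hypothetical DDL with explicit column sizes, a distribution key and a sort key.
CREATE_SQL = (
    "CREATE TABLE IF NOT EXISTS public.accounts ("
    "id BIGINT, name VARCHAR(256), createdate TIMESTAMP"
    ") DISTKEY(id) SORTKEY(createdate);"
)
options = redshift_options("dev", "public.accounts", CREATE_SQL)
```

Inside a job this would be called as write_to_redshift(glueContext, frame, "my-redshift-connection", options, "s3://jm-bank/tmp/"), where the connection name is again a placeholder.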

Answered 6 years ago
Reviewed by an Expert 10 months ago
Reviewed by an AWS Expert 2 years ago
  • If the table exists and has all the same columns, but the varchar columns are smaller than varchar(65535), does Glue automatically rebuild the table?

    My tables go back to having varchar(65535) and seem to be missing distribution style/key and sort keys that I added just a couple of days ago.
