Glue to automatically create target schema


Can someone please clarify whether Glue is supposed to automatically create tables in my target database before loading the data?

I haven't been able to find anything on the options:

  • Create tables in target
  • Use tables in the data catalog and update your data target

when setting up a job to ETL from say RDS into Redshift.

Since the data has already been catalogued, it would seem the schema should be automatically created but I haven't been successful getting that to work.

Thanks!

Asked 7 years ago, 2460 views
1 Answer
Accepted Answer
  • Glue does automatically create the table if it doesn't exist. However, any "string" type is translated into varchar(65535), which is not efficient for a columnar database.
  • If the table already exists, Glue will just add the records into the table.
  • If Glue can't map a field to the table's column type (for example, "createdate" is a string in your Python code but a timestamp column in Redshift), Glue automatically adds a "createdate_string" column to the table and populates that column instead.
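
One way to avoid the extra "createdate_string" column described above is to cast the field to the target type before writing. The following is a minimal sketch, not a confirmed fix for this case: the helper name build_mappings, the column list, and the "id" column are hypothetical, and it assumes the ApplyMapping transform can cast your string values to timestamp. Only the tuple-building part runs outside a Glue job.

```python
def build_mappings(source_schema, target_types):
    """Build ApplyMapping-style 4-tuples: (source_name, source_type, target_name, target_type).

    source_schema: list of (column_name, glue_type) pairs (hypothetical example data).
    target_types:  dict of column_name -> type to cast to; other columns keep their type.
    """
    mappings = []
    for name, src_type in source_schema:
        dst_type = target_types.get(name, src_type)
        mappings.append((name, src_type, name, dst_type))
    return mappings


# Hypothetical source schema; "createdate" arrives as a string.
source_schema = [("id", "long"), ("createdate", "string")]
mappings = build_mappings(source_schema, {"createdate": "timestamp"})
print(mappings)

# Inside a Glue job (assumption: dyf is an existing DynamicFrame):
# from awsglue.transforms import ApplyMapping
# mapped = ApplyMapping.apply(frame=dyf, mappings=mappings)
```

If the cast succeeds, the frame's "createdate" field should match the Redshift column type, so Glue has no reason to fall back to a separate string column.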

When the job calls glueContext.write_dynamic_frame.from_jdbc_conf(), Glue first writes the output dataset to the S3 location given by redshift_tmp_dir (in this example, "s3://jm-bank/tmp/"). Glue then issues a COPY command to Redshift, which you can see under the "Queries" option of your Redshift cluster.
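
If the varchar(65535) default is a concern, one possible approach is to create the table yourself with narrower column types before Glue loads it, using the "preactions" connection option that Glue supports for Redshift writes. This is a sketch under assumptions: the helper build_redshift_options, the database name "dev", the table "public.accounts", its columns, and the connection name are all hypothetical. Only the option-building part runs outside a Glue job.

```python
def build_redshift_options(database, table, columns):
    """Build connection_options for a Redshift write.

    columns: list of (column_name, redshift_sql_type) pairs (hypothetical example data).
    The "preactions" SQL runs before the COPY, so the table exists with the
    column types you chose instead of the varchar(65535) default.
    """
    column_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    ddl = f"CREATE TABLE IF NOT EXISTS {table} ({column_sql});"
    return {"database": database, "dbtable": table, "preactions": ddl}


options = build_redshift_options(
    "dev",
    "public.accounts",
    [("id", "BIGINT"), ("name", "VARCHAR(256)"), ("createdate", "TIMESTAMP")],
)
print(options["preactions"])

# Inside a Glue job (assumptions: glueContext and the DynamicFrame "mapped"
# already exist; "my-redshift-connection" is a hypothetical catalog connection):
# glueContext.write_dynamic_frame.from_jdbc_conf(
#     frame=mapped,
#     catalog_connection="my-redshift-connection",
#     connection_options=options,
#     redshift_tmp_dir="s3://jm-bank/tmp/",
# )
```

Because the statement uses CREATE TABLE IF NOT EXISTS, it leaves an existing table alone, which matches the behaviour described above where Glue simply adds records to a table that already exists.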

Answered 6 years ago
Expert reviewed 10 months ago
AWS Expert reviewed 2 years ago
  • If the table exists and has all the same columns, but its varchar columns are smaller than varchar(65535), does Glue automatically rebuild the table?

    My tables keep reverting to varchar(65535) and seem to have lost the distribution style/key and sort keys that I added just a couple of days ago.
