Glue to automatically create target schema


Can someone please clarify whether Glue is supposed to be able to automatically create tables in my target database before loading the data?

I haven't been able to find anything on the options:

  • Create tables in target
  • Use tables in the data catalog and update your data target

when setting up a job to ETL from, say, RDS into Redshift.

Since the data has already been catalogued, it would seem the schema should be created automatically, but I haven't been able to get that to work.

Thanks!

Asked 7 years ago, viewed 2462 times
1 Answer
Accepted Answer
  • Glue does automatically create the table schema if it doesn't exist. However, any "string" type will be translated into a varchar(65535), which is not efficient for a columnar database.
  • If the table already exists, Glue will just append the records to the table.
  • If Glue can't map a field into the table (for example, you have a field "createdate" as string type in your Python code and the table has "createdate" as timestamp type in Redshift), Glue will automatically add a field "createdate_string" to the table and populate that field.

When the Glue job calls glueContext.write_dynamic_frame.from_jdbc_conf(), it stores the output dataset in the S3 location given by redshift_tmp_dir (for example, redshift_tmp_dir = "s3://jm-bank/tmp/"). Glue then runs a COPY command against Redshift, which you can see under the "Queries" option of your Redshift cluster.
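If the varchar(65535) default is a concern, one possible approach is to create the table yourself before the COPY runs, using the "preactions" connection option. The following is only a rough sketch under assumptions: the helper names build_redshift_options and write_to_redshift, the connection name "redshift-connection", and the table and column names are all hypothetical, and the column types are ones you would pick for your own data.

```python
def build_redshift_options(table, database, columns, dist_key=None, sort_keys=()):
    """Build connection_options for write_dynamic_frame.from_jdbc_conf().

    `columns` maps column name -> Redshift type (chosen by you, not by Glue).
    The DDL is passed as "preactions", so it runs before the COPY and the
    table exists with explicit types instead of varchar(65535).
    """
    cols = ", ".join(f"{name} {rs_type}" for name, rs_type in columns.items())
    ddl = f"CREATE TABLE IF NOT EXISTS {table} ({cols})"
    if dist_key:
        ddl += f" DISTKEY({dist_key})"
    if sort_keys:
        ddl += f" SORTKEY({', '.join(sort_keys)})"
    return {"dbtable": table, "database": database, "preactions": ddl + ";"}


def write_to_redshift(glue_context, frame, options, tmp_dir):
    """Only meaningful inside a Glue job, where glue_context is a GlueContext
    and frame is a DynamicFrame. The connection name is an assumption."""
    return glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=frame,
        catalog_connection="redshift-connection",  # hypothetical connection name
        connection_options=options,
        redshift_tmp_dir=tmp_dir,  # staging location used for the COPY
    )


# Example with hypothetical table and columns
options = build_redshift_options(
    "public.accounts",
    "dev",
    {"id": "BIGINT", "name": "VARCHAR(256)", "createdate": "TIMESTAMP"},
    dist_key="id",
    sort_keys=("createdate",),
)
print(options["preactions"])
```

In a job script you would then call write_to_redshift(glueContext, your_frame, options, "s3://jm-bank/tmp/"). Because the statement uses CREATE TABLE IF NOT EXISTS, it leaves an existing table untouched, so it will not correct the types of a table that Glue already created.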

Answered 6 years ago
Reviewed by an Expert 10 months ago
Reviewed by an AWS Expert 2 years ago
  • If the table exists and has all the same columns, but the varchar columns are smaller than varchar(65535), does Glue automatically rebuild the table?

    My tables go back to having varchar(65535) and seem to be missing distribution style/key and sort keys that I added just a couple of days ago.
