Glue to automatically create target schema

0

Can someone please clarify if GLUE is supposed to be able to automatically create tables in my target database before loading in the data?

I haven't been able to find anything on the options:

  • Create tables in target
  • Use tables in the data catalog and update your data target

when setting up a job to ETL from say RDS into Redshift.

Since the data has already been catalogued, it would seem the schema should be automatically created but I haven't been successful getting that to work.

Thanks!

질문됨 7년 전2460회 조회
1개 답변
1
수락된 답변
  • Glue does automatically creates the table schema is it doesn't exist. However, any "string" type will be translated into a varchar(65535), which is not efficient for a columnar database.
  • If the table already exists, Glue will just add the records into the table.
  • If Glue can't map a field into the table (example: you have a field "createdate" as string type in your python code and the table has a "createdate" as timestamp type in Redshift), Glue will automatically add a field "createdate_string" to the table and populate that field.

When Glue calls glueContext.write_dynamic_frame.from_jdbc_conf(), it stores the output dataset into the S3 bucket given by: redshift_tmp_dir = "s3://jm-bank/tmp/". Then Glue performs a COPY command to Redshift, which you can visualize in "Queries" option from your Redshift cluster.

답변함 6년 전
profile picture
전문가
검토됨 10달 전
AWS
전문가
검토됨 2년 전
  • If the table exists, has all the same columns, but the varchar columns are smaller than varchar(65535) does Glue automatically rebuild the table?

    My tables go back to having varchar(65535) and seem to be missing distribution style/key and sort keys that I added just a couple of days ago.

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠