Scenario:
Source table: Glue Data Catalog table study crawled from MySQL with columns:
- id (int),
- code (varchar),
- desc (varchar)
- and 2 other columns not used in the job.
Target table: Amazon Redshift table dim_study with columns:
- id_pk (int; identity, primary key),
- original_id (int),
- code (varchar),
- desc (varchar),
- inserted_on (timestamp default getdate()),
- updated_on (timestamp default null)
After setting up the Data Source, I added a Change Schema step to map the source and target columns (id to original_id, code to code and desc to desc), change names, drop those not used. The data preview looks fine, it only displays 3 columns, with the target names as headers.
Then I set up the target node, Redshift, pointing to the dimension table.
Access type: Direct connection.
Handling method: MERGE data into target table.
Choose keys and simple actions: Matching keys original_id (as in target)
Set up performance and security.
When I run the job, it throws this ERROR: column "original_id" is of type integer but expression is of type character varying. Hint: You will need to rewrite or cast the expression.
Note: If I change the handling method to APPEND, it works fine, BUT it doesn't fill the inserted_on column, which has a default GETDATE() value.
So I need to solve 2 issues: what is causing the type mismatch, and why the timestamp column is filling up with NULL values instead of current datetime (I suspect because the data source has only 3 columns).