I have an AWS Glue job that reads from Redshift (schema_1) and writes the result back to Redshift (schema_2). This is done with the code below:
Redshift_read = glueContext.create_dynamic_frame.from_options(
    connection_type="redshift",
    connection_options={
        "sampleQuery": sample_query,
        "redshiftTmpDir": tmp_dir,
        "useConnectionProperties": "true",
        "connectionName": "dodsprd_connection",
        "sse_kms_key": "abc-fhrt-2345-8663",
    },
    transformation_ctx="Redshift_read",
)
Redshift_write = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=Redshift_read,
    catalog_connection="dodsprd_connection",
    connection_options={
        "database": "dodsprod",
        "dbtable": "dw_replatform_stage.rt_order_line_cancellations_1",
        "preactions": pre_query,
        "postactions": post_query,
    },
    redshift_tmp_dir=tmp_dir,
    transformation_ctx="Redshift_write",
)
The "sample_query" is a normal SQL query with some business logic. When i run this glue job, I am getting below error:
An error occurred while calling o106.pyWriteDynamicFrame. File already exists:
When I run the same SQL query manually in SQL Workbench, I get the expected output.
Can anyone please help me with this?
Can you please tell me how to check the stack trace?
I also tried changing the temp subpath as below:
Old path: s3://bjs-digital-dods-data-lake-processed-prod/temp/fact_order_line_returns/
New path: s3://bjs-digital-dods-data-lake-processed-prod/temp/temp_test/fact_order_line_returns/
Still, I get the same error.
The full error stack trace (including both Python and Scala) will be in the error log.
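For anyone else looking for it, here is a rough way to pull that error log with boto3. This is only a sketch and assumes the default (non-continuous) Glue logging setup, where the driver writes to the "/aws-glue/jobs/error" log group and the stream name starts with the job run ID; the job_run_id value is a placeholder you would take from the Glue console or get_job_runs.

import boto3

logs = boto3.client("logs")

job_run_id = "jr_0123456789abcdef"  # placeholder job run ID

# Fetch the error-log events for this run; the full Python/Scala stack trace
# shows up in these messages.
resp = logs.filter_log_events(
    logGroupName="/aws-glue/jobs/error",
    logStreamNamePrefix=job_run_id,
)
for event in resp["events"]:
    print(event["message"])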
I contacted AWS Support and we worked on this issue together, but we were not able to fix it. The query I am using with this logic is very large, and for some reason, when the query is very large, I get this error. The support team told me that, internally, a node failure happens when the data is copied from S3 to Redshift, and that failure surfaces as the error I mentioned above. I still have not found a proper solution, but I did find a workaround: I write the output to an S3 file, and a Lambda function loads it into Redshift through an event trigger. This is working smoothly.
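For reference, a rough sketch of what the workaround looks like. The bucket, prefix, cluster identifier, IAM role ARN, and database user below are placeholders, not the actual values from my job. On the Glue side, the only change is writing the DynamicFrame to S3 instead of Redshift:

# Glue side: write the query result to S3 as Parquet instead of writing to Redshift.
glueContext.write_dynamic_frame.from_options(
    frame=Redshift_read,
    connection_type="s3",
    connection_options={"path": "s3://my-staging-bucket/staging/rt_order_line_cancellations/"},
    format="parquet",
    transformation_ctx="S3_write",
)

The Lambda is subscribed to the s3:ObjectCreated event on that prefix and issues the COPY. Using the Redshift Data API is one way to do this from Lambda (assuming the function's role is allowed to call it); roughly:

import boto3

redshift_data = boto3.client("redshift-data")

def lambda_handler(event, context):
    # Triggered by s3:ObjectCreated on the staging prefix written by the Glue job.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    copy_sql = (
        "COPY dw_replatform_stage.rt_order_line_cancellations_1 "
        f"FROM 's3://{bucket}/{key}' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-copy-role' "  # placeholder role ARN
        "FORMAT AS PARQUET;"
    )

    redshift_data.execute_statement(
        ClusterIdentifier="my-cluster",  # placeholder cluster identifier
        Database="dodsprod",
        DbUser="my_db_user",             # placeholder database user
        Sql=copy_sql,
    )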