AWS Glue Error - File already Exists


I am running an AWS Glue job that reads from Redshift (schema_1) and writes back to Redshift (schema_2). This is done using the code below:

# Read from Redshift (schema_1) with a SQL query, staged through S3
Redshift_read = glueContext.create_dynamic_frame.from_options(
    connection_type="redshift",
    connection_options={
        "sampleQuery": sample_query,
        "redshiftTmpDir": tmp_dir,
        "useConnectionProperties": "true",
        "connectionName": "dodsprd_connection",
        "sse_kms_key": "abc-fhrt-2345-8663",
    },
    transformation_ctx="Redshift_read",
)

# Write the result back to Redshift (schema_2), also staged through S3
Redshift_write = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=Redshift_read,
    catalog_connection="dodsprd_connection",
    connection_options={
        "database": "dodsprod",
        "dbtable": "dw_replatform_stage.rt_order_line_cancellations_1",
        "preactions": pre_query,
        "postactions": post_query,
    },
    redshift_tmp_dir=tmp_dir,
    transformation_ctx="Redshift_write",
)

The "sample_query" is a normal SQL query with some business logic. When i run this glue job, I am getting below error:

An error occurred while calling o106.pyWriteDynamicFrame. File already exists:

When I run the same SQL query manually in SQL Workbench, it returns proper output. Can anyone please help me with this?

Joe
asked 3 months ago · 430 views
1 Answer

Since you are using Redshift, I suspect the error comes from the temporary files used for COPY/UNLOAD; check the stack trace to confirm.
A quick solution could be to give the read and the write different temporary subpaths, as sketched below.
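For example, here is a minimal sketch of that idea (assuming sample_query, tmp_dir, pre_query and post_query are defined as in the question; the subpath names are placeholders). The only change from the original code is that the UNLOAD (read) and COPY (write) staging files go to separate S3 prefixes:

# Separate S3 prefixes for the read (UNLOAD) and write (COPY) staging data
read_tmp_dir = tmp_dir + "read/"
write_tmp_dir = tmp_dir + "write/"

Redshift_read = glueContext.create_dynamic_frame.from_options(
    connection_type="redshift",
    connection_options={
        "sampleQuery": sample_query,
        "redshiftTmpDir": read_tmp_dir,
        "useConnectionProperties": "true",
        "connectionName": "dodsprd_connection",
        "sse_kms_key": "abc-fhrt-2345-8663",
    },
    transformation_ctx="Redshift_read",
)

Redshift_write = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=Redshift_read,
    catalog_connection="dodsprd_connection",
    connection_options={
        "database": "dodsprod",
        "dbtable": "dw_replatform_stage.rt_order_line_cancellations_1",
        "preactions": pre_query,
        "postactions": post_query,
    },
    redshift_tmp_dir=write_tmp_dir,  # different prefix from the read step
    transformation_ctx="Redshift_write",
)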

AWS
EXPERT
answered 3 months ago
  • Can you please tell me how to check the stack trace?

    I also tried giving different temp subpaths, as below: Old path: s3://bjs-digital-dods-data-lake-processed-prod/temp/fact_order_line_returns/ New path: s3://bjs-digital-dods-data-lake-processed-prod/temp/temp_test/fact_order_line_returns/ Still, I get the same error.

  • The full error stack trace (including both Python and Scala) will be in the error log.

  • I contacted AWS Support and we worked together on this issue, but we were not able to fix it. The query I am using in this logic is very large, and for some reason, when the query is very large, I get this error. The support team told me that internally a node failure is happening when the data is COPYed from S3 to Redshift, and that failure surfaces as the error I mentioned. I still have not found a solution, but I found a workaround: I write the output to an S3 file, and a Lambda function loads it into Redshift through an event trigger (sketched after this thread). This is working smoothly.
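A minimal sketch of the S3-write part of that workaround, reusing the Redshift_read frame from the question (the bucket, prefix, and output format are hypothetical placeholders; the S3 event notification, the Lambda function, and its COPY into Redshift are not shown):

# Write the result to S3 instead of directly to Redshift.
# An S3 event notification would then invoke a Lambda that issues a COPY into Redshift.
glueContext.write_dynamic_frame.from_options(
    frame=Redshift_read,
    connection_type="s3",
    connection_options={"path": "s3://my-example-bucket/exports/rt_order_line_cancellations/"},
    format="csv",
    transformation_ctx="S3_write",
)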
