AWS Glue Error - File already Exists


I am running an AWS Glue job that reads from Redshift (schema_1) and writes back to Redshift (schema_2). This is done using the code below:

Redshift_read = glueContext.create_dynamic_frame.from_options(
    connection_type="redshift",
    connection_options={
        "sampleQuery": sample_query,
        "redshiftTmpDir": tmp_dir,
        "useConnectionProperties": "true",
        "connectionName": "dodsprd_connection",
        "sse_kms_key" : "abc-fhrt-2345-8663",
    },
    transformation_ctx="Redshift_read",
)

Redshift_write = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=Redshift_read,
    catalog_connection="dodsprd_connection",
    connection_options={
        "database": "dodsprod",
        "dbtable": "dw_replatform_stage.rt_order_line_cancellations_1",
        "preactions": pre_query,
        "postactions": post_query,
    },
    redshift_tmp_dir=tmp_dir,
    transformation_ctx="Redshift_write",
)

The "sample_query" is a normal SQL query with some business logic. When i run this glue job, I am getting below error:

An error occurred while calling o106.pyWriteDynamicFrame. File already exists:

When I run the same SQL query manually in SQL Workbench, I get the expected output. Can anyone please help me with this?

1 Answer

Since you are using Redshift, I suspect the error comes from the temporary S3 files used for COPY/UNLOAD; check the stack trace to confirm.
A quick solution could be to give the read and the write different temporary subpaths.
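For example, reusing the code from the question, the change would look roughly like this (the subpath names are illustrative only):

# Give the source and the sink their own temporary subpaths so the
# COPY/UNLOAD staging files cannot collide. Subpath names are examples.
read_tmp_dir = "s3://bjs-digital-dods-data-lake-processed-prod/temp/fact_order_line_returns/read/"
write_tmp_dir = "s3://bjs-digital-dods-data-lake-processed-prod/temp/fact_order_line_returns/write/"

Redshift_read = glueContext.create_dynamic_frame.from_options(
    connection_type="redshift",
    connection_options={
        "sampleQuery": sample_query,
        "redshiftTmpDir": read_tmp_dir,  # staging area used by the UNLOAD
        "useConnectionProperties": "true",
        "connectionName": "dodsprd_connection",
        "sse_kms_key": "abc-fhrt-2345-8663",
    },
    transformation_ctx="Redshift_read",
)

Redshift_write = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=Redshift_read,
    catalog_connection="dodsprd_connection",
    connection_options={
        "database": "dodsprod",
        "dbtable": "dw_replatform_stage.rt_order_line_cancellations_1",
        "preactions": pre_query,
        "postactions": post_query,
    },
    redshift_tmp_dir=write_tmp_dir,  # separate staging area used by the COPY
    transformation_ctx="Redshift_write",
)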

AWS
EXPERT
answered 3 months ago
  • Can you please tell me how to check the stack trace?

    I also tried giving different temp subpaths, as below: Old path: s3://bjs-digital-dods-data-lake-processed-prod/temp/fact_order_line_returns/ New path: s3://bjs-digital-dods-data-lake-processed-prod/temp/temp_test/fact_order_line_returns/ Still, I get the same error.

  • The full error stack trace (including both Python and Scala) will be in the error log.

  • I contacted AWS Support and we worked together on this issue, but we were not able to fix it. The query I am using in this logic is very large, and for some reason I only get this error when the query is very large. The support team said that internally a node failure happens when the data is copied from S3 to Redshift, and that failure surfaces as the error I mentioned. I still have not found a solution, but I did find a workaround: I write the output to an S3 file, and a Lambda function loads it into Redshift through an event trigger (a rough sketch of this pattern is shown after these comments). This is working smoothly.
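For readers landing on this thread, the workaround described in the last comment might look roughly like the sketch below. It assumes the Glue job writes a CSV file to S3, that the Lambda function is subscribed to the bucket's ObjectCreated events, and that the Redshift Data API is used for the load; the cluster, database, user, IAM role, and table names are placeholders, not the poster's actual setup.

import urllib.parse

import boto3

# Placeholder values -- replace with your own cluster, database, role and table.
CLUSTER_ID = "my-redshift-cluster"
DATABASE = "dodsprod"
DB_USER = "etl_user"
IAM_ROLE_ARN = "arn:aws:iam::123456789012:role/redshift-copy-role"
TARGET_TABLE = "dw_replatform_stage.rt_order_line_cancellations_1"

redshift_data = boto3.client("redshift-data")


def lambda_handler(event, context):
    """Triggered by an S3 ObjectCreated event; COPY the new file into Redshift."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    copy_sql = (
        f"COPY {TARGET_TABLE} "
        f"FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{IAM_ROLE_ARN}' "
        f"FORMAT AS CSV IGNOREHEADER 1"
    )

    # The Data API runs the statement asynchronously and returns a statement id.
    response = redshift_data.execute_statement(
        ClusterIdentifier=CLUSTER_ID,
        Database=DATABASE,
        DbUser=DB_USER,
        Sql=copy_sql,
    )
    return {"statementId": response["Id"]}

Because the Data API call is asynchronous, in practice you would also poll describe_statement with the returned id (or monitor events) to confirm that the COPY actually succeeded.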
