I have a large postgres table which is having issues replicating on DMS v3.5.1. The same task (and endpoints) work fine with v3.4.7, so I have rolled back to that for now. The job is a full load & ongoing replication of changes.
- Destinaton: Redshift
- Source: Postgres 12.x
- 700M+ rows
- Rows 128b to 1Kb, average ~256b
- All other tables are fine, one has more records but is lighter
The error is not particularly helpful, but is likely accurate - session timeout. Are there any known issues with v3.5.1 and old postgres, or any debugging tips? I have an identical task that works, but that database is 1/5th the size for now, and PG 15.2.
I have attempted changing the LoadTimeout
variable on the Redshift endpoint to 12 hours (currently takes ~5.5 hours to replicate), to no noticeable effect. It still crashed after 4.5 hours.
[TARGET_LOAD ]E: RetCode: SQL_ERROR SqlState: 57P01 NativeError: 30 Message:
[Amazon][Amazon Redshift] (30) Error occurred while trying to execute a query:
[SQLState 57P01] FATAL: terminating connection due to session timeout [1022500] (ar_odbc_stmt.c:5007)
[TARGET_LOAD ]E: Failed to load public.table_name from S3, file name: LOAD0000123E.csv [1022509] (cloud_imp.c:2518)
[TARGET_LOAD ]E: Failed to copy data of file /rdsdbdata/data/tasks/DK.../cloud/23/LOAD0000123E.csv
[PERFORMANCE ]I: End load handler time for public.table_name = 480312 microseconds (endpointshell.c:2992)
[PERFORMANCE ]I: Total load time for public.table_name = 13947613253 microseconds (endpointshell.c:2994)
...
[TASK_MANAGER ]W: Table 'public'.'table_name' was errored/suspended (subtask 1 thread 1).
Failed (retcode -1) to execute statement; RetCode: SQL_ERROR SqlState: 57P01 NativeError: 30
Message: [Amazon][Amazon Redshift] (30) Error occurred while trying to execute a query:
[SQLState 57P01] FATAL: terminating connection due to session timeout ;
Failed to load public.table_name from S3, file name: LOAD0000123E.csv;
Failed to copy data of file /rdsdbdata/data/tasks/DK.../cloud/23/LOAD0000123E.csv to database;
Handling End of table 'public'.'table_name' loading failed by subtask 1 thread 1 (replicationtask.c:3023)
Thanks for the response! I had not considered checking the PG logs in RDS as I figured it was a Redshift/Client Library issue. Depending on time I may spin up a new Redshift and try it again to see if I can catch the actual errors and report them properly. Retention was low so I do not have the PG logs covering that time window.