To me, the generic re:Post Agent’s answer misses the technical root cause. This is a known privilege and architecture conflict in AWS DMS 3.5.4 when transitioning from Table Discovery to the actual initialization phase, specifically under PostgreSQL 15+.
The Root Cause, I assume:
The 90-second timeout and the generic Network error (RECOVERABLE_ERROR) are a masked failure.
- Table Discovery succeeds because the user has basic metadata read permissions.
- Immediately after discovery, when PluginName=pgoutput is used, DMS attempts to automatically create a PostgreSQL publication behind the scenes (CREATE PUBLICATION awsdms_publication FOR ALL TABLES;).
- In PostgreSQL, including your 16.11 version, CREATE PUBLICATION ... FOR ALL TABLES requires superuser privileges. The database-level CREATE privilege alone is not sufficient for this clause.
- Because your production user lacks rds_superuser, this internal SQL statement fails silently. The replication stream terminates abruptly, causing DMS to drop the connection socket and misreport it as a generic network error.
Setting CaptureDdls=false only prevents the creation of the DDL audit table and triggers; it does not bypass the publication creation requirements of pgoutput.
Try the following:
To fix this without granting rds_superuser in production, you must pre-create the publication manually and instruct DMS to use it instead of generating one.
Step 1: Manually create the Publication (As Admin/Superuser)
Log into your Aurora PostgreSQL instance with a superuser account and create a publication explicitly for your target tables:
CREATE PUBLICATION my_prod_dms_publication FOR TABLE schema.table1, schema.table2, ...;
(Note: Ensure your DMS migration user has SELECT privileges on these tables and is granted membership or ownership access where required by PG16 replication rules).
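For a long table list, the statement in Step 1 can be generated rather than typed by hand. The following is a minimal Python sketch, not part of DMS or any AWS tooling: the helper names, the example table names, and the dms_user role are hypothetical. Identifiers are double-quoted so that mixed-case or reserved names survive.

```python
def quote_ident(name: str) -> str:
    """Double-quote a PostgreSQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def qualified(schema: str, table: str) -> str:
    """Return a schema-qualified, quoted table name."""
    return f"{quote_ident(schema)}.{quote_ident(table)}"


def publication_sql(publication: str, tables: list[tuple[str, str]]) -> str:
    """Build CREATE PUBLICATION ... FOR TABLE for (schema, table) pairs."""
    if not tables:
        raise ValueError("at least one table is required")
    targets = ", ".join(qualified(s, t) for s, t in tables)
    return f"CREATE PUBLICATION {quote_ident(publication)} FOR TABLE {targets};"


def grant_select_sql(role: str, tables: list[tuple[str, str]]) -> list[str]:
    """Build one GRANT SELECT statement per table for the migration user."""
    return [
        f"GRANT SELECT ON {qualified(s, t)} TO {quote_ident(role)};"
        for s, t in tables
    ]


if __name__ == "__main__":
    # Hypothetical tables and role, for illustration only.
    example_tables = [("sales", "orders"), ("sales", "items")]
    print(publication_sql("my_prod_dms_publication", example_tables))
    for statement in grant_select_sql("dms_user", example_tables):
        print(statement)
```

Review the generated statements before running them as the admin user; the sketch only formats text and does not connect to the database.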
Step 2: Update Source Endpoint Extra Connection Attributes (ECA)
Modify your PostgreSQL source endpoint configuration to reference the manually created publication. Append the following attributes:
PluginName=pgoutput;CaptureDdls=false;PublicationName=my_prod_dms_publication;ExposeCurrentTransactionsAsRecords=true;
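Because the ECA value is a single semicolon-delimited string, a typo in one key is easy to miss. Here is a small Python sketch for assembling and re-parsing the string from Step 2. The function names are hypothetical, and the attribute names are simply the ones quoted above; the sketch does not validate them against DMS.

```python
def build_eca(attrs: dict[str, str]) -> str:
    """Join key/value pairs into a 'Key=value;' string, preserving order."""
    for key, value in attrs.items():
        if not key or ";" in key or "=" in key or ";" in value:
            raise ValueError(f"invalid attribute: {key!r}={value!r}")
    return "".join(f"{key}={value};" for key, value in attrs.items())


def parse_eca(eca: str) -> dict[str, str]:
    """Split a 'Key=value;' string back into a dict, ignoring empty parts."""
    result: dict[str, str] = {}
    for part in eca.split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {part!r}")
        result[key.strip()] = value.strip()
    return result


# Attribute values taken from Step 2 of this answer.
ATTRS = {
    "PluginName": "pgoutput",
    "CaptureDdls": "false",
    "PublicationName": "my_prod_dms_publication",
    "ExposeCurrentTransactionsAsRecords": "true",
}

if __name__ == "__main__":
    print(build_eca(ATTRS))
```

Parsing the endpoint's current ECA string first and then updating the resulting dict avoids dropping attributes that were already set.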
Step 3: (Optional) Expose Detailed Error Logs
If you ever need to see the exact failing SQL or HTTP payloads behind a generic "Network error", switch the component logging to DETAILED_DEBUG in the Logging section of your Task Settings JSON:
"Logging": {
  "EnableLogging": true,
  "LogComponents": [
    { "Id": "SOURCE_UNLOAD", "Severity": "LOGGER_SEVERITY_DETAILED_DEBUG" },
    { "Id": "SOURCE_CAPTURE", "Severity": "LOGGER_SEVERITY_DETAILED_DEBUG" },
    { "Id": "TARGET_LOAD", "Severity": "LOGGER_SEVERITY_DETAILED_DEBUG" }
  ]
}
Restart or resume your task after modifying the ECA attributes. If the missing publication privilege was indeed the cause, the full-load-and-cdc process to OpenSearch should then proceed normally.
Based on the troubleshooting information available for AWS DMS, here are some insights into your issue:
Regarding CaptureDDLs Configuration: The CaptureDDLs setting is documented for PostgreSQL source endpoints, where it controls whether DMS creates the database artifacts it uses to capture DDL events. How this setting interacts with the pgoutput plugin and OpenSearch targets may involve undocumented behaviors in DMS 3.5.4.
Troubleshooting Steps to Consider:
- Replication Instance Capacity: Verify your dms.r5.4xlarge instance has sufficient capacity for the task. Out-of-memory errors can manifest as network errors. Review CloudWatch metrics for memory utilization, CPU, and network throughput during the failure window.
- CloudWatch Log Analysis: Search your CloudWatch logs specifically for error and warning patterns using "]E:" or "]W:" in the search bar. This can reveal more detailed error messages that aren't immediately visible in the task summary.
- Component-Level Error Identification: Check which DMS components are logging the errors. If the errors are from SOURCE_UNLOAD or SOURCE_CAPTURE components, the issue likely originates from the PostgreSQL source configuration. If they're from TARGET_LOAD or TARGET_APPLY, focus on the OpenSearch target configuration.
- Endpoint Configuration Verification: Confirm that both your PostgreSQL source and OpenSearch target endpoints are correctly configured according to their specific requirements. Engine-specific limitations, unsupported data types, and CDC restrictions can cause task failures that may appear as network errors.
- Sequential vs. Concurrent Execution: If there are resource constraints, try running tasks sequentially rather than concurrently to reduce the load on the replication instance.
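The log search and component checks above can also be scripted against an exported log file. This is a minimal Python sketch, assuming each log line carries a bracketed component name immediately followed by the "]E:" or "]W:" marker mentioned above. The sample lines are invented for illustration and are not real DMS output.

```python
import re
from collections import Counter

# Bracketed component name, optional padding, then an E (error) or W (warning) marker.
MARKER = re.compile(r"\[\s*([A-Z_]+)\s*\]([EW]):")


def problem_lines(lines):
    """Yield (component, level, line) for every error or warning line."""
    for line in lines:
        match = MARKER.search(line)
        if match:
            yield match.group(1), match.group(2), line.rstrip("\n")


def count_by_component(lines):
    """Count error and warning lines per (component, level) pair."""
    return Counter((comp, level) for comp, level, _ in problem_lines(lines))


# Invented sample lines; real log content will differ.
SAMPLE = [
    "2024-01-01T00:00:00 [SOURCE_CAPTURE ]E:  example error text",
    "2024-01-01T00:00:01 [SOURCE_CAPTURE ]I:  example info text",
    "2024-01-01T00:00:02 [TARGET_LOAD    ]W:  example warning text",
    "2024-01-01T00:00:03 [SOURCE_CAPTURE ]E:  another example error",
]

if __name__ == "__main__":
    for (component, level), total in count_by_component(SAMPLE).items():
        print(component, level, total)
```

A count dominated by SOURCE_UNLOAD or SOURCE_CAPTURE points toward the PostgreSQL side, while TARGET_LOAD or TARGET_APPLY points toward OpenSearch, matching the guidance in the list.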
Regarding Your Specific Questions:
The interaction between CaptureDDLs=false, pgoutput plugin, and OpenSearch targets isn't explicitly documented in the available troubleshooting guidance. The fact that removing this setting (with a superuser) resolves the issue suggests there may be an incompatibility or unexpected behavior when these specific configurations are combined.
Consider opening an AWS Support case with your detailed logs and configuration, as this appears to be a specific edge case that may require deeper investigation into the DMS engine's behavior with this particular combination of settings.
Sources
Troubleshooting migration tasks in AWS Database Migration Service - AWS Database Migration Service
Troubleshoot AWS DMS replication task failures and stuck states | AWS re:Post
