Background
We are trying to migrate a relatively large table (200+ million rows) from an Aurora PostgreSQL source to a DynamoDB target. While reading the guide "Using an Amazon DynamoDB database as a target for AWS Database Migration Service", I came across this:
Note: DMS assigns each segment of a table to its own thread for loading. Therefore, set ParallelLoadThreads to the maximum number of segments that you specify for a table in the source.
And,
Table-mapping settings for individual tables – Use table-settings rules to identify individual tables from the source that you want to load in parallel. Also, use these rules to specify how to segment the rows of each table for multithreaded loading. For more information, see Table and collection settings rules and operations.
Error
Following the guide, I defined parallel-load settings for the task (the Terraform configuration is at the end of the post), but applying the change fails:
│ Error: updating DMS Replication Task (messaging-migrate-db-subscriptions-dms-full-load-and-cdc): InvalidParameterValueException: Error in mapping rules. Rule with ruleId = 3 failed validation. Table setting 'parallel-load' is not supported for the target endpoint type 'dynamodb'
So I think there are two options here:
- The guide is misleading, because it is not possible to define segmentation using table-settings rules for a DynamoDB target.
- There is another way to define segmentation rules that I am not aware of.
Thanks!
Attachments
{
  rule-type = "table-settings"
  rule-id = 3
  rule-name = "ParallelLoadSettings"
  object-locator = {
    schema-name = "***"
    table-name = "***"
  }
  type = "ranges"
  columns = [
    "id"
  ]
  boundaries = [
    ["10000000-00000000-00000000-00000000"],
    ["20000000-00000000-00000000-00000000"],
    ["30000000-00000000-00000000-00000000"],
    ["40000000-00000000-00000000-00000000"],
    ["50000000-00000000-00000000-00000000"],
    ["60000000-00000000-00000000-00000000"],
    ["70000000-00000000-00000000-00000000"],
    ["80000000-00000000-00000000-00000000"],
    ["90000000-00000000-00000000-00000000"],
    ["a0000000-00000000-00000000-00000000"],
    ["b0000000-00000000-00000000-00000000"],
    ["c0000000-00000000-00000000-00000000"],
    ["d0000000-00000000-00000000-00000000"],
    ["e0000000-00000000-00000000-00000000"],
    ["f0000000-00000000-00000000-00000000"],
  ]
}
As far as I understand, this statement from the official guide is incorrect:
Can we update the page so it is clear how DynamoDB parallelization works? It seems that ParallelLoadThreads automatically scales the full-load process without any table-settings rule.
The best way to get the documentation updated is to provide feedback directly on the page in question (bottom left). That will cut a ticket to the owning team. Thanks.
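If ParallelLoadThreads alone drives the parallel full load for a DynamoDB target, as suggested in this thread, the setting would go in the task settings JSON under TargetMetadata rather than in a table-settings mapping rule. The following is a minimal sketch of building that JSON in Python. The helper name build_task_settings and the value 16 are hypothetical and chosen only for illustration; the right thread count for a given table and replication instance is not something this thread establishes.

```python
import json


def build_task_settings(parallel_load_threads: int) -> str:
    """Return a DMS task-settings JSON string that sets only ParallelLoadThreads.

    Hypothetical helper for illustration; no table-settings rule is involved.
    """
    if parallel_load_threads < 0:
        raise ValueError("parallel_load_threads must be non-negative")

    settings = {
        "TargetMetadata": {
            # Number of threads DMS uses to load each table into the target.
            "ParallelLoadThreads": parallel_load_threads,
        }
    }
    return json.dumps(settings, indent=2)


if __name__ == "__main__":
    # 16 is an illustrative value, not a recommendation.
    print(build_task_settings(16))
```

The resulting string could then be supplied as the replication task settings, for example through the replication_task_settings argument of the Terraform aws_dms_replication_task resource, with the "parallel-load" table-settings rule removed from the table mappings.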