使用AWS re:Post即您表示您同意 AWS re:Post 使用条款

DynamoDB table purge 400M records from a 1Billion item table & batchwrite throttling

0

Example scenario - DynamoDB table has 1 billion rows company splits, we make a copy of the table for the other company and now we need to delete 400M of the 1B items. TTL nor deleting and recreating the table are options as we need to keep our part of the records. We looked at parallel scan reading the table and pushing rows to delete into a batchwrite operation to delete rows. Is there a better option to consider?

Side note initial tests with parallel scan and batchwrite are having throttling issues with single partition 1000wcu limit table has plenty of wcu capacity. Any other suggested approaches or thoughts to address single partition wcu issue?

table has customerid number PK ; LOBid string SK; and other columns <1KB / item.

AWS
已提问 4 年前1255 查看次数
1 回答
2
已接受的回答

With parallel scan, each thread scans a continuous segment in the table. If the items to be deleted are evenly distributed across the table space, you would encounter table level throttling instead of partition level throttling. The fact that you encountered partition level throttling without exceeding the provisioned WCU indicated that the items to be deleted might reside in a few partitions only. The result is, almost all items in a particular BatchWriteItem API call belong to the same partition, and you might be deleting from only a few partitions at a time.

One way to improve the performance would be perform a shuffling before doing the deletes. That is, the parallel scan threads pushes records to be deleted into a list. After the parallel scan finishes, perform a random ordering on items in the list. After that, the delete threads retrieve items from the list for deletion. With this approach, you increase the possibility that items in a BatchWriteItem API call are distributed in multiple partitions, taking advantage of the write capacity in multiple partitions.

AWS
已回答 4 年前
profile picture
专家
已审核 7 个月前

您未登录。 登录 发布回答。

一个好的回答可以清楚地解答问题和提供建设性反馈,并能促进提问者的职业发展。

回答问题的准则