How do I streamline the creation of a global secondary index for a DynamoDB table?


I want to create a global secondary index for an Amazon DynamoDB table, but it's taking a long time.

Short description

The time that's required to add a global secondary index to a DynamoDB table depends on the following factors:

  • The size of the base table
  • The number of items in the table that qualify for inclusion in the index
  • The size of the items that are projected into the index
  • The number of attributes projected into the index
  • The provisioned write capacity of the index
  • Write activity on the base table during index creation
  • Data distribution across index partitions

To streamline the creation process for a table in provisioned mode, increase the number of write capacity units on the index. Global secondary indexes inherit the read/write capacity mode from the base table. If your table is in on-demand mode, then DynamoDB also creates the index in on-demand mode. In this case, you can't increase the capacity on the index because an on-demand DynamoDB table scales itself based on incoming traffic.
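To check which capacity mode applies before you plan a capacity increase, you can inspect the output of the DescribeTable API. The following is a minimal sketch, not an official procedure. The function name is hypothetical, and the sketch assumes that a response without a BillingModeSummary field describes a table in provisioned mode.

```python
def can_raise_index_write_capacity(describe_table_response: dict) -> bool:
    """Return True when the table is in provisioned mode.

    Hypothetical helper: takes the dict returned by a DescribeTable call.
    """
    table = describe_table_response["Table"]
    # Assumption: a missing BillingModeSummary means provisioned mode.
    summary = table.get("BillingModeSummary", {})
    mode = summary.get("BillingMode", "PROVISIONED")
    # On-demand tables report PAY_PER_REQUEST and scale on their own.
    return mode == "PROVISIONED"
```

You might pass in the result of a boto3 `describe_table(TableName=...)` call. If the function returns False, then the index is in on-demand mode and the capacity steps in this article don't apply.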

Resolution

To monitor the index creation progress, use the OnlineIndexPercentageProgress Amazon CloudWatch metric:

  1. Open the DynamoDB console.
  2. In the navigation pane, choose Tables, and then select your table.
  3. Choose the Metrics tab.
  4. Choose View all CloudWatch metrics.
  5. In the search box, enter OnlineIndexPercentageProgress.
    Note: If the search returns no results, wait a minute for metrics to populate. Then, try again.
  6. Choose the name of the index to see the progress.
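If you prefer to read the metric programmatically, the console steps above can be approximated with the CloudWatch GetMetricStatistics API. The following is a sketch under stated assumptions: the helper names and the 15-minute lookback window are illustrative choices, and `cloudwatch_client` is expected to be a boto3 CloudWatch client that you create yourself.

```python
from datetime import datetime, timedelta, timezone


def build_progress_query(table_name, index_name, now=None, lookback_minutes=15):
    """Build GetMetricStatistics parameters for the backfill progress metric."""
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": "OnlineIndexPercentageProgress",
        "Dimensions": [
            {"Name": "TableName", "Value": table_name},
            {"Name": "GlobalSecondaryIndexName", "Value": index_name},
        ],
        # Assumption: a 15-minute window with 1-minute periods is enough.
        "StartTime": now - timedelta(minutes=lookback_minutes),
        "EndTime": now,
        "Period": 60,
        "Statistics": ["Maximum"],
    }


def latest_progress(cloudwatch_client, table_name, index_name):
    """Return the newest progress percentage, or None if no data exists yet."""
    response = cloudwatch_client.get_metric_statistics(
        **build_progress_query(table_name, index_name)
    )
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        # Matches the note above: metrics can take a minute to populate.
        return None
    newest = max(datapoints, key=lambda point: point["Timestamp"])
    return newest["Maximum"]
```

For example, `latest_progress(boto3.client("cloudwatch"), "MyTable", "MyIndex")` would return the most recent percentage, where the table and index names are placeholders.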

Determine the number of additional write capacity units that you need to backfill your data.

First, use the following formula to calculate the adjusted average item size:

adjusted_average_item_size = max(data_size_in_KB/item_count, 1KB)

Then, use the following formula to calculate the additional number of write capacity units:

write capacity units = item_count * adjusted_average_item_size / desired_time_in_seconds

In these formulas, data_size_in_KB is the total size of the table in KB, item_count is the total number of items in the table, and desired_time_in_seconds is the time frame for your data to backfill.
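The two formulas can be expressed as a short Python sketch. The function names are hypothetical, and rounding the result up to a whole unit is an assumption made here so that the estimate never falls short.

```python
import math


def adjusted_average_item_size(data_size_kb, item_count):
    """Average item size in KB, with a floor of 1 KB per item."""
    return max(data_size_kb / item_count, 1.0)


def backfill_write_capacity_units(data_size_kb, item_count, desired_time_seconds):
    """Estimate the additional write capacity units for the backfill.

    item_count * max(data_size_kb / item_count, 1) equals
    max(data_size_kb, item_count), so the total is computed that way
    to avoid floating-point noise before rounding up.
    """
    total_kb_to_write = max(data_size_kb, item_count)
    # Assumption: round up so the provisioned value is a whole number.
    return math.ceil(total_kb_to_write / desired_time_seconds)
```

For example, `backfill_write_capacity_units(1_074_000, 100_000, 600)` corresponds to a 1,074,000 KB table with 100,000 items and a 10-minute target.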

See the following examples of these calculations:

Example 1

You have a 1 GiB (1,074,000 KB) table with 100,000 items. You want the backfilling process to complete in 10 minutes (600 seconds). Calculate the number of write capacity units as follows:

adjusted_average_item_size = max (1,074,000/100,000, 1) = 10.74

write capacity units = 10.74 * 100,000 / 600 = 1,790

Example 2

You have a 1 GiB (1,074,000 KB) table with 10,000,000 items. You want the backfilling process to complete in 10 minutes (600 seconds). Calculate the number of write capacity units as follows:

adjusted_average_item_size = max (1,074,000/10,000,000, 1) = 1

write capacity units = 1 * 10,000,000 / 600 = 16,667

Example 3

You have a 2 GiB (2,148,000 KB) table with 100,000 items. You want the index creation to complete in 1 hour (3,600 seconds). Calculate the number of write capacity units as follows:

adjusted_average_item_size = max (2*1,074,000/100,000, 1) = 21.48

write capacity units = 21.48 * 100,000 / 3,600 = ~597
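The same formula can be rearranged to estimate how long a backfill might take for a given number of write capacity units. The following sketch is only that rearrangement. The function name is hypothetical, and the result is a rough figure that ignores the other factors listed in the short description, such as write activity on the base table.

```python
def estimated_backfill_seconds(data_size_kb, item_count, write_capacity_units):
    """Rough backfill time in seconds for a given index write capacity.

    Rearranges: write capacity units =
        item_count * adjusted_average_item_size / desired_time_in_seconds
    """
    # item_count * max(data_size_kb / item_count, 1) == max(data_size_kb, item_count)
    total_kb_to_write = max(data_size_kb, item_count)
    return total_kb_to_write / write_capacity_units
```

With the values from Example 1, `estimated_backfill_seconds(1_074_000, 100_000, 1790)` gives the 600-second target back.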

The required number of write capacity units depends on the table and index size, number of items, and the backfilling time frame that you select.

To provision additional write capacity, complete the following steps:

  1. Open the DynamoDB console.
  2. In the navigation pane, choose Tables, and then select your table.
  3. Choose the Capacity tab.
  4. Increase the write capacity of the index, and then choose Save.
  5. To see if the creation speed is improved, check the OnlineIndexPercentageProgress metric after a minute.
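As an alternative to the console, the write capacity of the index can be changed with the UpdateTable API. The following is a minimal sketch, assuming a table in provisioned mode. The helper names are hypothetical, and `dynamodb_client` is expected to be a boto3 DynamoDB client that you create yourself. The API requires both read and write values, so pass the index's current read capacity unchanged.

```python
def build_index_capacity_update(table_name, index_name,
                                read_capacity_units, write_capacity_units):
    """Build UpdateTable parameters that change one index's throughput."""
    return {
        "TableName": table_name,
        "GlobalSecondaryIndexUpdates": [
            {
                "Update": {
                    "IndexName": index_name,
                    "ProvisionedThroughput": {
                        # Keep the existing read capacity; only writes matter
                        # for the backfill.
                        "ReadCapacityUnits": read_capacity_units,
                        "WriteCapacityUnits": write_capacity_units,
                    },
                }
            }
        ],
    }


def raise_index_write_capacity(dynamodb_client, table_name, index_name,
                               read_capacity_units, write_capacity_units):
    """Send the update through the supplied DynamoDB client."""
    params = build_index_capacity_update(
        table_name, index_name, read_capacity_units, write_capacity_units
    )
    return dynamodb_client.update_table(**params)
```

For example, `raise_index_write_capacity(boto3.client("dynamodb"), "MyTable", "MyIndex", 5, 1790)` would request 1,790 write capacity units for a placeholder index named MyIndex.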

Note: You don't need to provision additional read capacity.

Best practices

Review the following best practices:

  • Before you add or delete indexes on a table, wait for the global secondary index to exit the backfilling state. Otherwise, you get an error similar to the following:
    "Subscriber limit exceeded: Only 1 online index can be created or deleted simultaneously per table"
  • To streamline the process, provision the write capacity units manually instead of relying on auto scaling. You can configure auto scaling while you create a global secondary index. However, auto scaling doesn't work until after your index is active.
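To check whether an index is still being built before you add or delete another one, you can inspect the output of the DescribeTable API. The following is a sketch with a hypothetical function name. Treating the CREATING and DELETING statuses, along with the Backfilling flag, as "busy" is an assumption made for illustration.

```python
def indexes_still_building(describe_table_response: dict) -> list:
    """Return the names of indexes that are being created, deleted, or backfilled.

    Hypothetical helper: takes the dict returned by a DescribeTable call.
    """
    indexes = describe_table_response["Table"].get("GlobalSecondaryIndexes", [])
    busy = []
    for index in indexes:
        status = index.get("IndexStatus")
        # Assumption: these states mean another index change should wait.
        if status in ("CREATING", "DELETING") or index.get("Backfilling", False):
            busy.append(index["IndexName"])
    return busy
```

An empty list suggests that no index is currently in the backfilling state.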

Note: DynamoDB doesn't consume read units from the base table when data is projected into the global secondary index.

Related information

Improving data access with secondary indexes

Managing global secondary indexes

Secondary indexes

Modifying a Global Secondary Index during creation

How do I choose the right primary key for my Amazon DynamoDB table?

AWS OFFICIAL
Updated 2 months ago
2 Comments

This formula is accurate when the item size is 1 KB or more, because the number of kilobytes then represents the total number of writes that we need to make. When the item size is less than 1 KB, we need more write operations because an item of 100 B still requires 1 WCU. So 1 GiB (1,074,000 KB) where each item is 1 KB gives the highest throughput and the formula is accurate, but when each item is 100 B, we need 10 times more write operations to write 1 GiB of data.

The formula could look like this.

adjusted_average_size = max(data_size_KB/items_count, 1KB) 
WCU = items_count * adjusted_average_size / time_SECONDS

1 GiB (1,074,000 KB) with 100,000 items in 10 minutes will give us the same 1,790 WCUs.

1 GiB (1,074,000 KB) with 10,000,000 items in 10 minutes will give us 16,667 WCUs.

Please let me know what you think about this estimation. I think items smaller than 1 KB are quite a regular use case for DynamoDB, and the estimations should keep them in mind.

replied 5 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 5 months ago