In DynamoDB, what is the partition size limit for a table without an LSI?

0

On page 572 of the DynamoDB Developer Guide, there is a paragraph that says: "The maximum size of any item collection for a table which has one or more local secondary indexes is 10 GB. This does not apply to item collections in tables without local secondary indexes, and also does not apply to item collections in global secondary indexes. Only tables that have one or more local secondary indexes are affected."

My questions are:

  1. Without an LSI, is there any size limit for each partition?
  2. When I query with partition key + sort key (e.g. a primary key hit) to retrieve a small dataset, how much does the total volume of data under that partition key affect performance? For example, if a partition key holds 1 TB, would the query be much slower than against one holding 10 GB?

In a more specific scenario: my GSI uses content_type as the partition key and end_time as the sort key, so customers can search documents by content type and end_time range. For each content type, I might have 0.5 billion entries per day. When I build the GSI, it would be about 0.2 KB * 0.5 billion = 100 GB per day and roughly 36 TB per year. The customer wants to keep data for at least a year.
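The sizing estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes decimal units (1 GB = 1,000,000 KB) and a flat 0.2 KB per index entry, and the constant and function names are made up for illustration.

```python
# Back-of-the-envelope sizing for the GSI, using the figures from the question.
# Assumption: decimal units (1 GB = 1,000,000 KB, 1 TB = 1,000 GB).
ITEM_SIZE_KB = 0.2            # average size of one entry in the GSI
ITEMS_PER_DAY = 500_000_000   # 0.5 billion entries per content type per day
RETENTION_DAYS = 365          # the customer wants at least one year


def gsi_size_gb(item_size_kb: float, items: int) -> float:
    """Return the total size in GB for `items` entries of `item_size_kb` each."""
    return item_size_kb * items / 1_000_000


daily_gb = gsi_size_gb(ITEM_SIZE_KB, ITEMS_PER_DAY)
yearly_tb = daily_gb * RETENTION_DAYS / 1_000

print(f"{daily_gb:.0f} GB per day, {yearly_tb:.1f} TB per year")
```

With these inputs the yearly figure comes out at 36.5 TB, which matches the roughly 36 TB quoted in the question.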

If I don't split the table into multiple time-series tables, would DynamoDB be able to hold 36 TB as one item collection in one table and still provide fast access for queries on the partition key plus a small sort key range?

asked a year ago, 2235 views
2 Answers
4
Accepted Answer
  1. Without LSI, is there any size limitation for each partition?
  • A physical DynamoDB partition still has a maximum size of 10 GB, but an item collection (the items that share the same partition key) can span multiple partitions, which means it has no size limit. An LSI prevents an item collection from spanning multiple partitions, because the item collection must remain available for strongly consistent reads, which would be impossible without sacrificing performance if it spanned multiple partitions.
  2. When I query with partition key + sort key (e.g. a primary key hit), how much does a large volume of data under that partition key affect performance? For example, if a partition holds 1 TB, would it be much slower than one holding 10 GB?
  • While there is no limit on how much data you can store in an item collection, you wouldn't want to read GBs or TBs of data from DynamoDB. DynamoDB enforces a 1 MB page size limit, meaning a single request can return at most 1 MB, and to obtain more you have to paginate.
  • Ideally, when you have such a large amount of data in a single item collection, you rely on the sort key to query it efficiently. Imagine an IoT device that updates its status every second: over the course of a year you would have more than 30M items. It's very unlikely that your use case needs the data from a year ago. More likely you want to know what the device did in the last minute/hour/day, which means you can use the sort key to narrow 30M items down to 60/3,600/86,400 items, which is far more efficient.
  3. My GSI uses content_type as the partition key and end_time as the sort key, so customers can search documents by content type and end_time range. For each content type, I might have 0.5 billion entries per day. When I build the GSI, it would be about 0.2 KB * 0.5 billion = 100 GB per day and roughly 36 TB per year. The customer wants to keep data for at least a year. If I don't split the table into multiple time-series tables, would DynamoDB be able to hold 36 TB as one item collection in one table and still provide fast access for queries on the partition key plus a small sort key range?
  • Yes, DynamoDB has no problem storing TBs of data for a single partition key, and performance remains unchanged as long as your condition on the sort key narrows the search sufficiently. But keep in mind that even if you search on a single day, you are trying to fetch 100 GB of data, which could be a slow request.
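To make the sort key narrowing and the 1 MB page limit more concrete, here is a minimal sketch. It only builds the keyword arguments for the low-level Query API and estimates the number of pages, so it runs without AWS access. The table name, index name, sample values and helper names are hypothetical, and it assumes end_time is stored as an ISO 8601 string and that sizes use decimal units.

```python
from typing import Any, Dict, Optional

PAGE_LIMIT_BYTES = 1_000_000  # 1 MB per Query response (decimal units assumed)


def build_range_query(table: str, index: str, content_type: str,
                      start: str, end: str,
                      limit: Optional[int] = None) -> Dict[str, Any]:
    """Build keyword arguments for the low-level DynamoDB Query API.

    The key condition pins the partition key and bounds the sort key,
    so only items inside [start, end] are read from the index.
    """
    params: Dict[str, Any] = {
        "TableName": table,
        "IndexName": index,
        "KeyConditionExpression":
            "content_type = :ct AND end_time BETWEEN :start AND :end",
        "ExpressionAttributeValues": {
            ":ct": {"S": content_type},
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }
    if limit is not None:
        params["Limit"] = limit
    return params


def pages_needed(result_bytes: int, page_limit: int = PAGE_LIMIT_BYTES) -> int:
    """Estimate how many paginated requests a result of this size needs."""
    return -(-result_bytes // page_limit)  # ceiling division


# Hypothetical names and values, for illustration only.
params = build_range_query("documents", "content_type-end_time-index",
                           "invoice", "2024-01-01T00:00:00Z",
                           "2024-01-01T00:05:00Z", limit=50)
print(params["KeyConditionExpression"])
print(pages_needed(100 * 10**9))  # one full day of data (100 GB)
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("dynamodb").query(**params)`. The page estimate shows why reading a whole day is slow: 100 GB at 1 MB per response is on the order of 100,000 sequential requests.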
AWS
EXPERT
answered a year ago
EXPERT
reviewed 5 months ago
  • Thanks, Leeroy. I updated my question with more detail; could you please review it?

  • Updated my answer

  • When I query with content_type + end_time (last 30 days), the matching dataset is quite large (3 TB), but I suppose that is no problem if I page through the query, since users generally browse only the first couple of pages in the web front end. Is this correct?

  • Yes, in that case you would limit each page. If your front end shows 50 items, use Limit=50, and when the user clicks next, pass the LastEvaluatedKey from the previous response as ExclusiveStartKey to obtain the next 50 items in the results.

  • Thanks for your valuable feedback
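The Limit / LastEvaluatedKey paging described in the comments above can be sketched roughly as follows. The `fetch_page` helper and the `fake_query` stand-in are hypothetical; the stand-in only imitates the shape of a Query response so the example runs offline. In real use, `query_fn` would be something like `boto3.client("dynamodb").query`.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

QueryFn = Callable[..., Dict[str, Any]]


def fetch_page(query_fn: QueryFn, params: Dict[str, Any], page_size: int,
               start_key: Optional[Dict[str, Any]] = None
               ) -> Tuple[List[Dict[str, Any]], Optional[Dict[str, Any]]]:
    """Fetch one UI page and return (items, key for the next page or None)."""
    request = dict(params, Limit=page_size)
    if start_key is not None:
        # Resume right after the last item of the previous page.
        request["ExclusiveStartKey"] = start_key
    response = query_fn(**request)
    return response.get("Items", []), response.get("LastEvaluatedKey")


# Stand-in for DynamoDB so the sketch runs offline: 120 fake items.
_FAKE_ITEMS = [{"id": i} for i in range(120)]


def fake_query(Limit: int, ExclusiveStartKey: Optional[Dict[str, Any]] = None,
               **_ignored: Any) -> Dict[str, Any]:
    """Imitate the response shape of Query (Items plus LastEvaluatedKey)."""
    start = ExclusiveStartKey["id"] + 1 if ExclusiveStartKey else 0
    items = _FAKE_ITEMS[start:start + Limit]
    response: Dict[str, Any] = {"Items": items}
    if start + Limit < len(_FAKE_ITEMS):
        response["LastEvaluatedKey"] = {"id": items[-1]["id"]}
    return response


first, next_key = fetch_page(fake_query, {}, page_size=50)
second, next_key = fetch_page(fake_query, {}, page_size=50, start_key=next_key)
third, last_key = fetch_page(fake_query, {}, page_size=50, start_key=next_key)
print(len(first), len(second), len(third), last_key)  # 50 50 20 None
```

The front end only needs to keep the key returned with the current page; each click on "next" costs one small request regardless of how large the item collection is.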

1
  1. Yes, the partition size is still 10 GB without LSIs, but an item collection isn't restricted to 10 GB in that case: it can span multiple partitions.

  2. Returning a large volume of data will of course be much slower than returning a small amount. But if you are asking whether querying the same amount of data from a big item collection is much slower than from a small one, then no, not substantially.

EXPERT
answered a year ago
EXPERT
reviewed a year ago
  • Hi Skinsman, I updated my question with more detail; could you please review it?
