I would recommend watching Rick Houlihan's DynamoDB Office Hours videos on YouTube. Rick models a real use case in each video and explains each pattern he uses and why you should use it.
With NoSQL databases, think less about how the data is organized and more about how you will access it. Prioritize your access patterns so that you can optimize for the ones used most often. I would recommend listing all your access patterns, for example:
- fetching an episode by episode number.
- fetching all episodes that occurred in a given time range.
- fetching all episodes that include a given keyword list (this is a tricky one in DynamoDB).
Another key thing to keep in mind is how the partition key is built. You want your partition keys to be as evenly distributed as possible so that DynamoDB can scale easily. If you have just one radio show (with many episodes), a good PK looks to me like the episodeNumber, although that ties you to a single radio show.
Since an episode may be broadcast more than once, I would include an SK based on broadcastedAt (this gives you a bonus pattern: iterating over the different broadcasts of a given episode number). Something like:
| pk | sk | attributes |
|---|---|---|
| <episodeNumber> | Metadata | <episode details> |
| <episodeNumber> | <broadcastedAt> | <you could duplicate episode details here depending on how reads/writes happen> |
That will cover your first pattern + the bonus pattern of accessing different broadcasts of the same episode by date.
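As a rough sketch of the layout above, the following Python builds the two item shapes and the parameters for the bonus pattern. The table name `RadioShow`, the `title` attribute, and the helper names are assumptions made for illustration. The dictionaries follow the shape of the low-level DynamoDB `PutItem`/`Query` parameters, but nothing here calls AWS.

```python
from typing import Any, Dict

TABLE_NAME = "RadioShow"  # hypothetical table name


def metadata_item(episode_number: int, title: str) -> Dict[str, Any]:
    """Item holding the episode details (sk is the literal 'Metadata')."""
    return {
        "pk": {"S": str(episode_number)},
        "sk": {"S": "Metadata"},
        "title": {"S": title},  # hypothetical attribute
    }


def broadcast_item(episode_number: int, broadcasted_at: str) -> Dict[str, Any]:
    """Item for one airing; broadcasted_at is assumed to be an ISO 8601 string."""
    return {
        "pk": {"S": str(episode_number)},
        "sk": {"S": broadcasted_at},
    }


def broadcasts_query(episode_number: int, start: str, end: str) -> Dict[str, Any]:
    """Query parameters for the airings of one episode between start and end."""
    return {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": "pk = :pk AND sk BETWEEN :start AND :end",
        "ExpressionAttributeValues": {
            ":pk": {"S": str(episode_number)},
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }
```

Because ISO 8601 strings in a single format sort in chronological order, a `BETWEEN` on the sort key selects a date range, and the `Metadata` item falls outside any range bounded by dates.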
The second pattern, fetching all episodes that occurred in a given time range, depends on how you will query that range: by day, or at some other granularity? I would add a GSI whose PK is a day, so that each partition holds all the episodes broadcast on that day (if you need to query more than one day, you would have to run parallel queries, though).
| pk | sk | gsi1pk | gsi1sk | attributes |
|---|---|---|---|---|
| <episodeNumber> | Metadata | | | <episode details> |
| <episodeNumber> | <broadcastedAt> | <broadcastedAtDay> | <episodeNumber> | <you could duplicate episode details here depending on how reads/writes happen> |
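To illustrate the multi-day case, here is a minimal sketch that expands a date range into one set of Query parameters per day partition of the GSI. The table name `RadioShow` and the index name `gsi1` are assumptions, and the function only builds the parameters. Running them in parallel is left to the caller.

```python
from datetime import date, timedelta
from typing import Any, Dict, List

TABLE_NAME = "RadioShow"  # hypothetical table name
INDEX_NAME = "gsi1"       # hypothetical index name


def days_in_range(start: date, end: date) -> List[str]:
    """Every day from start to end inclusive, as YYYY-MM-DD strings."""
    return [
        (start + timedelta(days=offset)).isoformat()
        for offset in range((end - start).days + 1)
    ]


def day_queries(start: date, end: date) -> List[Dict[str, Any]]:
    """One set of Query parameters per day partition in the range."""
    return [
        {
            "TableName": TABLE_NAME,
            "IndexName": INDEX_NAME,
            "KeyConditionExpression": "gsi1pk = :day",
            "ExpressionAttributeValues": {":day": {"S": day}},
        }
        for day in days_in_range(start, end)
    ]
```

A week-long range produces seven queries, so a coarser granularity (for example a month as the GSI PK) may be worth considering if wide ranges are common.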
The third pattern is quite tricky, as you don't know in advance how many keywords you have. If your app is a write-once-read-many application, I would duplicate episode entries in different partitions based on those keywords, so the data is duplicated but optimized for reading. To do so, there are a few things your app must keep in mind:
- writing an episode will be a mix of writing and deleting items in the database.
- you must sort keywords before storing them, so that a combined key such as <keyword1>#<keyword2> is always built in the same order.
| pk | sk | gsi1pk | gsi1sk | attributes |
|---|---|---|---|---|
| <episodeNumber> | Metadata | | | <episode details> |
| <episodeNumber> | <broadcastedAt> | <broadcastedAtDay> | <episodeNumber> | <you could duplicate episode details here depending on how reads/writes happen> |
| <keyword1> | <broadcastedAt> | | | <you could duplicate episode details here depending on how reads/writes happen> |
| <keyword2> | <broadcastedAt>#<episodeNumber> | | | <you could duplicate episode details here depending on how reads/writes happen> |
| <keyword1>#<keyword2> | <broadcastedAt>#<episodeNumber> | | | <you could duplicate episode details here depending on how reads/writes happen> |
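One possible way to derive the keyword partition keys shown in the table is sketched below. The function names are hypothetical. It assumes every non-empty combination of an episode's keywords gets its own partition, with keywords sorted and joined by `#`.

```python
from itertools import combinations
from typing import Iterable, List


def keyword_partition_keys(keywords: Iterable[str]) -> List[str]:
    """All non-empty keyword combinations, each joined with '#' in sorted order."""
    unique_sorted = sorted(set(keywords))
    keys: List[str] = []
    for size in range(1, len(unique_sorted) + 1):
        for combo in combinations(unique_sorted, size):
            keys.append("#".join(combo))
    return keys


def lookup_key(keywords: Iterable[str]) -> str:
    """Partition key to query for episodes that include all the given keywords."""
    return "#".join(sorted(set(keywords)))
```

For example, `keyword_partition_keys(["serverless", "aws"])` yields `aws`, `serverless` and `aws#serverless`, and `lookup_key(["serverless", "aws"])` returns `aws#serverless`. An episode with n keywords produces 2^n - 1 keys, so this only stays practical when episodes carry a handful of keywords.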
I would really recommend The DynamoDB Book by Alex DeBrie.
There is also the cheatsheet with a summary of best practices and patterns.
Consider DynamoDB, explained - A Primer on the DynamoDB NoSQL database. The author's blog also has a number of articles on DynamoDB.
Agreed, Rick Houlihan's the man to follow when learning about DynamoDB.
There is plenty of AWS Tech Talks and re:Invent content on YouTube, and he also makes regular appearances on the "Amazon DynamoDB | Office Hours" thread on the AWS Twitch channel.
Hi,
You could also start with a single database document structure:
{
  "EpisodeId": {
    "S": "EP01"
  },
  "Title": {
    "S": "Title"
  },
  "Guests": {
    "SS": [
      "Jacco",
      "John"
    ]
  },
  "Keywords": {
    "SS": [
      "aws"
    ]
  },
  "AiringDates": {
    "SS": [
      "2021-12-12",
      "2021-12-19"
    ]
  }
}
EpisodeId would be the partition key.
All necessary query operations can be easily performed using a Scan. You will always get the full details of the episode in one operation.
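As an illustration of such a Scan, the sketch below builds the parameters for finding episodes whose string set contains a given value, using DynamoDB's `contains` filter function. The table name `Episodes` and the helper name are assumptions, and the function only builds the request without sending it.

```python
from typing import Any, Dict

TABLE_NAME = "Episodes"  # hypothetical table name


def scan_by_set_member(attribute: str, value: str) -> Dict[str, Any]:
    """Scan parameters matching items whose string set `attribute` contains `value`."""
    return {
        "TableName": TABLE_NAME,
        "FilterExpression": "contains(#attr, :value)",
        # Placeholder avoids clashes between attribute names and reserved words.
        "ExpressionAttributeNames": {"#attr": attribute},
        "ExpressionAttributeValues": {":value": {"S": value}},
    }
```

The same helper covers `Keywords`, `Guests` and `AiringDates`. Keep in mind that a filter is applied after the items are read, so each Scan still reads (and is billed for) the whole table, and results come back in pages of up to 1 MB that must be followed with `LastEvaluatedKey`.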
The API to access the data should be of more concern:
- createEpisode episodeId, airDates, guests, keywords
- deleteEpisode episodeId
- getEpisode episodeId
- addAirDate episodeId, airDate
- removeAirDate episodeId, airDate
- addGuest episodeId, guest
- removeGuest episodeId, guest
- addKeyword episodeId, keyword
- removeKeyword episodeId, keyword
- getEpisodesByAirDate airdate
- getEpisodesByKeyword keyword
- getEpisodesByGuest guest
If your database grows and you feel it is no longer performing well, or that you are paying too much for the scans, you can switch to a more complicated database design. The API can stay the same.
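To make the "API can stay the same" point concrete, here is a minimal sketch of a few of those operations behind a small class. The class and method names are hypothetical, and an in-memory dictionary stands in for the table. A DynamoDB-backed class could expose the same methods, so callers would not notice a change of table design.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Episode:
    episode_id: str
    air_dates: Set[str] = field(default_factory=set)
    guests: Set[str] = field(default_factory=set)
    keywords: Set[str] = field(default_factory=set)


class InMemoryEpisodeStore:
    """Stand-in for the table; only the method signatures matter to callers."""

    def __init__(self) -> None:
        self._episodes: Dict[str, Episode] = {}

    def create_episode(
        self,
        episode_id: str,
        air_dates: Iterable[str],
        guests: Iterable[str],
        keywords: Iterable[str],
    ) -> None:
        self._episodes[episode_id] = Episode(
            episode_id, set(air_dates), set(guests), set(keywords)
        )

    def get_episode(self, episode_id: str) -> Optional[Episode]:
        return self._episodes.get(episode_id)

    def add_keyword(self, episode_id: str, keyword: str) -> None:
        self._episodes[episode_id].keywords.add(keyword)

    def get_episodes_by_keyword(self, keyword: str) -> List[Episode]:
        # Checks every episode, like a full scan with a filter.
        return [e for e in self._episodes.values() if keyword in e.keywords]
```

The remaining operations in the list would follow the same shape. An in-memory version like this can also double as a test fake.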
One improvement you might consider making right away is to use a separate table for the guests and store their IDs, instead of their names, in the episode table. The API could use BatchGetItem if you want to return the details of the guests when getting episodes (potentially caching the guests).
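A rough sketch of how the guest lookups could be batched follows. The table name `Guests` and the key attribute `GuestId` are assumptions. BatchGetItem accepts at most 100 keys per request and rejects duplicate keys, so the IDs are de-duplicated and split into chunks. The function only builds the request parameters.

```python
from typing import Any, Dict, Iterable, List

GUESTS_TABLE = "Guests"  # hypothetical table name
BATCH_LIMIT = 100        # BatchGetItem allows at most 100 keys per request


def guest_batch_requests(guest_ids: Iterable[str]) -> List[Dict[str, Any]]:
    """BatchGetItem parameters covering all guest IDs, 100 keys at a time."""
    unique_ids = sorted(set(guest_ids))  # duplicate keys are rejected
    requests: List[Dict[str, Any]] = []
    for start in range(0, len(unique_ids), BATCH_LIMIT):
        chunk = unique_ids[start:start + BATCH_LIMIT]
        requests.append(
            {
                "RequestItems": {
                    GUESTS_TABLE: {
                        # "GuestId" is a hypothetical partition key name.
                        "Keys": [{"GuestId": {"S": gid}} for gid in chunk]
                    }
                }
            }
        )
    return requests
```

A real caller would also need to resend anything returned under `UnprocessedKeys`.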
Going for a more complicated single-table database design is really an access optimization, which in this case might be premature.
Regards, Jacco
Thanks, reading through that -- it's answered some of my questions, so far. (This shouldn't be hard, I'm just stumbling over the shift in mindset.) And thanks for your patience!