Efficiently Paginating Results with Pipeline Resolvers


I have a pipeline resolver that:

  1. Queries a database to get all the places a user has visited.
  2. Queries a 2nd database to get all the nearby places.
  3. Filters and returns only the nearby places the user has not visited.

I would like to paginate these results, although I am concerned with efficiency.

Is it possible to cache part 1, so that it does not need to be recomputed every time a user requests the next section of data? What is the best way to do this?

Currently, if part 2 is limited to 10 elements, all 10 could be filtered out in part 3, while the next query might have only 6 filtered out, so the number of results returned per page is unpredictable.

Is it possible for the query to return lists whose length equals the limit, as long as enough matching elements exist?

Zates
asked 5 years ago
2 Answers
Accepted Answer

I think this will be a difficult access pattern to implement with DynamoDB in the manner you are describing. Here are a few options that may or may not work for your use case.

  1. Break up the resolver and filter on the client.

You could add a field isVisited: Boolean to the location type and attach a resolver that returns true if the current user has visited that location and false otherwise. The Location.isVisited resolver could do an adjacency-list lookup in DynamoDB to see if the current user has visited that location. From the client, you could then query nearby locations and handle the filtering on the client, keeping only the locations where isVisited is false.
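If it helps, here is a minimal sketch of what that adjacency-list lookup could look like, expressed in Python against DynamoDB. The Visits table, its (user_id, loc_id) key, and the is_visited helper are all assumed names for illustration, not anything AppSync gives you out of the box:

```python
# Hypothetical adjacency-list check: does a visit record exist for
# (user_id, loc_id)? "Visits" and its key schema are assumed names.
import boto3

visits = boto3.resource("dynamodb").Table("Visits")

def is_visited(user_id: str, loc_id: str) -> bool:
    # GetItem on the composite key; the item exists only if the
    # user has visited this location.
    resp = visits.get_item(Key={"user_id": user_id, "loc_id": loc_id})
    return "Item" in resp
```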

  2. Use a Lambda function.

You can implement your own pagination mechanism that iterates until as many items as the request asked for have been found. You could host this logic in a Lambda function and do the more complicated lookup there.
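As a rough sketch of that loop (query_nearby and get_visited_ids are hypothetical helpers wrapping your two table lookups, not part of any AWS API), the Lambda could keep pulling pages of nearby locations and filtering out visited ones until it has collected the requested number of items:

```python
# Sketch of a Lambda-hosted pagination loop. query_nearby() and
# get_visited_ids() are assumed helpers for the two database queries.
def paginate_unvisited(user_id, geohash, limit, next_token=None):
    visited = get_visited_ids(user_id)  # set of loc_ids the user has visited
    results = []
    while len(results) < limit:
        # Fetch the next page of nearby locations and drop visited ones.
        page, next_token = query_nearby(geohash, next_token)
        results.extend(loc for loc in page if loc["loc_id"] not in visited)
        if next_token is None:  # no more nearby locations to scan
            break
    # Note: a production version should derive the returned token from the
    # last item actually returned, so the truncation here skips nothing.
    return results[:limit], next_token
```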

  3. Use an async process.

It may be worth thinking about how you can turn this into an async process, which would let you efficiently loop through the items you want without worrying about joining different sets at query time.

For example, you could have a mutation field "getUnvisitedNearbyLocationFeed":

```graphql
type Mutation {
  getUnvisitedNearbyLocationFeed(location: String): ID
}
```

that calls a Lambda function that does a few things:

  1. Creates a unique id for the feed.
  2. Fetches all locations within the input geohash.
  3. Compares the nearby locations to all of the currently logged-in user's visited locations.
  4. Puts a record into the user/geohash-specific feed for each unvisited nearby item, sorted by some attribute that you want to paginate on.

For example, you might start the process with this mutation:

```graphql
mutation { getUnvisitedNearbyLocationFeed(location: "gbsuv") }
```

which would generate a feed id and return it after steps 2, 3, and 4:

{ "data": { "getUnvisitedNearbyLocationFeed": "the-feed-id" } }  

In step 2, you would get all nearby locations and end up with some set:

(loc_id, geohash, name)

[(1, gbsuv..., "Place 1"), (2, gbsuv..., "Place 2"), (3, gbsuv..., "Place 3")]

In step 3, you would get all nearby locations that the currently logged-in user has visited:

[(2, gbsuv..., "Place 2")]

and then subtract the second set from the first:

[(1, gbsuv..., "Place 1"), (3, gbsuv..., "Place 3")]

You can also use this opportunity to calculate a score for each item that determines its order in the feed. This could be as simple as a timestamp or something more complex. Once you have the items, put them into a table with the following key structure:

```
feed_id (HASH KEY) | score (SORT KEY) | loc_id | user_id | ...
(the-feed-id, 1000, 1, user1)
(the-feed-id, 1001, 3, user1)
```
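Putting the pieces together, the feed-building Lambda might look roughly like this. The table names (Locations, Visits, Feeds) and the query_by_geohash/query_by_user helpers are assumptions for illustration; the score here is just an incrementing counter, but it could be a timestamp or anything else you want to sort on:

```python
# Sketch of the feed-building Lambda (steps 1-4 above). Table names
# and the two query helpers are hypothetical.
import uuid

import boto3

dynamodb = boto3.resource("dynamodb")

def build_feed(user_id: str, geohash: str) -> str:
    feed_id = str(uuid.uuid4())                      # step 1: unique feed id
    nearby = query_by_geohash(geohash)               # step 2: assumed helper
    visited = {v["loc_id"]
               for v in query_by_user(user_id)}      # step 3: assumed helper
    unvisited = [loc for loc in nearby if loc["loc_id"] not in visited]

    feed = dynamodb.Table("Feeds")
    with feed.batch_writer() as batch:               # step 4: write feed rows
        for score, loc in enumerate(unvisited):
            batch.put_item(Item={
                "feed_id": feed_id,   # HASH key
                "score": score,       # SORT key, defines feed order
                "loc_id": loc["loc_id"],
                "user_id": user_id,
            })
    return feed_id
```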

On your client, after you create a nearby-location feed, you can store the feed_id and query this collection as many times as you want without having to recalculate which locations the user has not visited over and over again. You could also extend the example above to build in pagination so you can refresh and append to feeds as needed.
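As a sketch of those repeated reads (reusing the assumed Feeds table from above), a DynamoDB Query on feed_id with ExclusiveStartKey gives you cheap, full pages, since the filtering already happened when the feed was built; this also addresses the original concern about under-filled pages:

```python
# Sketch of paging through the prebuilt feed. Because the feed only
# contains unvisited locations, Limit yields full pages until the
# feed is exhausted.
import boto3
from boto3.dynamodb.conditions import Key

feeds = boto3.resource("dynamodb").Table("Feeds")

def get_feed_page(feed_id: str, limit: int = 10, start_key=None):
    kwargs = {
        "KeyConditionExpression": Key("feed_id").eq(feed_id),
        "Limit": limit,
        "ScanIndexForward": True,  # ascending by score
    }
    if start_key is not None:
        kwargs["ExclusiveStartKey"] = start_key  # resume where we left off
    resp = feeds.query(**kwargs)
    return resp["Items"], resp.get("LastEvaluatedKey")
```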

  4. A combination of the above, or something totally different.

This is just one of many ways you could structure this, but I hope it gives you some ideas.

answered 5 years ago

Hey, thank you for the great ideas! They have definitely opened my eyes to a bunch of different design paradigms, and to possible limitations of DynamoDB.

I like the third idea because it does not have to do any redundant operations, at the cost of some computational overhead.

Zates
answered 5 years ago
