Secondary Index in DynamoDB

0

Hi,

I have a use case where I am fetching data on certain items (unique itemID) multiple times a day (identified by day_BatchTime) and storing them on a dyanmoDB. My composite primary key consists of itemID & day_BatchTime. I have set itemID as parition key and day_BatchTime as sort key.

But I need to report data for each day on a daily basis. So I tried setting up a global secondary index as feedDate. But running the query on global secondary index did not return any result (screenshot: https://ibb.co/kJ3p5mZ ). Please let me know if I am missing something and if there is a better way to achieve my objective of day wise items.

I am using AWS console to fetch this data for now.

Regards, Dbeings

dbeing
asked 3 years ago870 views
3 Answers
0

Hi,

have you verified that feedDate has the same data type in both the table and the Secondary Index?

could you please share the table and index details?

thank you

AWS
EXPERT
answered 2 years ago
  • Hi Fabrizio,

    The feedDate query is now working in AWS console but cant get it working through Python code in Lambda. I followed the docs and below is the snippet. My dataframe is not getting populated. Without the filter expression, it gets populated so I think there is something off in filter expression. It would be great if you could provide some pointers.

    done = False start_key = None

    scan_kwargs = {'FilterExpression': Key('feedDate').eq('15-03-2022'), 'ProjectionExpression':'feedDate, itemID, val' }

    while not done:
    if start_key: scan_kwargs['ExclusiveStartKey'] = start_key response = table.scan(**scan_kwargs) items = response.get('Items', []) page_no = page_no+1 item_i = 0 if len(items) == 0: break while item_i<len(items): for key in items[item_i]: #mystr = mystr + ":" + key + str(items[item_i][key]) df_list.loc[row_num,key] = str(items[item_i][key]) item_i = item_i+ 1 row_num = row_num+1 start_key = response.get('LastEvaluatedKey', None) done = start_key is None

    dfLen = len(df_list)
    

    Regards, Dbeings

0

Hi Fabrizio,

The feedDate is of type String in both cases. Now I am able to run scan and get the results but I am still not able to run a query. The query requires a partition key and my partition key are different item ids for each day.

Below are key elements of table and details on global secondary index: Partition key: itemID (String) Sort key: Batch (String) feedDate(String)

Global secondary index details Name feedDate-index Status  Active Partition key feedDate (String) Sort key - Read capacity Range: 1 - 10; Auto scaling at 70%; Current provisioned units: 1 Write capacity Range: 1 - 10; Auto scaling at 70%; Current provisioned units: 1 Size 2.4 megabytes Item count 8527

Regards, Dbeings

dbeing
answered 2 years ago
0

DynamoDB is designed for quick LOOKUP using specific keys. It is not designed to be an analytical tool, even if you have options such as SCAN and QUERY. If you know that you need to provide aggregations (or any other type of analytical functions), it is better to either create aggregation tables in DDB or export the data through DynamoDB streams to an analytical tool such as Athena.

For example, every time that you put another record in your main table above, you can also increase a counter to a day-count table. See an example for such logic here: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.03.html#GettingStarted.Python.03.04

You don't need to limit your LOOKUP to a single record, as you have the sort key to get a few records. However, if you see that you need to read hundreds or even dozens of records to generate a specific statistic such as count, max, mean, etc. you are using DynamoDB inefficiently.

MLGuy
answered 2 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions