Do dimensions reduce the amount of data that's scanned?

I am reading the Timestream Concepts document and want to make sure I understand something.

I understand what a Series is, and I understand that a Series is defined by multiple dimensions. I'm trying to understand how this relates to query costs. When you have multiple dimensions, does it cost less "$ per GB Scanned" when you specify all your dimensions, because you're only scanning the 'Series' described by that set of dimensions?

My current design is to store a bunch of measurements per row, with the "series" defined by dimensions like "Customer", "Site", "System", and "Device". My queries will target some level of that hierarchy: I might want the latest record for one device, or the latest record for every device (group by device) within a particular system or even site. I'm not sure how this affects query cost or, for that matter, partitioning.

Alternatively, I thought about defining a primary key to use as the partition key. This would be a composite of the other dimensions, similar to how you usually form a DynamoDB primary key, e.g. 'customer#site#system#device'. It seems like that would help partitioning, but it would end up costing me a lot if I needed to do a 'site' level query.

I guess the place to start is me understanding the relationship between dimension (especially multiple dimensions) vs. GB scanned. Can somebody help clarify this?

wz2b
asked 9 months ago, 338 views
1 Answer

In Amazon Timestream, dimensions are the attributes that identify a time series. A time series is the sequence of records over time that share the same dimension values and measure name, so each distinct combination of dimension values (for example, one customer, site, system, and device) forms its own series. Dimensions act as metadata that gives context to the measures.

Query cost is driven by the amount of data scanned, so a query that only has to read a small subset of your data costs less. This is where dimensions help: when you filter on dimension values in the WHERE clause, Timestream can potentially skip data that cannot match, which reduces the bytes scanned and therefore the cost. You do not have to specify every dimension to benefit. Filtering at any level of your hierarchy (site, system, or device) narrows the query to the series under that level.
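
To make the link between bytes scanned and cost concrete, here is a minimal arithmetic sketch. The function name, the per-GB price passed in, the decimal definition of a GB, and the 10 MB per-query minimum are all assumptions for illustration; check the current Timestream pricing page for your region and pricing model before relying on any of them.

```python
def estimate_query_cost(
    bytes_scanned: int,
    price_per_gb: float,
    min_billed_bytes: int = 10 * 1000**2,  # assumed per-query minimum (10 MB)
) -> float:
    """Rough cost of one query under a pay-per-GB-scanned model.

    Hypothetical helper: price_per_gb and min_billed_bytes are inputs you
    must take from the pricing page; nothing here is fetched from AWS.
    """
    # A query is billed for at least the minimum, even if it scans less.
    billed_bytes = max(bytes_scanned, min_billed_bytes)
    # Assumes 1 GB = 10**9 bytes for billing purposes.
    return billed_bytes / 1000**3 * price_per_gb


# Same query shape, with and without a selective filter (illustrative sizes).
unfiltered = estimate_query_cost(5 * 1000**3, price_per_gb=0.01)
filtered = estimate_query_cost(50 * 1000**2, price_per_gb=0.01)
```

The point of the sketch is only the proportionality: if a filter cuts the scan from 5 GB to 50 MB, the scan charge drops by the same factor of 100, until the per-query minimum becomes the floor.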

However, the exact impact on cost depends on the specifics of your data and your query. What matters is how selective the filter is. An equality predicate on a high-cardinality dimension (one with a large number of unique values, such as a device ID) matches only a small fraction of the series and can reduce the data scanned significantly. A predicate on a low-cardinality dimension matches a large share of the data, so it saves less. A time range predicate matters just as much, because without one the query may have to consider the table's entire retention window.
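
As a sketch of what "specifying dimensions in your query" can look like, the helper below builds a "latest record" query that filters on whichever dimensions you pass in plus a time range. The function name, the database, table, and dimension names, and the default look-back window are hypothetical; only the SQL shape (dimension equality predicates and `time >= ago(...)`) is the point.

```python
from typing import Mapping


def build_latest_record_query(
    database: str,
    table: str,
    dimensions: Mapping[str, str],
    lookback: str = "1h",
) -> str:
    """Build a Timestream SQL string for the newest record matching the filters.

    Hypothetical helper for illustration; it only assembles text and does not
    call AWS. Pass fewer dimensions to query higher up the hierarchy.
    """
    # Always bound the query in time so data outside the window can be skipped.
    predicates = [f"time >= ago({lookback})"]
    for name, value in dimensions.items():
        escaped = value.replace("'", "''")  # escape single quotes in literals
        predicates.append('"' + name + "\" = '" + escaped + "'")
    where = " AND ".join(predicates)
    return (
        f'SELECT * FROM "{database}"."{table}" '
        f"WHERE {where} ORDER BY time DESC LIMIT 1"
    )


# Device-level lookup: every level of the hierarchy is pinned.
query = build_latest_record_query(
    "iot", "telemetry", {"customer": "acme", "site": "s1", "device": "d-1"}
)
```

The resulting string could be passed to the Timestream query API, for example as `QueryString` in boto3's `timestream-query` client `query` call. Selecting only the columns you need instead of `*` is a further way to keep the scan small.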

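As for partitioning, Timestream partitions data automatically, so you are not required to define a partition key the way you would in DynamoDB. Timestream does, however, support an optional customer-defined partition key that lets you choose the dimension to partition on; a high-cardinality dimension that appears in most of your query predicates is generally the best candidate. A composite value such as 'customer#site#system#device' would favor device-level lookups, but a site-level query could not filter on it with a simple equality predicate and so may not benefit from the partitioning. Whichever you choose, always include a time range predicate so that Timestream can skip data outside the range you care about.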
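
If you do want to choose the partition key yourself, the Timestream write API's CreateTable operation accepts an optional schema with a composite partition key. The sketch below only builds the request arguments; the helper name and the database, table, and dimension names are hypothetical, and the choice of `device` as the key is an assumption about this workload, not a recommendation from the original answer.

```python
from typing import Any, Dict


def create_table_kwargs(
    database: str, table: str, partition_dimension: str
) -> Dict[str, Any]:
    """Arguments for a CreateTable request with a customer-defined partition key.

    Hypothetical helper: it returns a plain dict and makes no AWS call.
    """
    return {
        "DatabaseName": database,
        "TableName": table,
        "Schema": {
            "CompositePartitionKey": [
                {
                    "Type": "DIMENSION",  # partition on a dimension's value
                    "Name": partition_dimension,
                    # Reject records that omit the partition dimension.
                    "EnforcementInRecord": "REQUIRED",
                }
            ]
        },
    }


kwargs = create_table_kwargs("iot", "telemetry", "device")
```

With boto3 installed and credentials configured, these arguments could be passed as `boto3.client("timestream-write").create_table(**kwargs)`. Keeping customer, site, and system as separate dimensions alongside the key leaves site-level and system-level filters available.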

In conclusion, dimension filters can reduce the amount of data scanned and thus the query cost, but the exact impact depends on your data and your queries. Note also that Timestream's pricing covers data ingestion, data storage, and data transfer, not just the cost per GB scanned, so consider all of these factors when designing your data model and queries.

answered 9 months ago
  • What you are describing makes it sound like we don't pay for the fetch from the index; we only pay when Timestream needs to reach into the actual tiles of data. Is that accurate?

    If I don't specify a partitioning key of my own, will the data be partitioned only by time (the default)?
