I have a table that stores 1M 384-dimensional vectors in a column called vector of type "char"[].
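For reference, the table is shaped roughly like this (a simplified sketch; exact column types and constraints aside, only content_id and the vector column matter for this question):

-- Simplified shape of the table: roughly 1M rows, each vector holding
-- a 384-element "char"[] array.
CREATE TABLE MyTable (
    content_id int,
    vector     "char"[]
);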
Consider the query:
EXPLAIN ANALYZE
WITH Vars(key) AS (
    VALUES (array_fill(1, ARRAY[384])::vector)
)
SELECT content_id
FROM MyTable, Vars
ORDER BY vector::int[]::vector <#> key
LIMIT 10;
The key is just a toy vector consisting of all ones. <#> is the inner product (dot product) operator of the pgvector extension; it actually returns the negative inner product, so ordering by it ascending puts the most similar rows first. vector is the type defined by that extension, which to my understanding is similar to real[].
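To make the cast chain in the ORDER BY concrete, here is the same sequence of casts on a tiny 3-element value (the numbers are arbitrary, and the "char"[] literal is just built inline for illustration):

-- Same cast chain as in the ORDER BY, on a 3-element example:
-- "char"[] -> int[] -> vector, compared against an all-ones vector.
SELECT ARRAY[1,2,3]::"char"[]::int[]::vector <#> array_fill(1, ARRAY[3])::vector;
-- returns -6, i.e. minus the dot product 1*1 + 2*1 + 3*1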
Here is the output when running on a db.m7g.large RDS PostgreSQL instance:
Limit (cost=83268.20..83269.37 rows=10 width=12) (actual time=3070.513..3070.570 rows=10 loops=1)
  ->  Gather Merge (cost=83268.20..180496.59 rows=833328 width=12) (actual time=3070.512..3070.567 rows=10 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Sort (cost=82268.18..83309.84 rows=416664 width=12) (actual time=3062.863..3062.864 rows=8 loops=3)
              Sort Key: ((((mytable.vector)::integer[])::vector <#> '[1,1, ... ,1]'::vector))
              Sort Method: top-N heapsort Memory: 25kB
              Worker 0: Sort Method: top-N heapsort Memory: 25kB
              Worker 1: Sort Method: top-N heapsort Memory: 25kB
              ->  Parallel Seq Scan on mytable (cost=0.00..73264.22 rows=416664 width=12) (actual time=0.017..3005.128 rows=333333 loops=3)
Planning Time: 0.097 ms
Execution Time: 3070.597 ms
(12 rows)
Given this output, how can I make an educated guess about which of the many instance types will give the best performance-to-cost ratio for this particular query?
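If it helps with the sizing question, I can also rerun the same statement with buffer statistics, to see how much of the scan time goes to reading pages versus the per-row cast and distance computation:

-- Same query, but with buffer counters, to tell whether the parallel
-- seq scan is limited by I/O or by per-row CPU work.
EXPLAIN (ANALYZE, BUFFERS)
WITH Vars(key) AS (
    VALUES (array_fill(1, ARRAY[384])::vector)
)
SELECT content_id
FROM MyTable, Vars
ORDER BY vector::int[]::vector <#> key
LIMIT 10;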