Can you use a Redshift Spectrum Table in a CTE?

0

Hi! I'm trying to make some data available via Spectrum to our reporting platform. I chose Spectrum because it offers lower latency to our data lake due to not needing a batched ETL process.

One of the queries I have looks like this

with txns as (select * from spectrum_table where ...)
select field1, field2, ...
from txns t1
left join txns t2 on t2.id = t1.id
left join txns t3 on t3.id = t1.id
where...

Intuitively, this should cache the Spectrum query in a temp table with the CTE, and make it available to query later in query without hitting S3 a second (or third) time.

However, I checked the explain plan, and with each join the number of "S3 Seq Scan"s goes up by one. So it appears to do the Spectrum scan each time the CTE is queried.

Questions:

  1. Is this actually happening? Or is the explain plan wrong?

  2. If it is happening, what other options are there to achieve this? Other than manually creating a temp table (this will be accessed by a reporting tool, so I'd prefer to avoid allowing explicit write access or requiring multiple statements)

Thanks!

已提问 2 年前585 查看次数
1 回答
0

Hi,

I think the doc here is a little misleading:

Each common table expression (CTE) defines a temporary table, which is similar to a view definition. You can reference these temporary tables in the FROM clause.

CTE are an efficient way of defining tables that can be used throughout the execution of a single query. In all cases, the same results can be achieved by using subqueries in the main body of the SELECT statement, but WITH clause subqueries may be simpler to write and read.

Where possible, WITH clause subqueries that are referenced multiple times are optimized as common sub-expressions; that is, it may be possible to evaluate a WITH subquery once and reuse its results.

And the optimization relies on the query optimizer task not the query execution itself.

You can alternatively use materialized views to cache frequently accessed data as described in the following blog:

And since it supports now external tables (https://aws.amazon.com/about-aws/whats-new/2020/06/amazon-redshift-materialized-views-support-external-tables/).

Hope this helps

AWS
已回答 2 年前

您未登录。 登录 发布回答。

一个好的回答可以清楚地解答问题和提供建设性反馈,并能促进提问者的职业发展。

回答问题的准则