Questions about Concurrency Scaling

  1. First, a question where I'm looking for clarification.

The docs, here;

https://docs.aws.amazon.com/redshift/latest/dg/concurrency-scaling.html

Sayeth;

The following types of queries are candidates for Concurrency Scaling:
- Read-only queries. Data manipulation language (DML) queries are not supported.
- Queries that don't reference tables that use an interleaved sort key.
- The query doesn't use Amazon Redshift Spectrum to reference external tables.
- The query must encounter queueing to be routed to a Concurrency Scaling cluster.

The actual meaning of the English here - "the following types of queries are candidates for Concurrency Scaling" - is that each bullet point is a class of queries which is able to be run on CS. Read that way, I could run a query which does use an interleaved sort key so long as it does not use Spectrum.

I suspect that what is meant is that ALL of the restrictions listed must be honoured for a query to be considered.

Is this correct?

  2. What's happening for data access? Data is stored on local disk on Redshift worker nodes. Here we have an external cluster, which presumably holds no data of its own, running queries. Is it accessing the disk on the original cluster over the network?
Toebs2
asked 5 years ago, 385 views
3 Answers

Hi Toebs2,

Since I previewed the feature I can answer at least one of your questions.

#1 Yes, all 4 conditions listed in the doc and in your question must be true (think AND) for a query to be considered for execution on a Concurrency Scaling cluster. There is also a further condition that is mentioned in the doc but not in the bulleted list: the query can't be "small". Some very small queries that pass the other 4 conditions may still execute on the main cluster because the overhead of executing them on a Concurrency Scaling cluster outweighs the query cost. I don't know for sure, but I think "small" may somehow be coupled to the current dynamic SQA cost threshold. So, for your example, I do not think a query that references a local-storage table with an interleaved sort key, but does not reference an external table stored in S3, can be a candidate for execution on a Concurrency Scaling cluster.
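
To make the "think AND" reading concrete, here is a minimal Python sketch of the eligibility check as I understand it from the doc. The class and field names (`QueryProfile`, `uses_interleaved_sort_key`, and so on) are made up for illustration and are not anything Redshift exposes. It also leaves out the "small query" exception, since I don't know how that threshold is decided.

```python
from dataclasses import dataclass


@dataclass
class QueryProfile:
    """Hypothetical summary of a query; not an actual Redshift structure."""
    read_only: bool
    uses_interleaved_sort_key: bool
    uses_spectrum: bool
    was_queued: bool


def is_cs_candidate(q: QueryProfile) -> bool:
    """Return True only when all four documented conditions hold (logical AND)."""
    return (
        q.read_only
        and not q.uses_interleaved_sort_key
        and not q.uses_spectrum
        and q.was_queued
    )


# The case from the question: interleaved sort key, but no Spectrum.
example = QueryProfile(
    read_only=True,
    uses_interleaved_sort_key=True,
    uses_spectrum=False,
    was_queued=True,
)
print(is_cs_candidate(example))  # False: one failed condition rules the query out
```

Under the "any bullet is enough" reading the example would qualify; under the AND reading, which I believe is the intended one, it does not.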

#2 I can't answer your second question because the Redshift team hasn't yet shared much publicly about how the feature works internally, although that may happen soon at some public event.

I hope this helps,
-Kurt

klarson
answered 5 years ago

Please review the links I posted in the thread "New Feature: Concurrency Scaling – Peak performance for bursts of activity": https://forums.aws.amazon.com/thread.jspa?threadID=300902&tstart=0

Hopefully they address the primary questions you had but please let me know if not.

answered 5 years ago

Regarding question #2, Concurrency Scaling clusters retrieve Redshift data blocks from S3 as needed. They do not retrieve data blocks from the main cluster.

answered 5 years ago
