How do I correlate the query plan with the query report in Amazon Redshift?


I want to correlate the query plan with the query report in my Amazon Redshift cluster.

Short description

To see the operations and relative cost that are required to run a query in Amazon Redshift, run the EXPLAIN command. The execution plan that EXPLAIN returns outlines the query planning and execution steps that are involved. Then, use the SVL_QUERY_REPORT system view to view query information at the cluster slice level. You can use the slice-level information to detect uneven data distribution across the cluster, which can affect query performance.

Amazon Redshift processes the query plan and translates the plan into steps, segments, and streams. For more information, see Query planning and execution workflow.


Create a table and get the execution plan and SVL query report for the query

To create a table and get the execution plan and SVL query report, complete the following steps:

  1. Create two tables with different sort keys and distribution keys.

  2. Run the following query, where the join isn't performed on a distribution key:

    select eventname, sum (pricepaid) from sales, event where sales.eventid = event.eventid group by eventname order by 2 desc;

    This query distributes the inner table to all compute nodes.

  3. Retrieve the query plan:

    EXPLAIN <query>;
                                                   QUERY PLAN                                               
    XN Merge  (cost=1002815368414.24..1002815368415.67 rows=571 width=27)
       Merge Key: sum(sales.pricepaid)
       ->  XN Network  (cost=1002815368414.24..1002815368415.67 rows=571 width=27)
             Send to leader
             ->  XN Sort  (cost=1002815368414.24..1002815368415.67 rows=571 width=27)
                   Sort Key: sum(sales.pricepaid)
                   ->  XN HashAggregate  (cost=2815368386.67..2815368388.10 rows=571 width=27)
                         ->  XN Hash Join DS_BCAST_INNER  (cost=109.98..2815367496.05 rows=178125 width=27)
                               Hash Cond: ("outer".eventid = "inner".eventid)
                               ->  XN Seq Scan on sales  (cost=0.00..1724.56 rows=172456 width=14)
                               ->  XN Hash  (cost=87.98..87.98 rows=8798 width=21)
                                     ->  XN Seq Scan on event  (cost=0.00..87.98 rows=8798 width=21)
    (12 rows)

  4. Query the SVL_QUERY_REPORT view to get the query report:

    select * from svl_query_report where query = query_id order by segment, step, elapsed_time, rows;

    Note: Replace query_id with your query's ID.
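
The setup steps above can be sketched as follows. The table definitions are hypothetical and simplified (the column lists, types, and key choices are assumptions for illustration, not the exact tables used in this article). The point is only that sales is distributed on a column other than eventid, so the join on eventid isn't collocated. PG_LAST_QUERY_ID returns the ID of the most recently completed query in the current session, which you can use as query_id. Note that EXPLAIN alone doesn't run the query, so run the query itself before you look for it in SVL_QUERY_REPORT.

```sql
-- Hypothetical, simplified tables: different distribution keys and sort keys.
create table event (
  eventid   integer not null distkey,
  eventname varchar(200),
  dateid    smallint not null sortkey
);

create table sales (
  salesid   integer not null,
  listid    integer not null distkey,  -- distributed on listid, not eventid
  eventid   integer not null,
  pricepaid decimal(8,2),
  dateid    smallint not null sortkey
);

-- Run the query itself so that it's recorded in the system views.
select eventname, sum(pricepaid)
from sales, event
where sales.eventid = event.eventid
group by eventname
order by 2 desc;

-- In the same session, get the ID of the query that just completed.
select pg_last_query_id();
```

Whether the planner chooses DS_BCAST_INNER for tables like these depends on table sizes and statistics, so your plan might show a different distribution step.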

Map the query plan with the query report

To map the query plan with the query report, complete the following steps:

  1. Run the following query to get the SVL_QUERY_REPORT rows for segment 0 of the query:

    select query,slice,segment,step,start_time,end_time,elapsed_time,rows,bytes,label from svl_query_report where query = 938787 and segment = 0 order by segment, step, elapsed_time, rows;

    The corresponding part of the query plan from the EXPLAIN output is:

    ->  XN Hash  (cost=87.98..87.98 rows=8798 width=21)
       ->  XN Seq Scan on event  (cost=0.00..87.98 rows=8798 width=21)

    The following is an example output:

    query  | slice | segment | step |         start_time         |         end_time          | elapsed_time | rows | bytes  |            label              
    938787 |     0 |       0 |    0 | 2020-05-22 11:11:48.828309 | 2020-05-22 11:11:48.82987 |         1561 | 4383 | 128626 | scan   tbl=278788 name=event
    938787 |     1 |       0 |    0 | 2020-05-22 11:11:48.828309 | 2020-05-22 11:11:48.82987 |         1561 | 4415 | 128918 | scan   tbl=278788 name=event
    938787 |     0 |       0 |    1 | 2020-05-22 11:11:48.828309 | 2020-05-22 11:11:48.82987 |         1561 | 4383 |      0 | project                     
    938787 |     1 |       0 |    1 | 2020-05-22 11:11:48.828309 | 2020-05-22 11:11:48.82987 |         1561 | 4415 |      0 | project                     
    938787 |     0 |       0 |    2 | 2020-05-22 11:11:48.828309 | 2020-05-22 11:11:48.82987 |         1561 | 4383 | 126660 | bcast                      
    (6 rows)

    In the preceding output, segment 0 is where Amazon Redshift performs a sequential scan of the event table. You can find the scan in the label column. The bcast step in the same segment broadcasts the scanned rows, which corresponds to DS_BCAST_INNER in the query plan.

  2. Run the following query to get the SVL_QUERY_REPORT rows for segment 1 of the query:

    select query,slice,segment,step,start_time,end_time,elapsed_time,rows,bytes,label from svl_query_report where query = 938787 and segment = 1 order by segment, step, elapsed_time, rows;

    The following is an example output:

    query  | slice | segment | step |       start_time          |          end_time          | elapsed_time | rows | bytes  |     label           
    938787 |     1 |       1 |    0 | 2020-05-22 11:11:48.826864 | 2020-05-22 11:11:48.830037 |         3173 |    0 |      0 | scan   tbl=376297 name=Internal Worktable  
    938787 |     0 |       1 |    0 | 2020-05-22 11:11:48.826864 | 2020-05-22 11:11:48.831142 |         4278 | 8798 | 253580 | scan   tbl=376297 name=Internal Worktable 
    938787 |     1 |       1 |    1 | 2020-05-22 11:11:48.826864 | 2020-05-22 11:11:48.830037 |         3173 |    0 |      0 | project                                   
    938787 |     0 |       1 |    1 | 2020-05-22 11:11:48.826864 | 2020-05-22 11:11:48.831142 |         4278 | 8798 |      0 | project                                   
    938787 |     1 |       1 |    2 | 2020-05-22 11:11:48.826864 | 2020-05-22 11:11:48.830037 |         3173 |    0 |      0 | hash   tbl=439                            
    (6 rows)

    The query continues with segment 1, where a hash table operation is performed on the inner table in the join. You can identify it by the hash label.

  3. Run the following query to get the SVL_QUERY_REPORT rows for segment 2 of the query:

    select query,slice,segment,step,start_time,end_time,elapsed_time,rows,bytes,label from svl_query_report where query = 938787 and segment = 2 order by segment, step, elapsed_time, rows;

    The corresponding part of the query plan from the EXPLAIN output is:

    ->  XN Hash Join DS_BCAST_INNER  (cost=109.98..2815367496.05 rows=178125 width=27)
          Hash Cond: ("outer".eventid = "inner".eventid)
       ->  XN Seq Scan on sales  (cost=0.00..1724.56 rows=172456 width=14)

    The following is an example output:

    query  | slice | segment | step |         start_time         |          end_time          | elapsed_time | rows  |  bytes  |            label             
    938787 |     1 |       2 |    0 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 |        26560 | 86519 | 1730380 | scan   tbl=278792 name=sales  
    938787 |     0 |       2 |    0 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 |        27486 | 85937 | 1718740 | scan   tbl=278792 name=sales  
    938787 |     1 |       2 |    1 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 |        26560 | 86519 |       0 | project                       
    938787 |     0 |       2 |    1 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 |        27486 | 85937 |       0 | project                       
    938787 |     1 |       2 |    2 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 |        26560 | 86519 |       0 | project                       
    938787 |     0 |       2 |    2 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 |        27486 | 85937 |       0 | project                       
    938787 |     1 |       2 |    3 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 |        26560 | 86519 |       0 | hjoin  tbl=439                
    938787 |     0 |       2 |    3 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 |        27486 | 85937 |       0 | hjoin  tbl=439                
    938787 |     1 |       2 |    4 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 |        26560 | 86519 |       0 | project                       
    938787 |     0 |       2 |    4 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 |        27486 | 85937 |       0 | project                       
    938787 |     1 |       2 |    5 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 |        26560 | 86519 |       0 | project                       
    938787 |     0 |       2 |    5 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 |        27486 | 85937 |       0 | project                       
    938787 |     1 |       2 |    6 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 |        26560 |   576 |   34916 | aggr   tbl=448                
    938787 |     0 |       2 |    6 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 |        27486 |   576 |   34916 | aggr   tbl=448                
    (16 rows) 

    In the preceding example, segment 2 performs a sequential scan of the sales table. In the same segment, a hash join operation joins the tables, and then an aggregate operation aggregates the results, as the hjoin and aggr labels show. The join column for one of the tables isn't a distribution key or a sort key. As a result, the inner table is broadcast to all the compute nodes, which appears as DS_BCAST_INNER in the execution plan.

You can also run this query to get the SVL_QUERY_REPORT rows for segments 3, 4, and 5.

In these segments, a hash aggregate operation and sort operation are performed and identified from the labels "aggr" and "sort". The hash aggregate operation is performed on unsorted grouped aggregate functions. The sort operation is performed to evaluate the ORDER BY clause.

After all the segments are used, the query runs a network operation on segments 4 and 5 to send intermediate results to the leader node for additional processing. You can identify this operation by the "return" label.
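
The remaining segments can be retrieved in one pass. The following is a minimal sketch that reuses the example query ID 938787 from this article and keeps only the columns needed to spot the "aggr", "sort", and "return" labels. Replace the ID with your own query's ID.

```sql
-- Rows for segments 3, 4, and 5 of the example query.
-- 938787 is the example query ID used in this article.
select slice, segment, step, elapsed_time, rows, bytes, label
from svl_query_report
where query = 938787
  and segment in (3, 4, 5)
order by segment, step, slice;
```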

After the query is completed, run the following query to check the execution time of the query in milliseconds:

select datediff (ms, exec_start_time, exec_end_time) from stl_wlm_query where query= 938787;

(1 row)
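
If the elapsed time looks higher than the step timings suggest, it can help to separate the time spent waiting in a queue from the time spent running. The following is a sketch that assumes the total_queue_time and total_exec_time columns of STL_WLM_QUERY, which are reported in microseconds, and converts them to milliseconds. The query ID is the example ID from this article.

```sql
-- Compare queue time and run time for the example query.
-- Both source columns are in microseconds; dividing by 1000 gives milliseconds.
select query,
       total_queue_time / 1000 as queue_ms,
       total_exec_time / 1000  as exec_ms
from stl_wlm_query
where query = 938787;
```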

Optimize your query

When you analyze your query plan, you can tune your query performance based on your use case. For more information, see Top 10 performance tuning techniques for Amazon Redshift.
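
One check that might be useful when you tune is whether the work for each step is spread evenly across slices. The following is a sketch, not a definitive diagnostic: it condenses the slice-level rows in SVL_QUERY_REPORT into one row for each step and compares the smallest and largest row counts. The column aliases are illustrative, and 938787 is the example query ID from this article.

```sql
-- One row per step, summarizing how rows and time vary across slices.
-- elapsed_time in SVL_QUERY_REPORT is in microseconds.
select segment, step, label,
       count(distinct slice) as slices,
       min(rows)             as min_rows,
       max(rows)             as max_rows,
       max(elapsed_time)     as max_elapsed_us
from svl_query_report
where query = 938787
group by segment, step, label
order by segment, step;
```

A large gap between min_rows and max_rows on a scan step can point to uneven data distribution. In the example output in this article, the scan of the sales table returns 85937 and 86519 rows on the two slices, so that table is evenly distributed.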

Related information

Mapping the query plan to the query summary

Reviewing query plan steps

Using the SVL_QUERY_REPORT view

AWS OFFICIAL, updated 3 months ago