如何将查询计划与 Amazon Redshift 中的查询报告关联?
5 分钟阅读
0
我想将查询计划与 Amazon Redshift 集群中的查询报告关联。
简短描述
要确定在 Amazon Redshift 中运行查询所需的使用量,请运行 EXPLAIN 命令。EXPLAIN 命令返回的执行计划概括介绍了所涉及的查询计划和执行步骤。然后,使用 SVL_QUERY_REPORT 系统视图,查看集群切片级别的查询信息。您可以使用切片级信息,以检测集群中可能影响查询性能的不均匀数据分配。
Amazon Redshift 处理查询计划,并将该计划转换为步骤、分段和流。有关更多信息,请参阅查询计划和执行工作流程。
解决方案
创建表并获取查询的执行计划和 SVL 查询报告
如需创建表并获取执行计划和 SVL 查询报告,请完成下面的步骤:
-
在未对分配键执行联接操作的情况下运行下列查询:
select eventname, sum (pricepaid) from sales, event where sales.eventid = event.eventid group by eventname order by 2 desc;
此查询会将内部表分配给所有计算节点。
-
检索查询计划:
EXPLAIN <query>; QUERY PLAN -------------------------------------------------------------------------------------------------------- XN Merge (cost=1002815368414.24..1002815368415.67 rows=571 width=27) Merge Key: sum(sales.pricepaid) -> XN Network (cost=1002815368414.24..1002815368415.67 rows=571 width=27) Send to leader -> XN Sort (cost=1002815368414.24..1002815368415.67 rows=571 width=27) Sort Key: sum(sales.pricepaid) -> XN HashAggregate (cost=2815368386.67..2815368388.10 rows=571 width=27) -> XN Hash Join DS_BCAST_INNER (cost=109.98..2815367496.05 rows=178125 width=27) Hash Cond: ("outer".eventid = "inner".eventid) -> XN Seq Scan on sales (cost=0.00..1724.56 rows=172456 width=14) -> XN Hash (cost=87.98..87.98 rows=8798 width=21) -> XN Seq Scan on event (cost=0.00..87.98 rows=8798 width=21) (12 rows)
-
运行 SVL_QUERY_REPORT 查询,以获取查询报告:
select * from svl_query_report where query = query_id order by segment, step, elapsed_time, rows;
**注意:**请将 query_id 替换为您的查询的 ID。
将查询计划与查询报告映射
若要将查询计划与查询报告映射,请完成下面的步骤:
- 运行下面的查询,获取分段值为 0 的查询的 svl_query_report:
下面是输出示例:select query,slice,segment,step,start_time,end_time,elapsed_time,rows,bytes,label from svl_query_report where query = 938787 and segment = 0 order by segment, step, elapsed_time, rows; EXPLAIN <query>; -> XN Hash (cost=87.98..87.98 rows=8798 width=21) -> XN Seq Scan on event (cost=0.00..87.98 rows=8798 width=21)
在前面的输出中,当分段值为 0 时,Amazon Redshift 会执行顺序扫描操作来扫描事件表。可以在标签列中找到顺序扫描操作。query | slice | segment | step | start_time | end_time | elapsed_time | rows | bytes | label -------+-------+---------+------+----------------------------+---------------------------+--------------+------+--------+------------------------------ 938787 | 0 | 0 | 0 | 2020-05-22 11:11:48.828309 | 2020-05-22 11:11:48.82987 | 1561 | 4383 | 128626 | scan tbl=278788 name=event 938787 | 1 | 0 | 0 | 2020-05-22 11:11:48.828309 | 2020-05-22 11:11:48.82987 | 1561 | 4415 | 128918 | scan tbl=278788 name=event 938787 | 0 | 0 | 1 | 2020-05-22 11:11:48.828309 | 2020-05-22 11:11:48.82987 | 1561 | 4383 | 0 | project 938787 | 1 | 0 | 1 | 2020-05-22 11:11:48.828309 | 2020-05-22 11:11:48.82987 | 1561 | 4415 | 0 | project 938787 | 0 | 0 | 2 | 2020-05-22 11:11:48.828309 | 2020-05-22 11:11:48.82987 | 1561 | 4383 | 126660 | bcast ... (6 rows)
- 运行下面的查询,获取分段值为 1 的查询的 svl_query_report:
下面是输出示例:select query,slice,segment,step,start_time,end_time,elapsed_time,rows,bytes,label from svl_query_report where query = 938787 and segment = 1 order by segment, step, elapsed_time, rows;
查询继续运行,直到分段值为 1。对联接中的内部表执行哈希表操作。query | slice | segment | step | start_time | end_time | elapsed_time | rows | bytes | label -------+-------+---------+------+---------------------------+----------------------------+--------------+------+--------+------------------------------------------- 938787 | 1 | 1 | 0 | 2020-05-22 11:11:48.826864 | 2020-05-22 11:11:48.830037 | 3173 | 0 | 0 | scan tbl=376297 name=Internal Worktable 938787 | 0 | 1 | 0 | 2020-05-22 11:11:48.826864 | 2020-05-22 11:11:48.831142 | 4278 | 8798 | 253580 | scan tbl=376297 name=Internal Worktable 938787 | 1 | 1 | 1 | 2020-05-22 11:11:48.826864 | 2020-05-22 11:11:48.830037 | 3173 | 0 | 0 | project 938787 | 0 | 1 | 1 | 2020-05-22 11:11:48.826864 | 2020-05-22 11:11:48.831142 | 4278 | 8798 | 0 | project 938787 | 1 | 1 | 2 | 2020-05-22 11:11:48.826864 | 2020-05-22 11:11:48.830037 | 3173 | 0 | 0 | hash tbl=439 ... (6 rows)
- 运行下面的查询,获取分段值为 2 的查询的 svl_query_report:
下面是输出示例:select query,slice,segment,step,start_time,end_time,elapsed_time,rows,bytes,label from svl_query_report where query = 938787 and segment = 2 order by segment, step, elapsed_time, rows; EXPLAIN <query>; -> XN Hash Join DS_BCAST_INNER (cost=109.98..2815367496.05 rows=178125 width=27) Hash Cond: ("outer".eventid = "inner".eventid) -> XN Seq Scan on sales (cost=0.00..1724.56 rows=172456 width=14)
在前面的示例中,当分段值为 2 时运行查询,并执行顺序扫描操作来扫描 sales 表。在同一分段中,执行聚合操作来聚合结果,然后执行哈希联接操作来联接表。其中一个表的联接列不是分配键或者排序键。结果是,内部表以 DS_BCAST_INNER 形式分配到所有计算节点。之后,可以在执行计划中看到内部表。您也可以运行此查询,以获取分段值为 3、4 和 5 的查询的 SVL_QUERY_REPORT。query | slice | segment | step | start_time | end_time | elapsed_time | rows | bytes | label --------+-------+---------+------+----------------------------+----------------------------+--------------+-------+---------+------------------------------ 938787 | 1 | 2 | 0 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 | 26560 | 86519 | 1730380 | scan tbl=278792 name=sales 938787 | 0 | 2 | 0 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 | 27486 | 85937 | 1718740 | scan tbl=278792 name=sales 938787 | 1 | 2 | 1 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 | 26560 | 86519 | 0 | project 938787 | 0 | 2 | 1 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 | 27486 | 85937 | 0 | project 938787 | 1 | 2 | 2 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 | 26560 | 86519 | 0 | project 938787 | 0 | 2 | 2 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 | 27486 | 85937 | 0 | project 938787 | 1 | 2 | 3 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 | 26560 | 86519 | 0 | hjoin tbl=439 938787 | 0 | 2 | 3 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 | 27486 | 85937 | 0 | hjoin tbl=439 938787 | 1 | 2 | 4 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 | 26560 | 86519 | 0 | project 938787 | 0 | 2 | 4 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 | 27486 | 85937 | 0 | project 938787 | 1 | 2 | 5 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 | 26560 | 86519 | 0 | project 938787 | 0 | 2 | 5 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 | 27486 | 85937 | 0 | project 938787 | 1 | 2 | 6 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 | 26560 | 576 | 34916 | aggr tbl=448 938787 | 0 | 2 | 6 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 | 27486 | 576 | 34916 | aggr tbl=448 ... (16 rows)
在这些分段中,执行哈希聚合操作和排序操作,并根据标签“aggr”和“sort”进行识别。哈希聚合操作是对未排序的分组聚合函数执行的。执行排序操作,以评估 ORDER BY 子句。
在使用完所有分段之后,查询会对分段 4 和 5 运行网络操作,以将中间结果发送至领导节点。结果将发送至领导节点进行进一步处理。您可以使用“return”标签查看结果。
查询完成后,运行以下查询,以检查查询的执行时间(以毫秒为单位):
select datediff (ms, exec_start_time, exec_end_time) from stl_wlm_query where query= 938787; date_diff ----------- 101 (1 row)
优化查询
在分析查询计划时,可以根据自己的用例调整查询性能。有关更多信息,请参阅 Top 10 performance tuning techniques for Amazon Redshift。
相关信息
AWS 官方已更新 9 个月前
没有评论
相关内容
- AWS 官方已更新 2 年前
- AWS 官方已更新 1 年前
- AWS 官方已更新 2 年前
- AWS 官方已更新 1 年前