Ao usar o AWS re:Post, você concorda com os AWS re:Post Termos de uso

Redshift Serverless Monitoring and Troubleshooting using System views

8 minuto de leitura
Nível de conteúdo: Intermediário
2

This article provides guidance on various system views and queries that can help you monitor and troubleshoot your queries on Redshift Serverless.

Introduction

This article introduces key system tables that facilitate monitoring active queries on your Amazon Redshift Serverless, checking query status, gathering execution metrics, troubleshooting queries, and implementing billing controls and best practices for Serverless. By leveraging these system tables, you can gain valuable insights into query execution behavior and cost management strategies for your Redshift Serverless environment.

Queries to use for monitoring on Redshift Serverless:

  1. Checking Current Running and Queued Queries:
SELECT * FROM sys_query_history WHERE status IN ('running','queued') ORDER BY start_time;

This query retrieves all currently running and queued queries from the sys_query_history view, ordered by their start time. It helps you identify long-running queries or queued queries that may be waiting for resources.

  1. Checking Current and Completed Queries:
SELECT * FROM sys_query_history ORDER BY start_time desc;

This query retrieves all queries, both current and completed, from the sys_query_history view, sorted in descending order by their start time. It provides a comprehensive view of all query activities, including their status, execution time, and other relevant details.

  1. Check query execution details of a query at step level to get all query execution metrics.
SELECT * FROM sys_query_detail WHERE query_id IN (<query_id>) ORDER BY query_id,child_query_sequence,stream_id,segment_id,step_id;

The SYS_QUERY_DETAIL view retrieves detailed information about a specific query or set of queries identified by their query_id. It provides insights into the query execution plan, including child queries, streams, segments, steps, tables and data scanned, skewed information.

Key metrics to watch out for are the table_name, step_name, duration for each step, blocks_read, input and output rows and bytes, data skewness, disk spill, and any alerts related to the steps in the query. This information can be valuable for troubleshooting and optimizing complex or slow-running queries.

  1. Identifying Locking and Blocking Sessions on Serverless:
SELECT a.txn_owner user_id, a.txn_db database_name, a.xid transaction_id, a.pid session_id, a.txn_start start_time, a.lock_mode, a.relation table_id, nvl(trim(d.relname), '') AS table_name, a.granted, b.pid AS blocking_session_id, datediff(s, a.txn_start, getdate()) / 86400 || ' days ' || datediff(s, a.txn_start, getdate()) % 86400 / 3600 || ' hrs ' || datediff(s, a.txn_start, getdate()) % 3600 / 60 || ' mins ' || datediff(s, a.txn_start, getdate()) % 60 || ' secs' AS txn_duration
FROM svv_transactions a LEFT JOIN (SELECT pid, relation, granted FROM pg_locks GROUP BY 1, 2, 3) b 
ON a.relation = b.relation AND a.granted = 'f' AND b.granted = 't' LEFT JOIN pg_class d ON a.relation = d.oid 
WHERE a.relation IS NOT NULL;

This query identifies sessions that are holding locks and blocking other sessions waiting for a lock. It provides information about the user, database, transaction ID, session ID, lock mode, table involved, and the duration of the transaction. If you identify a blocking session holding a lock for an extended period, you can check if the associated transaction or query is active or left open using the sys_query_history view.

  1. Terminating Blocking Sessions:
SELECT pg_terminate_backend(<blocking_session_pid>);

If you've identified a blocking session that needs to be terminated, you can use this query to terminate the backend process associated with the blocking session. Replace <blocking_session_pid> with the actual process ID of the blocking session.

  1. Retrieve Table information details like size, rows, dist and sort key info, unsorted and stats_off etc and run VACUUM and ANALYZE wherever required.
SELECT * FROM svv_table_info;
SELECT * FROM sys_analyze_history order by start_time desc;
SELECT * FROM sys_vacuum_history order by start_time desc;

These queries provide information about table statistics, analyze operations, and vacuum operations. Monitoring table statistics is crucial for maintaining optimal query performance, as outdated or missing statistics can lead to inefficient query execution plans.

  1. Monitoring and Troubleshooting Load and Unload Operations:
SELECT * FROM sys_load_history WHERE query_id=<query_id> ORDER BY start_time desc;
SELECT * FROM sys_load_error_detail WHERE query_id=<query_id> ORDER BY start_time desc;
SELECT * FROM sys_unload_history ORDER BY query_id, start_time;

These queries provide detailed information about load and unload operations, including their status, start time, and any associated errors. Replace <query_id> with the actual query ID of the load or unload operation you want to monitor or troubleshoot.

  1. Monitoring external (federated and Spectrum) queries and troubleshooting errors:
SELECT * FROM sys_external_query_detail WHERE query_id=<query_id> ORDER BY query_id,segment_id, start_time;
SELECT * FROM sys_external_query_error;

This query provides segment level information for external queries and details like the number of rows processed, number of bytes processed, and partition info of external tables in Amazon S3 and Postgres.

  1. Monitoring Datasharing objects and its usage:
SELECT * FROM SYS_DATASHARE_CHANGE_LOG;
SELECT * FROM SYS_DATASHARE_USAGE_CONSUMER;
SELECT * FROM SYS_DATASHARE_USAGE_PRODUCER;

These Views records the consolidated view for tracking changes to data shares on both producer and consumer clusters and Records the activity and usage of Datashares on producer and Consumer.

  1. Monitoring Streaming ingestion queries:
SELECT * FROM SYS_STREAM_SCAN_STATES ORDER BY record_time DESC;
SELECT * FROM SYS_STREAM_SCAN_ERRORS ORDER BY record_time DESC;

Records scan states for records loaded via streaming ingestion and related error information.

  1. Monitoring Sessions and Connections:
SELECT * FROM sys_session_history where status='active' order by start_time; -- current active sessions
SELECT user_id,count(*) FROM sys_session_history where status='active' and user_id<>1 group by 1 order by 2 desc; -- current active sessions by user.
SELECT * FROM sys_connection_log where session_id=<session_id> ORDER BY record_time desc; -- connection history for all user sessions.

These queries provide information about user sessions and connections, including session history and connection logs. Replace <session_id> with the actual session ID you want to investigate for a specific session.

  1. Monitoring Transactions:
select * from sys_transaction_history order by transaction_start_time desc;

This query retrieves information about transaction history, including their start time, end time, and other relevant details. It can help you identify long-running transactions or potential issues related to transaction management.

  1. Monitoring User Activities:
Select * from SYS_USERLOG order by record_time desc;

This query provides a log of user activities, such as create, drop and Alter user properties.

  1. Billing and Cost Monitoring:
select trunc(start_time) "Day", 
(sum(charged_seconds)/3600::double precision) * <Price for 1 RPU> as cost_incurred_USD
from sys_serverless_usage 
group by 1 
order by 1;

This query calculates the cost incurred for Redshift Serverless usage on a daily basis. Replace <Price for 1 RPU> with the actual price per RPU (Redshift Processing Unit) hour by referring Redshift pricing for Serverless for respective AWS region. It helps you monitor and manage your billing effectively.

  1. Identifying Long-Running Queries:
WITH cte AS (
    SELECT
        date_trunc('day', start_time) AS query_day,
        user_id,
        query_text,
        ROUND(execution_time / 1000000.0, 2) AS execution_time_seconds,
        ROW_NUMBER() OVER (PARTITION BY date_trunc('day', start_time) ORDER BY execution_time DESC) AS rn
    FROM
        sys_query_history
    WHERE
        start_time >= dateadd('day', -7, current_date) -- Adjust the date range as needed
)
SELECT
    query_day,
    user_id,
    query_text,
    execution_time_seconds
FROM
    cte
WHERE
    rn <= 10
ORDER BY
    query_day DESC,
    execution_time_seconds DESC;

This query identifies the top 10 longest-running queries for each day within the specified date range. It provides the query text, user ID, and execution time in seconds, which can help optimize poorly performing queries. However, the 60-second minimum billing is for the entire Redshift serverless warehouse based on RPU usage seconds, not per query. The primary factor influencing Serverless pricing is the RPU (Resource Processing Unit) usage, although there are other elements contributing to the overall Redshift Serverless billing, some of which are highlighted below.

Best practices to control Cost measures on Redshift Serverless:

Cost Control Measures:


  • Amazon Redshift Serverless billing is based on RPU (Redshift Processing Unit) hours per second, charged only when queries are running and not when idle.

  • Storage is billed separately based on the amount of data stored in Redshift, charged per GB per month.

  • There is a minimum charge of 60 seconds for queries using Serverless, even if they complete in less time and metered on a per-second basis beyond 60 seconds.

  • You can set usage limits (in RPU hours) at the workgroup level to control costs. Actions can be specified when the limit is reached, such as logging, sending alerts, or turning off user queries.

  • For cross-region data sharing, limits can be set in terms of data volume (TB).

  • Query Monitoring Rules (QMR) can be used to control bad queries or those consuming excessive resources by setting limits on execution time, queue time, rows joined, and nested loop join row counts.

  • Canceled queries are still billed for the time the query ran if you abort before it finishes.

Best Practices:


  • Minimize open transactions by properly ending them with the END statement. Open transactions can generate unnecessary RPU usage.

  • Use the SESSION TIMEOUT setting to automatically end open transactions and idle sessions after a specified period of inactivity.

  • Amazon Redshift Serverless has a maximum query runtime of 24 hours and a maximum open transaction period of 6 hours before automatically ending the associated session.

By leveraging these SYS Monitoring Views, SYS Mapping views and provided queries , you can effectively monitor and troubleshoot your queries on Amazon Redshift Serverless.

profile pictureAWS
ENGENHEIRO DE SUPORTE
publicada há 4 meses1,1 mil visualizações