Improve stored procedure logging performance for Amazon Redshift

Lesedauer: 3 Minute
Inhaltsebene: Expertenwissen
0

Amazon Redshift is an MPP large-scale analytics warehouse, is not suited for large volumes of single-row insert, update, or delete operations. Stored procedures can generate 100's of single row inserts to log tables to track the process flow and progress of stored procedures. These single row inserts or updates tend to slow down the overall speed of stored procedure execution. This article will provide an alternative logging method which can improve the stored procedure execution performance.

Amazon Redshift is designed for processing large volumes of data to deriving business insights. It stores data in columnar format, in 1 MB immutable blocks. Since the data blocks are immutable, Redshift processes an Update as Delete + Insert. Each update results in the older block(s) being marked for deletion and new block(s) gets created with the new values.

Now lets take into account the columnar format where each column gets stored in separate blocks. So for a table with 10 columns, a single row update results in 20 MB (10 MB delete + 10 MB insert) of operation, and a single row insert results in 10 MB of operation.

And finally, all commits to database have to go in a single queue i.e. are serialized. Typically the step-level logging from Stored Procedure (SP) results in 100's to 1000's of single row insert/updates to some execution_status or like table which is intended to track the progress of the SP and handle audit checks. If you are running 100's of such SP's then the operations on the execution_status table slow down the overall SP execution time, leading to a poor experience.

Alternative logging method

Amazon Redshift provides the RAISE statement and you can leverage the RAISE INFO for step-level logging from stored procedures. And leverage RAISE EXCEPTION to fail a stored procedure when the audit checks fail. The RAISE INFO will log step-level messages to Redshift system table SYS_PROCEDURE_MESSAGES. This does not increase runtimes of your stored procedures by any significant margin.

You can define your custom logging text to a fixed pattern, for example lets take procedure name + | + step name + | + table name + | + operation name + | + status text + | + start timestamp + | + end timestamp.

Note

The MESSAGE column from SYS_PROCEDURE_MESSAGES is restricted to 1024 characters, so make sure to stay within this limit with your custom logging text pattern.

SQL to extract logs

You can leverage the below SQL to extract elements from your custom logging text.

select
  record_time,
  query_id,
  SPLIT_PART(message, '|', 1) AS procedure_name,
  SPLIT_PART(message, '|', 2) AS step_name,
  SPLIT_PART(message, '|', 3) AS table_name,
  SPLIT_PART(message, '|', 4) AS operation_name,
  SPLIT_PART(message, '|', 5) AS status_text,
  SPLIT_PART(message, '|', 6) AS start_timestamp,
  SPLIT_PART(message, '|', 7) AS end_timestamp
from SYS_PROCEDURE_MESSAGES
where log_level = 'INFO'
and procedure_name = 'sp_load_sales_fct'
and record_time >= TRUNC(SYSDATE)
and message is not null
order by record_time;

You can also choose save the stored procedure execution log to your actual final logging table at a later time, every few hours or every day, using a single INSERT INTO SELECT operation which will be much faster than individual row-by-row inserts.

profile pictureAWS
EXPERTE
veröffentlicht vor 2 Monaten2151 Aufrufe