How do I troubleshoot performance issues with VACUUM in Amazon Redshift?

6 minute read
0

I'm concerned about the performance impact of VACUUM on my Amazon Redshift cluster. Why is VACUUM taking so long to run, and what best practices should I consider when running the VACUUM operation on my Amazon Redshift cluster?

Short description

VACUUM is a resource-intensive operation, which can be slowed down by the following:

  • High percentage of unsorted data
  • Large table with too many columns
  • Interleaved sort key usage
  • Irregular or infrequent use of VACUUM
  • Concurrent tables, cluster queries, DDL statements, or ETL jobs

Use the svv_vacuum_progress query to check the status and details of your VACUUM operation. Then, follow the VACUUM best practices to troubleshoot and avoid any future issues.

Resolution

Note: The following is applicable to provisioned Amazon Redshift clusters. The following system tables and queries won’t work on Amazon Redshift Serverless.

To check if the VACUUM operation is in progress, run the svv_vacuum_progress query:

dev=# SELECT * FROM svv_vacuum_progress;
table_name |          status                 | time_remaining_estimate
-----------+---------------------------------+-------------------------
 data8     |  Vacuum: initialize merge data8 | 4m 55s
(1 row)

The svv_vacuum_progress query also verifies the table name being vacuumed, the vacuum's status, and the estimated time remaining until completion. If there is no vacuum running, then the svv_vacuum_progress query shows the status of the last run vacuum.

Note: The svv_vacuum_progress query returns only one row of results.

Check the details of the table being vacuumed. Specify the table and schema names in the WHERE clause:

SELECT schema, table_id, "table", diststyle, sortkey1, sortkey_num, unsorted, tbl_rows, estimated_visible_rows, stats_off 
FROM svv_table_info 
WHERE "table" IN ('data8');

Here's an example output:

Schema     | table_id | table | diststyle | sortkey1 | sortkey_num | unsorted | tbl_rows  | est_visible_rows | stats_off 
------------+----------+-------+-----------+----------+-------------+----------+-----------+------------------+-----------
 testschema | 977719   | data8 | EVEN      | order_id |  2          |    25.00 | 755171520 | 566378624        | 100.00

From this output, the sortkey1 column shows the main sort key.

If the table has an interleaved sort key, then this column displays the INTERLEAVED state.

  • The sortkey_num column shows the number of columns in the sort key.
  • The unsorted column shows the percentage of rows that need to be sorted.
  • The tbl_rows column shows the total number of rows, including the deleted and updated rows.
  • The estimated_visible_rows is the number of rows that excludes the deleted rows.
  • After a complete vacuum (delete and sort), the value for tbl_rows and estimated_visible_rows should resemble each other, and unsorted should reach 0. Note: Data in the table updates in real time. To check the progress of VACUUM, continue running the query. Note that the unsorted rows gradually decrease as VACUUM progresses. To verify whether you have a high percentage of unsorted data, check the VACUUM information for a specific table.

Run the following query to check VACUUM information for a table, specifying the table ID from the previous query:

SELECT table_id, status, rows, sortedrows, blocks, eventtime
FROM stl_vacuum
WHERE table_id=977719
ORDER BY eventtime DESC LIMIT 20;

Here's an example output:

table_id |             status             |    rows    | sortedrows | blocks |         eventtime         
----------+--------------------------------+------------+------------+--------+----------------------------
   977719 | [VacuumBG] Finished            |  566378640 |          0 |  23618 | 2020-05-27 06:55:33.232536
   977719 | [VacuumBG] Started Delete Only | 1132757280 |  566378640 |  47164 | 2020-05-27 06:55:18.906008
   977719 | Finished                       |  566378640 |  566378640 |  23654 | 2020-05-27 06:46:04.086842
   977719 | Started                        | 1132757280 |  566378640 |  45642 | 2020-05-27 06:28:17.128345
(4 rows)

The output lists the latest events first, followed by older events, in sorted order.

  • The last vacuum performed was an automatic VACUUM DELETE, which started at 2020-05-27 06:55:18.906008 UTC and completed in a few seconds.
  • This vacuum released the space occupied by deleted rows, confirmed by the number of rows and blocks displayed when the vacuum started and completed.

Note the changes that occur in the number of blocks occupied by the table from the start and completion of VACUUM.

Note: Amazon Redshift automatically performs vacuum sort and vacuum delete operations on tables in the background. These background vacuums run during periods of reduced loads and are paused during periods of high load. This automatic vacuum lessens the need to run the vacuum command.

The sortedrows column shows the number of sorted rows in the table. In the last vacuum, no sort was done, because it was an automatic VACUUM DELETE operation. The row marked for deletion displays the same number of sorted rows from when VACUUM started, because the active rows were not sorted. After VACUUM DELETE completes, it indicates 0 sorted rows.

The initial vacuum, which started at 2020-05-27 06:28:17.128345 UTC, shows a full vacuum. It released the space from deleted rows and sorted rows after about 18 minutes. When the vacuum operation completed, the output shows the same values for rows and sortedrows because the vacuum successfully sorted the rows.

For a vacuum that is already in progress, continue to monitor its performance and incorporate VACUUM best practices.

VACUUM best practices

VACUUM performance can be improved with the following best practices:

  • Because VACUUM is a resource-intensive operation, run it during off-peak hours.
  • During off-peak hours, use wlm_query_slot_count to temporarily override the concurrency level in a queue for a VACUUM operation.
  • Run the VACUUM operation with a threshold parameter of up to 99% for large tables.
  • Determine the appropriate threshold and frequency of running VACUUM. For example, you might want to run VACUUM at a threshold of 100%, or have your data always sorted. Use the approach that optimizes your Amazon Redshift cluster's query performance.
  • Run a VACUUM FULL or VACUUM SORT ONLY often enough that a high unsorted Region doesn't accumulate in large tables.
  • If there is a large amount of unsorted data on a large table, then perform a deep copy. A deep copy can help you load the data into a new table instead of running VACUUM SORT on the existing table.
  • Run the VACUUM command with the BOOST option. The BOOST option allocates additional resources to VACUUM, such as available memory and disk space. With the BOOST option, VACUUM operates in one window and blocks concurrent deletes and updates for the duration of the VACUUM operation.
    Note: Running VACUUM with the BOOST option might affect query performance. It's a best practice to run the VACUUM BOOST operation only during maintenance operations or off-peak hours.
  • Divide any large tables into time-series tables to improve VACUUM performance. In some cases, using a time-series table can fulfill the need for running VACUUM.
  • Choose a column compression type for large tables. Compressed rows consume less disk space when sorting data.
  • Use ANALYZE command after VACUUM operation to update the statistics, which are used by the query planner to choose optimal plans.

AWS OFFICIAL
AWS OFFICIALUpdated 2 years ago