
How do I use and manage Amazon Redshift WLM memory allocation?


I want to check the concurrency and Amazon Redshift workload management (WLM) allocation to the queues.

Short description

Use one of the following methods to configure WLM to effectively manage resources:

  • Automatic WLM: Amazon Redshift manages the concurrency level of the queues and the memory allocation for each dispatched query. You can define the query priority of workloads or users that are assigned to each query queue.
  • Manual WLM: You have more control over the concurrency level and memory allocation of the queues. As a result, queries that consume more resources can run in queues that have more resources allocated.

It's a best practice to use automatic WLM because it uses machine learning to predict the amount of memory to assign to the queries. If you use manual WLM and want to migrate to automatic WLM, then see Migrating from manual WLM to automatic WLM.

Note: To define metrics-based performance boundaries, use a query monitoring rule (QMR) with your WLM configuration.

Resolution

To check the concurrency level and WLM allocation to the queues, take the following actions:

  • Check the current WLM configuration of your Amazon Redshift cluster.
  • Create a test WLM configuration, and specify the query queue's distribution and concurrency level.
  • (Optional) If you use manual WLM, then determine how the memory is distributed between the slot counts.

Check the current WLM configuration and memory usage

Use the STV_WLM_SERVICE_CLASS_CONFIG table to check the current WLM configuration of your Amazon Redshift cluster.

Example WLM configuration:

[
  {
    "query_concurrency": 2,
    "memory_percent_to_use": 30,
    "query_group": [],
    "query_group_wild_card": 0,
    "user_group": [
      "user_group1"
    ],
    "user_group_wild_card": 0,
    "rules": [
      {
        "rule_name": "BlockstoDiskSpill",
        "predicate": [
          {
            "metric_name": "query_temp_blocks_to_disk",
            "operator": ">",
            "value": 50
          }
        ],
        "action": "abort"
      }
    ]
  },
  {
    "query_concurrency": 5,
    "memory_percent_to_use": 40,
    "query_group": [],
    "query_group_wild_card": 0,
    "user_group": [
      "user_group2"
    ],
    "user_group_wild_card": 0
  },
  {
    "query_concurrency": 5,
    "memory_percent_to_use": 10,
    "query_group": [],
    "query_group_wild_card": 0,
    "user_group": [],
    "user_group_wild_card": 0,
    "rules": [
      {
        "rule_name": "abort_query",
        "predicate": [
          {
            "metric_name": "scan_row_count",
            "operator": ">",
            "value": 1000
          }
        ],
        "action": "abort"
      }
    ]
  },
  {
    "query_group": [],
    "query_group_wild_card": 0,
    "user_group": [],
    "user_group_wild_card": 0,
    "auto_wlm": false
  },
  {
    "short_query_queue": false
  }
]

Note: The preceding example WLM configuration is in JSON format and includes QMRs for Queue 1 and the default queue. In the example, memory_percent_to_use is the actual amount of working memory that's assigned to the service class.

Amazon Redshift allocates memory from the shared resource pool in your cluster. Queue 1 has a memory allocation of 30% that's divided into two equal slots because the concurrency is set as 2. Each slot gets an equal 15% share of the current memory allocation.

Queue 2 has a memory allocation of 40% that's divided into five equal slots because the concurrency is set as 5. Each slot gets an equal 8% share of the memory allocation. The default queue uses 10% of the memory allocation with a queue concurrency level of 5.

Use the following query to check the service class configuration:

select rtrim(name) as name,
num_query_tasks as slots,
query_working_mem as mem,
max_execution_time as max_time,
user_group_wild_card as user_wildcard,
query_group_wild_card as query_wildcard
from stv_wlm_service_class_config
where service_class > 4;

Example output:

                        name                        | slots | mem | max_time | user_wildcard | query_wildcard
----------------------------------------------------+-------+-----+----------+---------------+----------------
 Service class for super user                       |     1 | 297 |        0 | false         | false
 Queue 1                                            |     2 | 522 |        0 | false         | false
 Queue 2                                            |     5 | 278 |        0 | false         | false
 Default queue                                      |     5 |  69 |        0 | false         | false
 Service class for vacuum/analyze                   |     0 |   0 |        0 | false         | false

Queue 1 has a slot count of 2, and the memory allocated to each slot, per node, is 522 MB. The memory allocation that's assigned to the service class is the actual amount of current working memory, in MB per slot, for each node.

Note: If all the query slots are used, then Amazon Redshift manages the unallocated memory. When a queue requests additional memory, the system temporarily gives unallocated memory to the queue.

For more information about unallocated memory management, see WLM memory percent to use.

Identify high-level tuning parameters

Use the SVL_QUERY_METRICS_SUMMARY table to check the detailed execution metrics of a query, and check the query_queue_time column to view queries that queued. A nonzero query_queue_time value shows that the query waited in the queue for a WLM slot before it ran.

Example table:

dev=# select userid, query, service_class, query_cpu_time, query_blocks_read, query_execution_time, query_cpu_usage_percent, query_temp_blocks_to_disk, query_queue_time  from SVL_QUERY_METRICS_SUMMARY where query=29608;

 userid | query | service_class | query_cpu_time | query_blocks_read | query_execution_time | query_cpu_usage_percent | query_temp_blocks_to_disk | query_queue_time
--------+-------+---------------+----------------+-------------------+----------------------+-------------------------+---------------------------+------------------
    100 | 29608 |             8 |             18 |               942 |                   64 |                   10.05 |                           |
(1 row)


dev=# select query, step, rows, workmem, label, is_diskbased
from svl_query_summary
where query = 29608
order by workmem desc;

 query | step |   rows   | workmem  |                  label                  | is_diskbased
-------+------+----------+----------+-----------------------------------------+--------------
 29608 |    3 |    49999 | 54263808 | hash   tbl=714                          | f
 29608 |    2 |    49999 |        0 | project                                 | f
 29608 |    0 |    49999 |        0 | scan   tbl=255079 name=part             | f
 29608 |    1 |    49999 |        0 | project                                 | f
 29608 |    6 |  1561938 |        0 | return                                  | f
 29608 |    4 |  1561938 |        0 | project                                 | f
 29608 |    5 |  1561938 |        0 | project                                 | f
 29608 |    2 | 29995220 |        0 | project                                 | f
 29608 |    1 |  1561938 |        0 | return                                  | f
 29608 |    1 | 29995220 |        0 | project                                 | f
 29608 |    0 |  1561938 |        0 | scan   tbl=4893 name=Internal Worktable | f
 29608 |    3 |  1561938 |        0 | hjoin  tbl=714                          | f
 29608 |    0 | 29995220 |        0 | scan   tbl=255087 name=lineorder        | f
(13 rows)

Use the SVL_QUERY_SUMMARY table to check the resource allocation during each step of the query.

Check the is_diskbased and workmem columns to view the resource usage. For more information, see Analyzing the query summary.
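As a quick check, a query along the following lines (a sketch that uses the same SVL_QUERY_SUMMARY columns shown in the preceding output; narrow it to your own query IDs as needed) lists the steps that spilled to disk, largest working memory first:

```sql
-- Sketch: find query steps that went disk-based (is_diskbased = 't'),
-- ordered by working memory so the heaviest steps appear first
select query, step, label, workmem, is_diskbased
from svl_query_summary
where is_diskbased = 't'
order by workmem desc;
```

Steps that appear in this result ran with insufficient memory for their slot and wrote intermediate results to disk, which is usually the first sign that a queue needs more memory per slot.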

Update to WLM dynamic configuration properties

Use WLM dynamic configuration properties to adjust to workloads that change. You can apply dynamic properties to the database without a cluster reboot. However, the change between automatic WLM and manual WLM is static and requires a cluster reboot to take effect.

For more information, see STV_WLM_SERVICE_CLASS_CONFIG.

The following is an example of a cluster that's configured with two queues:

Queue    Concurrency    % Memory to Use
1        5              60%
2        5              40%

If the cluster has 200 GB of available memory, then the current memory allocation for each of the queue slots is similar to the following example:

Queue 1: (200 GB * 60% ) / 5 slots  = 24 GB
Queue 2: (200 GB * 40% ) / 5 slots  = 16 GB

To update your WLM configuration properties to be dynamic, modify your settings. See the following example modification:

Queue    Concurrency    % Memory to Use
1        3              75%
2        4              25%
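Expressed in the same JSON format as the earlier example, the modified configuration might look like the following sketch (rules, query groups, and user groups omitted for brevity):

```json
[
  {
    "query_concurrency": 3,
    "memory_percent_to_use": 75
  },
  {
    "query_concurrency": 4,
    "memory_percent_to_use": 25
  },
  {
    "short_query_queue": false
  }
]
```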

After you modify the WLM configuration, the memory allocation updates to accommodate the changed workload. See the following example update:

Queue 1: (200 GB * 75% ) / 3 slots = 50 GB
Queue 2: (200 GB * 25% ) / 4 slots = 12.5 GB

Note: If there are queries in the WLM queue during a dynamic configuration update, then Amazon Redshift waits for the queries to complete. After the query completes, Amazon Redshift updates the cluster with the updated settings.

Use the STV_WLM_SERVICE_CLASS_CONFIG table when you're transitioning to dynamic WLM configuration properties.

If the num_query_tasks and target_num_query_tasks values are different, then a dynamic WLM transition is in progress. A value of -1 shows that automatic WLM is configured.
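For example, a query along these lines (a sketch that uses the columns described above) surfaces queues that are still transitioning:

```sql
-- Sketch: compare current vs. target slot counts per service class;
-- rows where the two values differ are still mid-transition
select service_class, trim(name) as name,
       num_query_tasks, target_num_query_tasks
from stv_wlm_service_class_config
where service_class > 4;
```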

Identify insufficient memory allocated to the query

If a query execution plan in SVL_QUERY_SUMMARY has an is_diskbased value of true, then allocate more memory to the query. To allocate more memory, increase the number of query slots.

For more information, see wlm_query_slot_count.
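For example, a session could temporarily claim additional slots before running a memory-intensive statement and then release them, along the lines of the following sketch (the VACUUM statement is illustrative):

```sql
-- Sketch: give this session 3 slots' worth of the queue's memory
set wlm_query_slot_count to 3;

vacuum;  -- example memory-intensive statement

-- Return to the default of one slot per query
set wlm_query_slot_count to 1;
```

Note that while the session holds extra slots, fewer slots remain for other queries in the same queue, so concurrency in that queue is reduced.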

AWS OFFICIAL. Updated 10 months ago.