Skip to content

How do I troubleshoot high CPU utilization on my Amazon RDS for SQL Server instance?

6 minute read
1

I'm experiencing high CPU utilization on my Amazon Relational Database Service (Amazon RDS) for Microsoft SQL Server DB instances.

Short description

The following reasons can cause increases in CPU utilization:

  • User-initiated heavy workloads, multiple concurrent queries, or long-running transactions
  • Use of an under-provisioned instance class for the workload
  • Out-of-date statistics and index fragmentation or missing indexes
  • Queries with scope for improvement
  • Blocking and deadlocks
  • Parallelism
  • Frequent compilation and recompilation
  • Parameter sniffing
  • Thread exhaustion

To identify the source of high CPU utilization for your Amazon RDS for SQL Server instance, use the following tools:

After you identify the source, you can analyze and optimize your workload to reduce high CPU utilization.

Resolution

CloudWatch metrics for Amazon RDS

Use CloudWatch metrics for Amazon RDS to identify CPU patterns over extended periods.

To find the times when the workload increases CPU utilization, complete the following steps:

  1. Open the Amazon RDS console.
  2. In the navigation pane, choose Databases, and then select the database you want to monitor.
  3. Choose the Monitoring tab.
  4. Choose the Monitoring menu, and then choose CloudWatch.
  5. Search for the following CloudWatch metrics, and then compare the graphs:
    WriteIOPs
    ReadIOPs
    ReadThroughput
    WriteThroughput
    CPUUtilization
    CPUCreditBalance
    CPUCreditUsage

Note: If the CPU credit balance consistently decreases and the CPU credit usage consistently increases, then there aren't enough CPU cores for the workload. If you use an instance from the t2 or t3 instance class, then check if your instance is under provisioned.

After you identify the time frame, use Enhanced Monitoring to review the data that's associated with your DB instance in more detail. You can set Enhanced Monitoring to collect data at intervals of 1, 5, 10, 15, 30, or 60 seconds. 

Enhanced monitoring

You can set up Enhanced Monitoring to monitor the operating system (OS) that's running on your DB instance.

To check CPU utilization with Enhanced Monitoring, complete the following steps:

  1. Open the Amazon RDS console.
  2. In the navigation pane, choose Databases, and then select the database you want to monitor.
  3. Choose the Monitoring tab.
  4. Choose the Monitoring menu, and then choose OS process list.

Verify if the OS processes, RDS processes, SQL Server processes, or a SQL Agent processes are driving high CPU utilization. You can also check what percentage of CPU and memory these processes are utilizing. 

Use the console to monitor performance metrics. On the Monitoring tab, choose the Monitoring menu, and then choose Manage graphs.

To determine when the CPU is running user processes, running kernels, or is idle, select the graphs for CPU User, CPU System, CPU Idle metrics. Then, select the metrics for Disk I/O and Physical Device I/O. These metrics include Read IO/s, Write IO/s, Read Kb/s, and Write Kb/s. You can also view memory-related parameters including Available Memory, SQL Server Total Memory, and Total Memory. These metrics are helpful because if the CPU spends more time waiting for resources, you might see high CPU utilization.

For more information, see Viewing OS metrics in the RDS console.

Database insights

Use CloudWatch Database Insights to troubleshoot the major contributors of database load and individual OS processes that are running on a group of instances.

By default, Standard mode of Database Insights is enabled for your Amazon RDS database. To turn on Standard mode when you create or modify your DB instance, see Turning on the Standard mode of Database Insights for Amazon RDS.

Note: If you aren’t using broader permissions, then be sure to grant the required IAM permissions for Database Insights. For more information, see Get started with CloudWatch Database Insights.

For more information about Amazon RDS and instance class support, see Amazon RDS DB engine, Region, and instance class support for Database Insights.

Performance insights

Turn on Performance Insights to identify queries that are responsible for the database load.

Complete the following steps:

  1. Access the Performance Insights dashboard.
  2. Check the Top SQL tab that corresponds with the time-frame you want to analyze.
  3. Identify the query that's taking the longest.
  4. Check the resource-intensive query and the wait events observed during this period. The following are wait events frequently associated with high CPU utilization:

SOS_SCHEDULER_YIELD shows that a worker thread yielded for another thread to run. When wait counts are high with low wait times, the queries are usually CPU bound. When worker threads yield, wait times in the CPU-bound queries might increase. If the wait time is long, then you must review the workload. If SOS_SCHEDULER_YIELD is prevalent, then CPU pressure is the problem. Review the type of workload and perform additional tuning.

CXPACKET and CXCONSUMER are parallelism-related wait events that aren't usually a concern. But if wait events are frequent and are affecting performance, then review the queries and set appropriate values for the cost threshold of parallelism. Make sure that SQL Server is choosing the lower cost parallelism parameter in the parameter group. You can also increase the max degree of parallelism, MAXDOP, to 1 at the query or instance level.

ThreadPool shows thread exhaustion. If your instance class can handle it, then increase the max worker threads parameter. You might experience ThreadPool waits when excessive threads are used because of blocking, a high workload, or a high number of parallel queries. Also, if you misconfigure the max worker threads parameter, then you might experience the ThreadPool wait event.

Check the Database Metrics for batch requests, SQL compilations, and SQL recompilations. Check for queries that are being compiled more than once. Also check if the queries are frequently recompiled for a given batch. If so, then this indicates that the WITH RECOMPILE clause is used in the query code. Both of these reasons can cause excessive CPU usage.

SQL Server tools

To troubleshoot high CPU utilization with SQL Server tools, take the following actions: