By using AWS re:Post, you agree to the AWS re:Post Terms of Use

How do I troubleshoot high CPU utilization on my Amazon RDS for Oracle database?

8 minute read
0

My Amazon Relational Database Service (Amazon RDS) for Oracle DB instance has a high rate of CPU utilization.

Resolution

When you diagnose issues related to high CPU utilization, identify the time period when the issue occurred.

CloudWatch metrics

Amazon RDS sends metrics to Amazon CloudWatch every minute for each active database. Review the following CloudWatch metrics for Amazon RDS to identify CPU patterns over extended periods:

  • CPUUtilization
  • CPUCreditUsage, if you use a T2 or T3 instance
  • CPUCreditBalance, if you use a T2 or T3 instance

Also, review the following metrics to check if there was a change in the workload and any thresholds were breached:

  • DatabaseConnections
  • DiskQueueDepth
  • FreeableMemory
  • ReadIOPS
  • ReadLatency
  • WriteIOPS
  • WriteLatency

Note: These factors can contribute to the spike in CPU utilization.

For more information, see Viewing instance status.

Enhanced Monitoring metrics

Enhanced Monitoring provides metrics in real time for the operating system that your DB instance runs on.

To view the time period of the spike in the CPU utilization, do the following:

  1. Open the Amazon RDS console.
  2. In the navigation pane, choose Databases.
  3. Choose the database that you want to monitor.
  4. Choose the Monitoring tab.
  5. From the Monitoring dropdown list, select Enhanced monitoring.
  6. Under the Enhanced Monitoring view, if the instance is a Multi-AZ deployment, then to view the OS metrics of the primary instance, select primary. To view the metrics for the standby replica, select secondary.
  7. Select the date and start time.
  8. On the right corner of the page, select the duration. You can select 5 minutes, 15 minutes, 30 minutes, or 1 hour.

The CPU Total graph indicates the time period when the CPU utilization increased.

The Load Avg 1 min, Load Avg 5 min, and Load Avg 15 min graphs show the number of processes that were requesting the CPU time. The CPU time is calculated over the last minute. These graphs also show the last five minutes and the last 15 minutes. If the load average is greater than the number of vCPUs, then the instance might experience a CPU bottleneck.

To view the OS processes, select OS process list from the Monitoring dropdown list. Then, to identify the process that has the most CPU usage, sort the list by CPU% values. For more information, see Viewing OS metrics in the RDS console.

After you identify the process that has the most CPU utilization, run this query to map the process ID to a session on the database:

SET LINESIZE 120;
SET PAGES 200;
COL OSUSER FOR a20;
COL USERNAME FOR a20;
COL MACHINE FOR a20;
SELECT a.sid, a.serial#, a.osuser, a.username, a.machine, a.sql_id, c.sql_text FROM v$session a, v$process b, v$sql c 
WHERE a.paddr=b.addr AND b.spid=&spid AND a.sql_id=c.sql_id(+);

By default, all the Enhanced Monitoring graphs don't appear on the Enhanced Monitoring dashboard. To turn on additional graphs and get a view of the workload at the time of the spike in CPU utilization, complete the following steps:

  1. Open the Amazon RDS console.
  2. In the navigation pane, choose Databases.
  3. Choose the database that you want to monitor.
  4. Choose the Monitoring tab.
  5. From the Monitoring dropdown list, select Enhanced monitoring.
  6. Under the Enhanced Monitoring view, choose Manage graphs.
  7. Select the graphs that you want to view.
  8. Choose Save.

You can choose to view the following example graphs:

Memory

  • Free Memory
  • Cached Memory
  • Buffered Memory
  • Total Memory
  • Dirty Memory
  • Active Memory
  • Slab Memory

Note: Metrics related to memory are retrieved from the /proc/meminfo file.

Swap

  • Swap
  • Free Swap

Disk I/O and Physical Device I/O

  • Read IO/s
  • Write IO/s
  • Ave Queue Size
  • Await

CPU

  • CPU User
  • CPU Total
  • CPU System
  • CPU Wait
  • CPU Idle
  • CPU Nice

For the list of available metrics, see Overview of Enhanced Monitoring.

For more information on Enhanced Monitoring, see Monitoring OS metrics with Enhanced Monitoring.

For information on the cost of Enhanced Monitoring, see Cost of Enhanced Monitoring.

Amazon RDS Performance Insights metrics

From the Performance Insights dashboard, you can visualize the database load and filter the load by waits, SQL statements, hosts, or users.

To view the Performance Insights metrics, complete the following steps:

  1. Open the Amazon RDS console.
  2. In the navigation pane, choose Performance Insights.
  3. Choose the DB instance that you want to monitor.
  4. By default, the statistics for the last 1 hour appear in the Performance Insights dashboard. For the time period that you analyze, choose either Relative or Absolute.
  5. In the Database Load graph, check the time when you experienced a spike in the CPU usage.
  6. Choose the Top waits tab.
    Note: Check the top wait events during the time period of the spike.
  7. Choose the Top SQL tab.
  8. Review and optimize the SQL statements that contributed to the spike.

For information on the cost of Performance Insights, see Performance Insights Pricing.

Statspack

Oracle Statspack is a performance reporting tool that provides the performance metrics of your database over a specific time period.

To use Statspack to review the CPU utilization of your instance, complete the following steps:

  1. Generate a Statspack report for the time period when you experienced an issue.
  2. Review and optimize the queries that result in a high CPU load.
  3. Review the top wait events.

Sample extract from a Statspack report:

-> Total DB CPU (s):           3,345-> Captured SQL accounts for   91.3% of Total DB CPU
-> SQL reported below exceeded  1.0% of Total DB CPU
    CPU                  CPU per            Elapsed                     Old
  Time (s)   Executions  Exec (s)  %Total   Time (s)    Buffer Gets  Hash Value
---------- ------------ ---------- ------ ---------- --------------- ----------
   3043.36      598,100       0.01   91.0    3356.81     994,096,212  219593194
Module: JDBC Thin Client
SELECT tt.ORDER_TOTAL, tt.SALES_REP_ID, tt.ORDER_DATE, customers.CUST_FIRST_NAME, customers.CUST_LAST_NAME FROM   
(SELECT orders.ORDER_TOTAL, orders.SALES_REP_ID, orders.ORDER_DATE, orders.customer_id, rank() Over (ORDER BY orders.O

For more information, see Oracle Statspack on the Oracle website.

AWR

Automatic Workload Repository is an Oracle performance reporting tool that provides performance metrics over a specific time period. For more information, see Automatic Workload Repository on the Oracle website.

Note: AWR requires a Diagnostic Pack License and is available only for the Enterprise Edition of Oracle.

To use AWR to identify the cause for your CPU load, complete the following steps:

  1. To identify the start and end snapshot ID for the time period of high CPU load, run a query similar to the following query:

    SELECT SNAP_ID, BEGIN_INTERVAL_TIME FROM DBA_HIST_SNAPSHOT ORDER BY 1;
  2. Generate the AWR report.

  3. Download the AWR report.

  4. Review and optimize the queries listed in the SQL ordered by CPU Time section of the AWR report.

  5. Review the top wait events.

In Oracle 12c and later versions, Automatic Database Diagnostic Monitor (ADDM) and Active Session History (ASH) reports are included in the AWR report.

Note: An AWR report that's generated for more than four consecutive snapshot IDs does not include all the ADDM and ASH reports.

ADDM

Automatic Database Diagnostic Monitor is a diagnostic tool that analyzes the AWR data, identifies performance bottlenecks, and provides recommendations.

Note: ADDM requires a Diagnostic Pack license and is available only for the Enterprise Edition of Oracle.

To use ADDM to analyze the AWR data, complete the following steps:

  1. To identify the start and end snapshot ID for the time period of high CPU load, run this sample query:

    SELECT SNAP_ID, BEGIN_INTERVAL_TIME FROM DBA_HIST_SNAPSHOT ORDER BY 1;
  2. Generate the ADDM report.

  3. Download the ADDM report.

  4. Review the recommendations in the ADDM report.

For information on ADDM reports, see FAQ: Automatic Workload Repository (AWR) Reports (Doc ID 1599440.1) on the Oracle website.

ASH

Active Session History is a diagnostic tool that collects active session information. To use ASH to troubleshoot transient performance problems, complete the following steps:

  1. Generate an ASH report for the time period when there was a high CPU load.
    Note: ASH requires a Diagnostic Pack license and is available only for the Enterprise Edition of Oracle.
  2. Download the ASH report.
  3. Review the TOP SQL with TOP Events section.

For information on ASH reports, see Analyzing Sampled Data on the Oracle website.

Oracle SQLT

Amazon RDS supports Oracle SQLTXPLAIN (SQLT) through the use of the SQLT option. SQLT is a tool that's used to diagnose SQL statements that don't perform well.

To create a report for a specific SQL statement, see Oracle SQLT on the Oracle website.

You might receive the following error when you use SQLT:

Error: ORA-20106: SQLT parameter connect_identifier must be set when running SQLT from a remote client.

If this happens, then run one of the following commands before you run the extract:

EXEC sqltxadmin.sqlt$a.set_sess_param('connect_identifier';, '@SID');
EXEC sqltxadmin.sqlt$a.set_param('connect_identifier', '@example-hostname:example-port/example-sid');

Related information

Overview of monitoring metrics in Amazon RDS

Generating performance reports with Automatic Workload Repository (AWR)

How do I check the performance statistics of an Amazon RDS DB instance that is running Oracle?