
How can I troubleshoot high CPU utilization for Amazon RDS or Amazon Aurora PostgreSQL?
Last updated: 2019-04-16
How can I identify and resolve the cause of high CPU utilization in Amazon Relational Database
Service (Amazon RDS) or Amazon Aurora PostgreSQL?

Short Description
If your database load has high CPU usage, you can use a combination of the following tools to identify the cause:

• Amazon CloudWatch metrics can help identify patterns of CPU usage over a period of time.
• Enhanced Monitoring provides a view at the operating system (OS) level, which can help identify the cause of a high CPU load at a granular level. For example, you can review the load average, CPU distribution (system% or nice%), and the OS process list.
• Enhanced Monitoring can also help identify micro bursts of CPU usage. CloudWatch can miss a spike or burst of activity that lasts less than 60 seconds, but Enhanced Monitoring can capture these spikes because you can configure it to collect data at one-second granularity.
• Performance Insights can help identify the exact queries that are responsible for the load and the host that initiated the queries.
• Native PostgreSQL views and catalogs (such as pg_stat_statements, pg_stat_activity, and pg_stat_user_tables) allow you to review database-level details (see the example query after this list). For more information, see the PostgreSQL documentation for Monitoring Database Activity and pg_stat_statements.
• PostgreSQL provides a variety of logging parameters to log long-running queries, autovacuum activity, lock waits, and connection and disconnection requests. For more information, see How do I enable query logging using Amazon RDS for PostgreSQL?
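For example, if the issue is happening right now, a query similar to the following sketch shows which clients and IP addresses have the most non-idle sessions. This is only a starting point; the pg_stat_activity columns used here (client_addr, usename, and state) are available in current PostgreSQL versions.

-- Count non-idle sessions per client host, user, and state to see
-- which machines and applications are sending the most traffic.
SELECT client_addr,
       usename,
       state,
       count(*) AS sessions
FROM pg_stat_activity
WHERE state <> 'idle'
GROUP BY client_addr, usename, state
ORDER BY sessions DESC;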

After you identify the cause, you can use the following methods to further lower the CPU usage:

• If there are opportunities for tuning, use EXPLAIN and EXPLAIN ANALYZE to identify potential issues in query plans (see the sketch after this list). For more information, see the PostgreSQL Documentation for EXPLAIN.
• If a query runs repeatedly, use prepared statements to lower the pressure on your CPU.
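The following sketch illustrates both approaches. The orders table and customer_id filter are placeholders for your own workload.

-- Review the execution plan and the actual run time of a suspect query.
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 1001;

-- Prepare a frequently repeated query once, then execute it with
-- different parameters so the server does not re-plan it on every call.
PREPARE orders_by_customer (int) AS
    SELECT * FROM orders WHERE customer_id = $1;
EXECUTE orders_by_customer(1001);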

Resolution
Amazon CloudWatch metrics

You can use CloudWatch metrics to identify CPU usage patterns over extended periods and to find the time frames during which CPU usage is highest. After you identify the time frame, you can review the Enhanced Monitoring data associated with your DB instance. You can set Enhanced Monitoring to collect data at intervals of 1, 5, 10, 15, 30, or 60 seconds, which allows you to collect data at a more granular level than CloudWatch. For more information, see Differences Between CloudWatch and Enhanced Monitoring Metrics.

Enhanced Monitoring

By using Enhanced Monitoring, you can check the loadAverageMinute data in intervals of 1, 5, and 15 minutes. A load average that is greater than the number of vCPUs indicates that the instance is under a heavy load. Conversely, if the load average is less than the number of vCPUs for the DB instance class, CPU throttling might not be the cause of the application latency. Check the load average to avoid false positives when diagnosing the cause of high CPU usage.

For example, suppose that a DB instance using a db.m4.2xlarge instance class with 3,000 provisioned IOPS reaches its CPU limit. In the following example, the instance class has eight vCPUs. The load averages all exceed 170, which is far greater than the number of vCPUs, so the machine is under heavy load during the time frame measured:
Load Average Minute
  Fifteen: 170.25
  Five:    391.31
  One:     596.74

CPU Utilization
  User (%):   0.71
  System (%): 4.9
  Nice (%):   93.92
  Total (%):  99.97

Note: Amazon RDS gives your database workload a higher priority than other tasks that are running on the DB instance. To prioritize the workload, its processes run at a higher Nice value. As a result, in Enhanced Monitoring, Nice% represents the amount of CPU that your workload is using against the database.

After you enable Enhanced Monitoring, you can also check the OS process list that is associated with the DB instance. Enhanced Monitoring shows only the top 50 processes, but this can help you identify which processes have the largest impact on performance based on CPU and memory use.

Performance Insights

You can use Amazon RDS Performance Insights to identify the queries responsible for the database load by checking the SQL tab that corresponds to a particular time frame.

Native PostgreSQL view and catalogs

At the database engine level, if the issue is occurring in real time, you can use pg_stat_activity or pg_stat_statements. These views can help you group the machines, clients, and IP addresses that send the most traffic. You can also use this data to check whether there are increases over time, increases in application servers, or application servers that have stuck sessions or locking issues. For more information, see the PostgreSQL Documentation for pg_stat_activity and pg_stat_statements.

To enable pg_stat_statements, modify the existing custom parameter group and set the following values:

• shared_preload_libraries = pg_stat_statements
• track_activity_query_size = 2048
• pg_stat_statements.track = ALL
• pg_stat_statements.max = 10000

Choose Apply Immediately, and reboot the DB instance. Then, run a command similar to the
following on the database that you want to monitor:

Note: The following example installs the extension in the "demo" database.
demo=> select current_database();
current_database
------------------
demo
(1 row)

demo=> CREATE EXTENSION pg_stat_statements;


After pg_stat_statements is set up, you can monitor the output by using one of the following
methods:

• List queries by total_time
• List queries with total number of calls, total rows, and rows returned
• List queries on a per execution basis

For information about the columns in the following example queries, see the PostgreSQL
Documentation for pg_stat_statements.

List queries by total_time to see which query spends the most time in the database:
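The following query is a minimal sketch. It assumes PostgreSQL 12 or earlier, where pg_stat_statements exposes a total_time column (later versions split it into total_exec_time and total_plan_time).

-- Top 10 statements by cumulative execution time, in milliseconds.
SELECT query,
       calls,
       round(total_time::numeric, 2) AS total_time_ms,
       rows
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;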
