Skip to content

Performance issue on RDS after major release postgresql upgrade

0

Enter image description here As you can see on the cloudwatch cpu utilization histogram we have two major changes : First one mid april when we upraded the size of the postgresql engine and the result was a significant reduction of the cpu (and better reponses times for our application during peak hours usage) The second one is the reason of this message : the 24th of july we did a version upgrade of Postgresql from 12 to 15 . As you can see one of the two servers of the clusters had a increase of the cpu utilization causing us then a lot of slow responses on our app. We already run the ANALYZE VERBOSE on all the databases. Could you help us understand this issue? Kind regards, Adrien Sanvoisin

2 Answers
0

The CPU utilization increase you're experiencing after upgrading PostgreSQL from version 12 to 15 is a common issue with major version upgrades. There are several factors that could be contributing to this performance degradation:

  1. Statistics and query plans: After a major version upgrade, the query optimizer needs time to rebuild its statistics and adjust query execution plans. This often leads to suboptimal query performance initially, as the new version's optimizer may handle queries differently than the previous version.

  2. Cache warming: During the upgrade, database buffer caches and query caches are typically cleared. It takes time for these caches to be repopulated with frequently accessed data and query plans, resulting in higher CPU usage as the system works harder to retrieve data from disk rather than from cache.

  3. Background processes: Post-upgrade, there might be background processes running to ensure database consistency and apply necessary changes, temporarily impacting overall performance.

  4. Schema changes: Major version upgrades might introduce schema changes or modifications that require additional processing time.

While you've already run ANALYZE VERBOSE, which is a good first step, here are additional steps to address the issue:

  • Reindex important indexes to ensure they're optimized for the new version
  • Compare execution plans between the old and new versions for your most resource-intensive queries
  • Review parameter group settings to ensure they're appropriate for the new version
  • Enable and review Performance Insights to identify specific queries causing high CPU usage
  • Monitor specific metrics like Queries, Com_select, and Innodb_rows_read to pinpoint the source of high CPU usage
  • Check for orphaned large objects which can impact performance after upgrades

It's also worth noting that some performance issues after upgrades resolve themselves over time as the system stabilizes and caches warm up. However, if the issue persists, you may need to tune specific queries or database parameters based on the findings from the monitoring steps above.
Sources
Aurora RDS slow performance | AWS re:Post
Resolve high CPU usage after Amazon RDS for MySQL and Aurura MySQL-Compatible upgrade | AWS re:Post
Maintain application consistency after RDS PostgreSQL upgrade | AWS re:Post
Why do large objects lead to slowness or failure of major version upgrades in RDS/Aurora PostgreSQL? | AWS re:Post

answered a month ago
0

Hello Adrien Sanvoisin,

Thank you for sharing your query with us. I understand your concern about the increased CPU utilization after the Major Version Upgrade of RDS PostgreSQL. While I'm providing these recommendations on a best-effort basis, these investigative steps should help you identify and resolve the performance issues:

  1. Performance Insights Analysis:

You can begin by reviewing your Performance Insights dashboard to identify CPU usage patterns and specific high-spike periods. Next , you can check your instance's vCPU capacity using the Amazon RDS instance types documentation [1] and Amazon EBS-optimized instance types documentation [2].

During your analysis, focus on long-running queries during peak CPU utilization periods, particularly noting if any queries are utilizing more CPU than your instance's available vCPUs. It's crucial to examine the associated wait events during these high-utilization periods, especially CPU wait events which are typically common during such scenarios. For a detailed understanding of wait events and their implications, you can refer to the RDS for PostgreSQL wait events documentation [3].

[1] Amazon RDS instance types: https://aws.amazon.com/rds/instance-types/

[2] Amazon EBS-optimized instance types: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html

[3] RDS for PostgreSQL wait events: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.Tuning.concepts.summary.html

You can check the Performance Insights dashboard for this time frame at your end, in order to identify the queries corresponding to these Support SQL Digest IDs. Please be informed, because of security reasons, AWS Premium Support engineers do not have visibility to the actual SQL query. Please use the link below to retrieve the actual SQL queries or you can follow the below steps to map SQL Hkid to the SQL text in Performance Insights:

a) Navigate to https://console.aws.amazon.com
b) Navigate to the RDS Dashboard c) On the navigation pane on the left, click on "Performance Insights" d) On the left hand sidebar on the Performance Insights page, select the DB instance for which you wish to view the Performance Insights. e) Toward of the center of the page, on the right hand side, you will see a "settings" icon. Click this icon, and enable "Support ID". This will enable the SQL Hkid display that I shared with you. f) Once this is done, adjust the time frame to all and select the area that covers timestamp and you will be able to view the SQL Hkid, along with the actual SQL query that was running during the time frame set by you.

[+] Monitoring with the Performance Insights Dashboard - Top Load Items Table: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.UsingDashboard.html#USER_PerfInsights.UsingDashboard.Components.AvgActiveSessions.TopLoadItemsTable

  1. CloudWatch Metrics Analysis:

Next, thoroughly examine your CloudWatch metrics, focusing on several critical indicators during the spike periods. Monitor the CPUUtilization metric to understand the overall CPU consumption pattern. Simultaneously, track the DatabaseConnections, DBLoad, DBLoadCPU and DBLoadNonCPU metric to identify any correlation between connection counts and CPU spikes.

Kindly , pay close attention to FreeableMemory and FreeStorageSpace metrics, as drops in either of these could indicate resource constraints affecting your database performance. These metrics can provide valuable insights into whether the increased CPU usage is related to memory pressure or storage limitations.

For detailed information about CloudWatch metrics and their interpretation, you can refer to: [4] Amazon RDS CloudWatch metrics: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/monitoring-cloudwatch.html

  1. Table and Index Bloat:

In PostgreSQL databases, bloat is the extra space allocated to the table or index to maintain old version of rows that are no longer necessary. Bloat can accumulate over time, especially after major changes like upgrades.

We have run the bloat command [5] on the one database, where we could see some tables having bloat.

[5] https://wiki.postgresql.org/wiki/Show_database_bloat

With the help of your team, please check the bloat on the tables involved in above mentioned SQL Digest IDs and indexes on the particular table involved in the queries.

Use pg_repack to reclaim space from bloated tables and indexes [6]. Refer the below document on how to remove bloat on table.

[6] Remove bloat from Amazon Aurora and RDS for PostgreSQL with pg_repack: https://aws.amazon.com/blogs/database/remove-bloat-from-amazon-aurora-and-rds-for-postgresql-with-pg_repack/

  1. Adjust Autovacuum Settings:

Ensure that the autovacuum process is running frequently enough to maintain up-to-date statistics. Adjust the autovacuum settings if necessary based on workload [7].

[7] Working with PostgreSQL autovacuum on Amazon RDS for PostgreSQL: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.Autovacuum.html

  1. Query plan changes:

Every major version includes enhancements to the query optimizer that are designed to improve the database performance. But the DB planner/optimizer might not choose the most optimal plan for running your queries. As a result, you might see performance degradation when running the same queries in a new major version. To address this:

Use EXPLAIN (ANALYZE, BUFFERS) to identify which plan nodes are causing performance degradation [8]. The numbers provided by BUFFERS after running this command help to identify which parts of the query are the most I/O-intensive. You can consider re indexing the queries based on statistics of the query. You can refer the below document for the same.

[8] https://www.postgresql.org/docs/current/using-explain.html

I would also suggest you to refer the below documentation for the issue: [+] How do I troubleshoot high CPU utilization for Amazon RDS or Amazon Aurora PostgreSQL?: https://repost.aws/knowledge-center/rds-aurora-postgresql-high-cpu

If after implementing these recommendations if you continue to experience performance issues, then we encourage you to open a support case with us. Please ensure to create the case from the AWS account where your database resources reside, and include your instance ARN along with details of your observations. This will help us provide more targeted assistance for your specific situation. We're here to help ensure optimal performance of your database workload.

AWS
answered a month ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.