Increased wait/io/redo_log_flush on RDS Aurora

We have an Aurora (MySQL) cluster that we are trying to right-size, with a writer and an autoscaling pool of readers. The last downsize was a step too far: it pushed the DB load outside our desired loading parameters, so we reverted that size reduction. This is all context for my question:

After the last size increase, we noticed that the CPU load on the database did not go back to what it was before. In other words, the same instance size was now showing a much higher CPU utilization. A look at database insights showed that everything was pretty much the same as before, except for one metric: "wait/io/redo_log_flush". This metric was significantly higher and might well account for the increased CPU usage.

Google, ChatGPT and even Amazon Q all came back with the same advice: change the way I query the database (e.g. use transactions). The thing is, we haven't changed anything on the application end before, during or after the size changes. So the way we query is the same as it always was, and it never caused issues before.

The actual question: are there other reasons this metric might go up, and how do I remedy this?

asked a year ago, 1.4K views
1 Answer
The increase in the "wait/io/redo_log_flush" event in your Aurora MySQL cluster, with no change in your application's query patterns, suggests that other factors are at play. This wait event occurs when a session is writing persistent data to Aurora storage, and its increase could have several causes:

  1. Increased write activity: Even if your application hasn't changed, there might be an increase in overall write operations, perhaps due to increased traffic or data volume.

  2. Storage performance: The underlying storage system might be experiencing issues or reaching its IOPS limits, causing delays in write operations.

  3. Instance type mismatch: The new instance type might have different I/O characteristics that don't align well with your workload.

  4. Configuration changes: Unintended changes in database configuration parameters during the resize process could affect how frequently the redo log is flushed.

  5. Background processes: Increased activity from background processes like backups or maintenance tasks could contribute to more frequent log flushes.
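
One way to test the first possibility is to sample MySQL's cumulative statement counters twice and turn them into per-second rates, then compare those rates with a period before the resize. The following is a minimal sketch in Python, not a definitive method. It assumes the numbers were copied by hand from two runs of `SHOW GLOBAL STATUS`; the helper name `counter_rates` and all sample values are invented for illustration.

```python
def counter_rates(first, second, interval_seconds):
    """Convert two snapshots of cumulative counters into per-second rates."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for name, start_value in first.items():
        if name not in second:
            continue  # counter missing from the second snapshot
        delta = second[name] - start_value
        if delta < 0:
            continue  # counter went backwards, e.g. a restart between samples
        rates[name] = delta / interval_seconds
    return rates


# Invented numbers standing in for two SHOW GLOBAL STATUS samples taken 60 s apart.
sample_a = {"Com_insert": 120_000, "Com_update": 80_000,
            "Com_delete": 5_000, "Com_commit": 200_000}
sample_b = {"Com_insert": 126_000, "Com_update": 83_000,
            "Com_delete": 5_300, "Com_commit": 209_000}

rates = counter_rates(sample_a, sample_b, 60)
for name, per_second in sorted(rates.items()):
    print(f"{name}: {per_second:.1f}/s")
```

If the write and commit rates match what you saw before the resize, increased write activity becomes a less likely explanation and the remaining possibilities deserve more attention.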

To address this issue:

  1. Review your Aurora cluster's storage and IOPS usage to ensure you're not hitting limits.

  2. Check if there have been any unintended configuration changes, particularly those related to the InnoDB engine and logging.

  3. Monitor for any increase in write operations or overall database load that might not be immediately apparent from the application side.

  4. Consider adjusting the instance type to one that better matches your I/O pattern, even if it's the same size as before.

  5. Examine any scheduled tasks or background processes that might have changed or increased in frequency.

  6. Use Performance Insights to identify specific queries or sessions that are contributing most to this wait event.
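
For step 2, a simple comparison of parameter values captured before and after the resize can surface unintended changes. The sketch below assumes you have two snapshots of settings, for example exported from `SHOW GLOBAL VARIABLES` or from the parameter group. The helper `diff_settings` is hypothetical and the values are invented; only the variable names are real MySQL settings.

```python
def diff_settings(before, after):
    """Return {name: (old, new)} for every setting whose value differs."""
    changes = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name)  # None if the setting was absent before
        new = after.get(name)   # None if the setting is absent now
        if old != new:
            changes[name] = (old, new)
    return changes


# Invented snapshots; replace with the values from your own cluster.
before_resize = {
    "innodb_flush_log_at_trx_commit": "1",
    "autocommit": "ON",
    "sync_binlog": "1",
    "binlog_format": "OFF",
}
after_resize = {
    "innodb_flush_log_at_trx_commit": "1",
    "autocommit": "ON",
    "sync_binlog": "1",
    "binlog_format": "ROW",
}

changes = diff_settings(before_resize, after_resize)
for name, (old, new) in changes.items():
    print(f"{name}: {old} -> {new}")
```

An empty result suggests the configuration is unchanged, which lets you rule out that step quickly.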

If the issue persists, you may need to engage AWS support for a more in-depth analysis of your specific Aurora cluster configuration and workload patterns.
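
Before opening a support case, it can help to have the DB load broken down by wait event for the periods before and after the resize. The sketch below builds a request for the Performance Insights `GetResourceMetrics` API through boto3, grouping `db.load.avg` by `db.wait_event`. The resource ID `db-EXAMPLE` and the time range are placeholders, the helper names are hypothetical, and the call itself assumes Performance Insights is enabled on the instance and that AWS credentials are configured.

```python
from datetime import datetime, timezone


def build_wait_event_query(resource_id, start, end, period_seconds=300, limit=7):
    """Build keyword arguments for the Performance Insights get_resource_metrics call."""
    return {
        "ServiceType": "RDS",
        "Identifier": resource_id,  # the instance's DbiResourceId, not its name
        "StartTime": start,
        "EndTime": end,
        "PeriodInSeconds": period_seconds,
        "MetricQueries": [
            {
                "Metric": "db.load.avg",
                "GroupBy": {"Group": "db.wait_event", "Limit": limit},
            }
        ],
    }


def fetch_wait_event_load(resource_id, start, end):
    """Call the API; needs boto3, credentials, and Performance Insights enabled."""
    import boto3  # imported here so building the request needs no AWS SDK

    client = boto3.client("pi")
    return client.get_resource_metrics(**build_wait_event_query(resource_id, start, end))


# Placeholder identifier and time window; substitute your own.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 2, tzinfo=timezone.utc)
request = build_wait_event_query("db-EXAMPLE", start, end)
print(request["MetricQueries"])
```

Running the same query for a window before the downsize and one after the upsize gives two series that can be compared directly, which makes the change in redo log flush waits easier to show to someone else.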

Sources
Aurora MySQL wait events - Amazon Aurora
io/aurora_redo_log_flush - Amazon Aurora
io/redo_log_flush - Amazon Aurora
Analyzing DB load by wait events - Amazon Relational Database Service
Analyzing DB load by wait events - Amazon Aurora

answered a year ago
EXPERT
reviewed a year ago