How can I troubleshoot local storage issues in Aurora PostgreSQL-Compatible instances?

I want to learn how to troubleshoot and resolve local storage issues caused by temporary tables, log files, and other factors in my Amazon Aurora PostgreSQL-Compatible database instances.

Short description

DB instances that are in Amazon Aurora clusters have two types of storage:

  • Storage used for persistent data (shared cluster volume). For more information, see What the cluster volume contains.
  • Local storage for each Aurora instance in the cluster. The size of this storage is determined by the instance class and can be increased only by moving to a larger DB instance class. Aurora PostgreSQL-Compatible uses local storage to store error logs and temporary files. For non-T class instance types, logs are written to a dedicated volume that is cleaned up automatically. For more information, see Temporary storage limits for Aurora PostgreSQL.

Resolution

To monitor the local storage space that's associated with the Aurora DB instance or node, use the Amazon CloudWatch metric FreeLocalStorage. This metric reports the amount of storage available to each DB instance for temporary tables and logs. For more information, see Monitoring Amazon Aurora metrics with Amazon CloudWatch.

If your Aurora local storage is full, then use the following resolution that best fits your use case.

Local storage space is used by temporary tables or files

When temporary storage is exhausted on the DB instance, this error occurs: "ERROR: could not write block XXXXXXXX of temporary file: No space left on device." This error can have several causes, including operations such as the following (the sketch after this list shows how to spot queries that spill to disk):

  • Altering large tables
  • Adding indexes on large tables
  • Running large SELECT queries with complex JOINs, GROUP BY, or ORDER BY clauses
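
To confirm that a specific query spills to disk, you can run it with EXPLAIN (ANALYZE, BUFFERS). Plan nodes that use temporary files report output such as "Sort Method: external merge  Disk: ... kB" or temp read/written buffer counts. The following is a minimal sketch; the table and column names are placeholders for your own statement:

maxiops=> -- Placeholder query: substitute the statement that you suspect uses temporary files
maxiops=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders ORDER BY customer_id;

Note that EXPLAIN ANALYZE runs the query, so it consumes the same temporary storage as a normal run.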

Check temporary file size

Turn on the log_temp_files parameter on the Aurora PostgreSQL-Compatible DB instance. This parameter logs the use of temporary files that are larger than the specified number of kilobytes. After the parameter is turned on, a log entry is made for each temporary file when the file is deleted. A value of 0 logs all temporary file information. A positive value logs only the files that are larger than or equal to the specified number of kilobytes. The default value is -1, which turns off temporary file logging.

To identify the temporary file details, first use the log_temp_files parameter. Then, correlate the logged temporary files with drops in the FreeLocalStorage metric.

Note: When you turn on the log_temp_files parameter, it can cause excessive logging on the Aurora PostgreSQL-Compatible DB instance. It's a best practice to check the size of the Aurora PostgreSQL-Compatible log files before you turn on log_temp_files. If log files consume the maximum space for the local storage, then reduce the value of rds.log_retention_period to reclaim space. The default value for rds.log_retention_period is 3 days.
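
To confirm the current settings from a psql session before you change them, you can use SHOW. The following is a minimal sketch; both parameters are changed through the DB parameter group, and the rds.-prefixed parameter is assumed to be readable with SHOW on your engine version:

maxiops=> SHOW log_temp_files;
maxiops=> SHOW rds.log_retention_period;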

To review temporary files, use the delta of subsequent runs of the following command:

maxiops=> SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_file_size FROM pg_stat_database ORDER BY temp_bytes DESC;

The temp_files column counts all temporary files, regardless of when they were created. The temp_files and temp_bytes columns in the pg_stat_database view are cumulative counters. To reset them, use the pg_stat_reset() function or restart the DB instance. For more information, see the PostgreSQL documentation for Additional statistics functions on the PostgreSQL website.
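
Because these counters only grow, one way to measure a specific workload is to reset the statistics first and then rerun the query above to see the delta. The following is a minimal sketch; note that pg_stat_reset() clears all cumulative statistics for the current database, not just the temporary file counters:

maxiops=> SELECT pg_stat_reset();
maxiops=> -- Run the workload that you want to measure, then check the counters again
maxiops=> SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_file_size FROM pg_stat_database ORDER BY temp_bytes DESC;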

If you use Aurora PostgreSQL-Compatible 10 or newer, then you can monitor temp_bytes and temp_files with Performance Insights. You can also use Performance Insights to monitor Amazon Relational Database Service (Amazon RDS) for PostgreSQL. Performance Insights provides native counters for your DB engine's internal metrics, in addition to wait events. For more information, see Native counters for Amazon RDS for PostgreSQL.

To allocate more memory to the processes that perform the operation, you can increase maintenance_work_mem and work_mem. The operation then uses more memory, which can reduce the amount of temporary disk storage that it needs. For more information about these parameters, see the PostgreSQL documentation for maintenance_work_mem and work_mem on the PostgreSQL website.

It's a best practice to set the values for maintenance_work_mem and work_mem at a query or session level to avoid running out of memory. For more information, see Amazon Aurora PostgreSQL reference.
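
For example, you can raise the values for a single session and reset them afterward, instead of changing them for the whole instance in the DB parameter group. The following is a minimal sketch; the values shown are placeholders, and you should size them against the memory available on your instance class:

maxiops=> SET work_mem = '256MB';              -- affects sorts, hashes, and joins in this session only
maxiops=> SET maintenance_work_mem = '1GB';    -- affects index builds and other maintenance operations
maxiops=> -- Run the large query or maintenance operation here
maxiops=> RESET work_mem;
maxiops=> RESET maintenance_work_mem;

You can also use SET LOCAL inside a transaction so that the values revert automatically at commit or rollback.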

Check temporary tables

To list temporary tables and their sizes, run a query similar to the following:

maxiops=> SELECT n.nspname AS SchemaName
,c.relname AS RelationName
,CASE c.relkind
    WHEN 'r' THEN 'table'
    WHEN 'v' THEN 'view'
    WHEN 'i' THEN 'index'
    WHEN 'S' THEN 'sequence'
    WHEN 's' THEN 'special'
END AS RelationType
,pg_catalog.pg_get_userbyid(c.relowner) AS RelationOwner
,pg_size_pretty(pg_relation_size(n.nspname ||'.'|| c.relname)) AS RelationSize
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n
    ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','s')
AND (n.nspname !~ '^pg_toast' AND n.nspname LIKE 'pg_temp%')
ORDER BY pg_relation_size(n.nspname ||'.'|| c.relname) DESC;

It's a best practice to closely monitor your application and identify the transactions that create temporary tables. You can then manage the usage of the available local storage capacity. You can also move to a larger instance class for your Aurora instance so that the instance has more available local storage.

Local storage used by log files (applicable only for burstable-performance instance class types)

Excessive logging can also cause your DB instance (only burstable-performance instance class types such as db.t2, db.t3, and db.t4g) to run out of local storage. The following logging parameters can consume local storage space, either through excessive logging or because the error logs are retained for a long time (a sketch after this list shows how to check their current values):

rds.log_retention_period
auto_explain.log_min_duration
log_connections
log_disconnections
log_lock_waits
log_min_duration_statement
log_statement
log_statement_stats
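
To check the current values of these parameters from a psql session, you can query pg_settings. The following is a minimal sketch; auto_explain.log_min_duration appears only if the auto_explain library is loaded, and the rds.-prefixed parameter is specific to Amazon RDS and Aurora:

maxiops=> SELECT name, setting FROM pg_settings
WHERE name IN ('rds.log_retention_period', 'auto_explain.log_min_duration',
               'log_connections', 'log_disconnections', 'log_lock_waits',
               'log_min_duration_statement', 'log_statement', 'log_statement_stats')
ORDER BY name;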

To identify the parameter that causes excessive logging, analyze the PostgreSQL logs to find the largest logs. Then, identify the parameter that is responsible for the majority of the entries in those logs. You can then modify the parameter that causes the excessive logging.
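
If your database role has the required privileges (for example, through the pg_monitor role or an explicit GRANT), you can also list the log files and their sizes from SQL. The following is a minimal sketch that uses pg_ls_logdir(), available in PostgreSQL 10 and later; whether your Aurora user can run it depends on the privileges granted on your instance:

maxiops=> SELECT name, size, modification FROM pg_ls_logdir() ORDER BY size DESC LIMIT 10;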

If you repeatedly run a query that fails with an error, then PostgreSQL logs the errors to the PostgreSQL error log by default. Review the logged errors, and then fix the failing query to prevent logs from using excessive storage. You can also reduce the default value for rds.log_retention_period (3 days) to reclaim space used by the error logs.

If excessive logging is required and log files exhaust the available local storage, then move to a larger instance class so that your Aurora DB instance has more local storage available.

Related information

Best practices with Aurora PostgreSQL
