As you correctly noted, the amount of temporary storage is tied to the Aurora PostgreSQL DB instance class. The maximum temporary storage for each instance class is listed in the document below-
[+] Temporary storage limits for Aurora PostgreSQL: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Managing.html#AuroraPostgreSQL.Managing.TempStorage
Furthermore, any temporary operation in PostgreSQL, including index creation, that needs more space than this allocation will fail with an error similar to "SQL Error [xxxx]: ERROR: could not write block xxxx of temporary file: No space left on device".
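To gauge how much temporary space your workload actually consumes before it hits this limit, you can query PostgreSQL's cumulative statistics view `pg_stat_database` (the `temp_files` and `temp_bytes` columns are standard PostgreSQL; the database name below is a placeholder):

```sql
-- Cumulative temporary-file usage per database since the last stats reset.
-- temp_files: number of temp files created; temp_bytes: total bytes written to them.
SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_space_used
FROM pg_stat_database
WHERE datname = 'mydb';   -- 'mydb' is a placeholder database name
```

On Aurora PostgreSQL you can also watch the FreeLocalStorage CloudWatch metric while the index build runs to see how close you are to exhausting local storage.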
The suggested workaround is to temporarily scale the DB instance up to a larger instance class that provides enough local storage for the temporary operation, based on the amount of temp space the index creation needs. If this is a one-time activity, you can scale back down afterwards.
If this is an ongoing activity, you will need to plan for a larger instance class in the long run. For this purpose, please refer to the per-instance-class local storage allocations specified in the temporary storage limits document above.
However, I understand that you are concerned about over-provisioning the resources and as a result over-spending on the same.
Having said that, I would recommend considering an Amazon Aurora Serverless v2 cluster, which is an on-demand, auto-scaling configuration for Amazon Aurora. As described in the official Aurora documentation, Aurora Serverless v2 automates the process of monitoring your workload and adjusts database capacity automatically based on application demand.
As a result, you are charged only for the resources your DB clusters actually consume. This helps you stay within budget and avoid paying for compute resources that you don't use.
Please refer to the following document for further information on the same,
[+] Using Aurora Serverless v2: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.html
I would strongly recommend testing your database and workload against an Aurora Serverless v2 cluster in order to determine the configuration that suits your environment.
Additionally, please consider setting an appropriate value for the 'work_mem' DB parameter. It sets the base maximum amount of memory a query operation can use before writing to temporary disk files. Kindly review the following document for more details-
[+] Local storage space is used by temporary tables or files: https://aws.amazon.com/premiumsupport/knowledge-center/postgresql-aurora-storage-issue/#Local_storage_space_is_used_by_temporary_tables_or_files
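As a sketch, `work_mem` can be raised for just the session that builds the index rather than cluster-wide; note that `maintenance_work_mem` is the parameter that actually governs memory used by index creation, so both are shown here. The values, index name, and table name are illustrative placeholders:

```sql
-- Raise memory limits only for this session (values are illustrative).
SET work_mem = '256MB';              -- per sort/hash operation in ordinary queries
SET maintenance_work_mem = '1GB';    -- used by CREATE INDEX / REINDEX

-- Build the GIN index with the larger memory budget; names are placeholders.
CREATE INDEX CONCURRENTLY idx_docs_body_gin
    ON documents USING gin (to_tsvector('english', body));

-- Settings revert when the session ends, or reset them explicitly:
RESET maintenance_work_mem;
RESET work_mem;
```

Raising these values reduces spills to temporary files, but the memory comes out of the instance's RAM, so test the values against your workload first.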
This is in response to your follow-up query:
As you might already be aware, there are several scenarios in which you may need to reindex your database tables, listed below-
- If an index becomes corrupted.
- If the index consists of many empty or nearly-empty pages and as a result becomes bloated.
- If you configure a storage parameter and want to ensure that the changes are in effect.
- If an index build fails.
In all of the above scenarios, the recommended solution is to analyze the situation, provision appropriate resources and run REINDEX accordingly.
This has been explained in-depth on the official PostgreSQL documentation below-
[+] REINDEX: https://www.postgresql.org/docs/current/sql-reindex.html
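As a minimal sketch, a bloated or corrupted index can be rebuilt in place; the CONCURRENTLY option (available since PostgreSQL 12) avoids blocking writes, at the cost of extra temporary space. The index and table names are placeholders:

```sql
-- Rebuild a single index without blocking concurrent writes (PostgreSQL 12+).
-- Note: CONCURRENTLY needs roughly as much free space as a fresh index build,
-- so the temporary-storage limits discussed above still apply.
REINDEX INDEX CONCURRENTLY idx_docs_body_gin;  -- placeholder index name

-- Or rebuild every index on a table:
REINDEX TABLE CONCURRENTLY documents;          -- placeholder table name
```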
Upon further research, I found that to avoid slow bulk insertions into a GIN index, the PostgreSQL documentation suggests dropping the GIN index first, performing the bulk load, and then recreating the index,
[+] GIN Tips and Tricks: https://www.postgresql.org/docs/current/gin-tips.html
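The drop-and-recreate pattern for bulk loads, together with the `fastupdate` storage parameter described on the GIN tips page, might look like this sketch (all object names and the file path are placeholders):

```sql
-- Bulk-load pattern: drop the GIN index, load the data, then rebuild once.
DROP INDEX IF EXISTS idx_docs_body_gin;               -- placeholder index name

COPY documents (body) FROM '/tmp/bulk_docs.csv' CSV;  -- illustrative bulk load

SET maintenance_work_mem = '1GB';                     -- speeds up the rebuild
CREATE INDEX idx_docs_body_gin
    ON documents USING gin (to_tsvector('english', body));

-- Alternatively, keep the index but tune its pending-list behavior:
-- fastupdate = off applies entries immediately (slower inserts, no pending-list
-- cleanup spikes); leave it on for faster inserts with deferred merging.
ALTER INDEX idx_docs_body_gin SET (fastupdate = off);
```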
In a nutshell, the scenarios above are the known cases in which reindexing may be required. Whether they occur depends on your database and its workload. Hence, please consider testing GIN indexes in your staging environment prior to implementing them in your production database.
Furthermore, it is suggested to involve your in-house DBA to assist with rebuilding the indexes in such cases, as this falls outside the scope of AWS Support. I would like to mention that, due to the AWS Data Security Policy and the Shared Responsibility Model, we do not have access or visibility into customers' databases, data, code, workloads, or queries; therefore, we mainly support RDS at the service and infrastructure level. For further information, kindly refer to the following links-
[+] AWS Support scope - https://aws.amazon.com/compliance/services-in-scope/
[+] Shared Responsibility Model: https://aws.amazon.com/compliance/shared-responsibility-model/
Regarding your query about automated tasks rebuilding an index: unless something is explicitly configured to rebuild an index, nothing should trigger a REINDEX on its own.
Thanks for confirming that. Just to follow up on my last question:
"Also if I scaled the servers up, then scaled them down after the GIN indexing completes, would this be risky, is there any reason why the GIN index would have to completely rebuild after the initial build."
Would you say it's a risk to run in production with instance sizes that are not capable of reproducing the current state/functionality of the database? For the second part of the question ("any reason"), please read that as 'any automated task' that would try to rebuild the index automatically (and subsequently fail due to the temp space issue), as opposed to a maintenance task explicitly triggered by an operator/engineer.
Thanks