RDS restarts possibly interrupt the connection with an ERP application

Hello. My organization has an AWS account running an RDS Postgres instance with several databases. These databases are used by an ERP application hosted on an EC2 Ubuntu server, so the application runs on EC2 and uses the RDS Postgres databases. The application was developed by an external IT company.

Lately, users have complained about errors saying that the application cannot reach the database or is losing its connection to it. The strange thing is that there are 5 databases in RDS, but the connection is lost with only some of them, for example 2. Each time we have to call the IT company; they restart the application, and then the connection is restored and works as intended.

The IT company's programmers claim that the connection is interrupted because the RDS instance restarts or reloads. I tried to monitor the instance and could not see any visible RDS shutdown or restart. I then checked the logs/events section each time users hit the issue, and every time I see this list of events:

  • DB instance shutdown
  • Backing up DB instance
  • DB instance shutdown
  • Database instance is in a state that cannot be upgraded: Postgres cluster is in a state where pg_upgrade can not be completed successfully
  • Finished DB Instance backup
  • DB instance restarted
  • Backing up DB instance
  • Finished DB Instance backup

Normally, only two events are shown in the list:

  • Backing up DB instance
  • Finished DB Instance backup

The longer event list appears about once every one to two weeks.

So each time users see a database connection error, I find that long list of events. I guess these events are related to the backup or maintenance window of the RDS instance. When RDS restarts, the application loses its connection to the database; after the application is restarted, the connection is restored.
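To check this guess against the outage times, the instance events can also be pulled programmatically instead of being read from the console. A minimal sketch, assuming Python with a boto3 RDS client passed in as `rds_client`; the helper names, the instance identifier, and the keyword list are illustrative choices, not an AWS-defined classification:

```python
# Keywords taken from the event messages quoted above; this list is an assumption.
RESTART_MARKERS = ("shutdown", "restarted")


def fetch_recent_events(rds_client, instance_id, hours=24):
    """Fetch events for one DB instance over the last `hours` hours."""
    response = rds_client.describe_events(
        SourceIdentifier=instance_id,
        SourceType="db-instance",
        Duration=hours * 60,  # the API expects minutes
    )
    return response["Events"]


def restart_events(events):
    """Keep only events whose message suggests a shutdown or restart."""
    return [
        event for event in events
        if any(marker in event["Message"].lower() for marker in RESTART_MARKERS)
    ]
```

With boto3 installed, `rds_client` would be `boto3.client("rds")`. Comparing the timestamps of the returned events with the times users reported errors would show whether the restarts and the connection losses really coincide.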

I don't have much programming experience, but I think the application could have some retry logic, so that if the connection is lost it tries to connect again. I don't know.
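For illustration, retry logic of this kind usually means retrying the connection attempt with a growing delay. A minimal sketch in Python, assuming the application opens its connection through some function passed in as `connect`; the ERP application's actual language and database driver are not known, so every name here is hypothetical:

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(ConnectionError, OSError), sleep=time.sleep):
    """Call connect() until it succeeds or the attempts are used up."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # give up and surface the last error
            sleep(delay)
            # Double the wait each time, capped at max_delay
            delay = min(delay * 2, max_delay)
```

If the application happened to use Python with psycopg2, `connect` could be `lambda: psycopg2.connect(dsn)` and `retry_on` could be `(psycopg2.OperationalError,)`. A brief RDS restart would then show up as a short delay instead of an error that needs a manual application restart.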

So the question is: could the maintenance or backup processes be causing the application to lose its connection, and can this be managed or fixed in the AWS environment to avoid restarting or reloading the RDS server? Are there options to skip maintenance or backups, and what risks would that carry?

RDS is running in a single Availability Zone; maybe adding Multi-AZ would solve this issue?

Thank you in advance.

1 Answer
Accepted Answer

Assuming the root cause is DB maintenance, you can adjust the DB instance's maintenance window so that it falls in off-peak hours. Just be aware that the window is specified in UTC.

A Multi-AZ deployment will help: the impact of a maintenance event is reduced because RDS can fail over between the instances.

Perhaps try adjusting the maintenance window first before exploring Multi-AZ. There is also RDS Proxy if you need to further reduce failover times.
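Because the window is given in UTC, a local time has to be shifted before it is set. A minimal sketch in Python, assuming a boto3 RDS client passed in as `rds_client`; the helper names, the weekday, and the instance identifier are made up for illustration, while the `ddd:hh24:mi-ddd:hh24:mi` string is the format RDS uses for `PreferredMaintenanceWindow`:

```python
from datetime import datetime, timedelta

DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def utc_maintenance_window(local_day, local_start, minutes, utc_offset_hours):
    """Convert a local weekly window into the UTC string RDS expects."""
    # 2024-01-01 was a Monday; it only serves as an anchor week for the arithmetic.
    anchor = datetime(2024, 1, 1) + timedelta(days=DAYS.index(local_day))
    hour, minute = map(int, local_start.split(":"))
    start = anchor.replace(hour=hour, minute=minute) - timedelta(hours=utc_offset_hours)
    end = start + timedelta(minutes=minutes)

    def fmt(moment):
        return f"{DAYS[moment.weekday()]}:{moment:%H:%M}"

    return f"{fmt(start)}-{fmt(end)}"


def apply_window(rds_client, instance_id, window):
    """Set the weekly maintenance window on one DB instance."""
    return rds_client.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        PreferredMaintenanceWindow=window,
    )
```

For example, `utc_maintenance_window("mon", "02:25", 30, 3)` returns `"sun:23:25-sun:23:55"`: a window meant for Monday 02:25 in a UTC+3 zone starts on Sunday 23:25 in UTC. The backup window is a separate setting (`PreferredBackupWindow`), so it is worth checking both.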

Mike_L (AWS, EXPERT), answered 7 months ago
Reviewed by AWS EXPERT 7 months ago
  • Hello, Mike

    Thank you so much for your answer.

    The maintenance window is set to 02:25 - 02:55 UTC+3. That is night time, so it is certainly off-peak, and I guess the issue is not related to it. I still think the programmers could fix this in the application code with sensible reconnection logic, but unfortunately I can't influence them. I will look into RDS Proxy, as it is new to me. Multi-AZ is the last option for now.

    Thank you again for your answer and if you have any other ideas, please let me know.

    Good luck.
