RDS restarts possibly interrupt the connection to an ERP application


Hello, my organization has an AWS account in which an RDS for PostgreSQL instance is running. The instance hosts several databases, which are used by an ERP application hosted on an Ubuntu EC2 server. So the application runs on EC2 and uses the RDS PostgreSQL databases. The application was developed by an external IT company.

The problem is that lately users have complained that they receive errors saying the application cannot reach the database, i.e. the application loses its connection to the database. The strange thing is that there are 5 databases in RDS, but the connection is lost with only some of them (for example, 2). Each time, we have to call the IT company; they restart the application, and then the connection is restored and works as intended.

The programmers at that IT company claimed that these interruptions happen because the RDS instance restarts or reloads. I monitored the instance and could not see any obvious RDS shutdown or restart. Then I tried to catch such cases by checking the logs/events section, and each time users face the issue I see this list of events:

- DB instance shutdown
- Backing up DB instance
- DB instance shutdown
- Database instance is in a state that cannot be upgraded: Postgres cluster is in a state where pg_upgrade can not be completed successfully
- Finished DB Instance backup
- DB instance restarted
- Backing up DB instance
- Finished DB Instance backup

Normally, only two events are shown in the list:

- Backing up DB instance
- Finished DB Instance backup
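To tell the routine backup pair apart from the restart sequence without scanning the console by hand, something like the following Python sketch could help. It assumes the events have already been fetched as a list of dicts with `Date` and `Message` keys, which is the shape of the `Events` list returned by boto3's `describe_events` (called with `SourceType="db-instance"` and a `Duration` in minutes). The message strings are copied from the event lists above; `disruptive_events` is a hypothetical helper and the sample timestamps are made up.

```python
from datetime import datetime, timezone

# Messages seen in the normal, non-disruptive pattern.
ROUTINE = {"Backing up DB instance", "Finished DB Instance backup"}

# Message prefixes from the problem sequence; each implies the instance went
# down or an upgrade was attempted.
DISRUPTIVE_PREFIXES = (
    "DB instance shutdown",
    "DB instance restarted",
    "Database instance is in a state that cannot be upgraded",
)


def disruptive_events(events):
    """Return sorted (date, message) pairs for shutdown/restart/upgrade events.

    Assumes each event is a dict with "Date" (datetime) and "Message" (str),
    like the items of the "Events" list in a boto3 describe_events response.
    """
    hits = []
    for event in events:
        message = event["Message"]
        if message in ROUTINE:
            continue  # regular backup, not a restart
        if message.startswith(DISRUPTIVE_PREFIXES):
            hits.append((event["Date"], message))
    return sorted(hits)


if __name__ == "__main__":
    sample = [  # made-up timestamps for illustration
        {"Date": datetime(2024, 1, 6, 23, 31, tzinfo=timezone.utc),
         "Message": "Backing up DB instance"},
        {"Date": datetime(2024, 1, 6, 23, 30, tzinfo=timezone.utc),
         "Message": "DB instance shutdown"},
        {"Date": datetime(2024, 1, 6, 23, 45, tzinfo=timezone.utc),
         "Message": "DB instance restarted"},
    ]
    for date, message in disruptive_events(sample):
        print(date.isoformat(), message)
```

Comparing the flagged timestamps with the times users reported errors would show whether the restarts and the lost connections really coincide.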

Such cases happen once every week or two.

So each time users report a lost database connection, I find that long list of events. I guess these events are related to the backup or maintenance window of the RDS instance. When RDS restarts, the application loses its connection to the database, and after the application restarts, the connection is restored.

I don't have much programming experience, but I think the application could have some retry logic, so that if the connection is lost, it tries to reconnect. I don't know.
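A minimal sketch of that retry idea, in Python (chosen for illustration only; the thread does not say what language the ERP is written in). `retry_with_backoff` and its parameters are hypothetical names. `operation` is assumed to open a fresh database connection and run its query each time it is called; with the psycopg2 driver, for example, you would pass `retry_on=(psycopg2.OperationalError,)` instead of the default `ConnectionError`.

```python
import random
import time


def retry_with_backoff(operation, retries=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(ConnectionError,), sleep=time.sleep):
    """Call operation(); if it raises one of retry_on, wait and try again.

    The wait grows exponentially (base_delay, 2x, 4x, ... capped at max_delay)
    with "full jitter", so many clients do not reconnect at the same instant.
    After `retries` failed retries the last exception is re-raised.
    Exceptions not listed in retry_on propagate immediately.
    """
    for attempt in range(retries + 1):
        try:
            return operation()
        except retry_on:
            if attempt == retries:
                raise  # give up and surface the original error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_query():
        # Stand-in for "open a connection and run a query": fails twice, then works.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("connection lost")
        return "ok"

    print(retry_with_backoff(flaky_query, sleep=lambda s: None))
```

Opening a new connection inside `operation` is the important part: connections opened before a database restart stay broken, which would be consistent with the symptom that only an application restart restored access.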

So the question is: could the maintenance/backup processes be causing the lost connections, and could this be managed or fixed in the AWS environment to avoid the RDS server restarting/reloading? Are there any options to skip maintenance or backups, and what risks come with that?

RDS is running in a single Availability Zone. Would adding Multi-AZ solve this issue?

Thank you in advance.

Asked 7 months ago · 248 views
1 answer

Accepted answer

Assuming the root cause is DB maintenance, you can adjust the DB instance maintenance window so that it falls in off-peak hours. Just be aware that the window is specified in UTC.
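Because the window is in UTC, a local off-peak window has to be shifted by your UTC offset. As a sketch, the hypothetical helper below (not an AWS API) converts a local weekly window into the `ddd:hh24:mi-ddd:hh24:mi` string format RDS uses for the preferred maintenance window. It assumes a fixed UTC offset with no daylight saving changes; the day `sun` in the example is an assumption, since the thread only gives times.

```python
from datetime import datetime, timedelta, timezone

DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]  # index == datetime.weekday()


def to_rds_window(day, start, minutes, utc_offset_hours):
    """Convert a local weekly window to RDS's UTC "ddd:hh24:mi-ddd:hh24:mi" format.

    Assumes a fixed UTC offset (no daylight saving changes).
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    hour, minute = map(int, start.split(":"))
    # 2024-01-01 was a Monday, so this picks the requested weekday in a reference week.
    local_start = datetime(2024, 1, 1 + DAYS.index(day), hour, minute, tzinfo=local_tz)
    utc_start = local_start.astimezone(timezone.utc)
    utc_end = utc_start + timedelta(minutes=minutes)

    def fmt(moment):
        return f"{DAYS[moment.weekday()]}:{moment:%H:%M}"

    return f"{fmt(utc_start)}-{fmt(utc_end)}"


if __name__ == "__main__":
    # Sunday 02:25-02:55 at UTC+3 (the day is a hypothetical choice).
    print(to_rds_window("sun", "02:25", 30, 3))  # sat:23:25-sat:23:55
```

Note that the UTC day can differ from the local day, as in the example. The resulting string could then be applied with the AWS CLI, e.g. `aws rds modify-db-instance --db-instance-identifier <your-instance> --preferred-maintenance-window sat:23:25-sat:23:55`, where `<your-instance>` is a placeholder.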

A Multi-AZ deployment will help, as the impact of a maintenance event is reduced by failing over between the primary and standby instances.

Perhaps try adjusting the maintenance window first before exploring Multi-AZ. There is also RDS Proxy if you need to further reduce failover times.

Mike_L, AWS Expert, answered 7 months ago
Reviewed by an AWS Expert 7 months ago
  • Hello, Mike

    Thank you so much for your answer.

    The maintenance window is set to 02:25-02:55 UTC+3, which is night time and definitely off-peak hours, so I guess the problem is not related to it. I still think it could be fixed by the programmers with smarter reconnection logic in the application code, but unfortunately I can't influence them. I will look into RDS Proxy, as it is new to me. For now, Multi-AZ is the last option.

    Thank you again for your answer, and if you have any other ideas, please let me know.

    Good luck.
