Hi,
We are trying to debug a problem with our RDS SQL Server instance, in which the processes blocked spiked for about 10 minutes, and then went back to normal. During the spike the database became unresponsive and affected our production apps.
The main problem we face is that we cannot locate the root cause of the spike.
What we have done so far:
- Checked the logs. See below for the relevant lines.
- Manually (by matching timestamps) reviewed the queries that were running at the time. As far as we could tell, the queries are the same as those that run everyday without issue, and none of them seemed to be CPU or memory straining. Ofc, it's possible our manual review missed something.
Main question: Is there any other place we can look for clues as to what caused the spike?
Additional info:
Screenshots from RDS monitoring:
Snippet from logs:
2023-06-19 22:27:00.44 Server A fatal error occurred while reading the input stream from the network. The session will be terminated (input error: 10054, output error: 0).
2023-06-19 23:14:01.76 spid53 I/O is frozen on database model. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.
2023-06-19 23:14:01.76 spid74 I/O is frozen on database msdb. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.
2023-06-19 23:14:01.76 spid85 I/O is frozen on database rdsadmin. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.
2023-06-19 23:14:01.76 spid116 I/O is frozen on database Moocho. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.
2023-06-19 23:14:01.76 spid86 I/O is frozen on database m@ster. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.
2023-06-19 23:14:02.16 spid74 I/O was resumed on database msdb. No user action is required.
2023-06-19 23:14:02.16 spid116 I/O was resumed on database Moocho. No user action is required.
2023-06-19 23:14:02.16 spid86 I/O was resumed on database m@ster. No user action is required.
2023-06-19 23:14:02.16 spid85 I/O was resumed on database rdsadmin. No user action is required.
2023-06-19 23:14:02.16 spid53 I/O was resumed on database model. No user action is required.
2023-06-20 00:00:27.97 spid23s This instance of SQL Server has been using a process ID of 4340 since 4/3/2023 1:21:25 AM (local) 4/3/2023 5:21:25 AM (UTC). This is an informational message only; no user action is required.
hi there sure, SQL Server Web Edition Instance class: db.r5.4xlarge Engine version: 12.00.6293.0.v1
ty