Hello
I am running Document DB of engine version 5.00 in production account.
Around 4:57 a.m. (KST) on April 24 (DB cluster maintenance period)
We received a notification that the connection with the Document DB was cut off for about 10 seconds due to our service server notification.
Related application logs.
2024-04-24T04:59:14.252+09:00 WARN 1 --- o.s.b.a.data.mongo.MongoHealthIndicator : MongoDB health check failed
org.springframework.dao.DataAccessResourceFailureException: Prematurely reached end of stream
at
Based on Document DB monitoring indicators, the following phenomena were discovered at the same time.
-
- increased VolumeReadIOPs ( 0 -> 1789)
-
-
-
- decreased IndexBufferCacheHitRatio (100->99.9)
-
- decreased EngineUptime (6.5M => 785)
There was no notification on CloudTrail, AWS Healthcheck Dashboard, or e-mail, so I searched the event history through Document DB API.
{
"SourceIdentifier": "example-documentdb",
"SourceType": "db-cluster",
"Message" : "Database cluster engine version upgrade started.",
"EventCategories": [
"maintenance"
],
"Date": "2024-04-23T19:58: 31. 292000+00: 00",
"SourceArn": "arn:aws:rds: ap-northeast-2:01234567890: cluster:example-documentdb"
}
{
"SourceIdentifier": "example-documentdb2",
"SourceType": "db-instance"
"Message": "DB instance shutdown",
"EventCategories": [
"availability"
],
"Date" : "2024-04-23T19:59:13.484000+00: 00",
"SourceArn": "arn:aws:rds: ap-northeast-2:01234567890:db: example-documentdb2"
}
{
"SourceIdentifier": "example-documentdb3",
"SourceType" : "db-instance"
"Message": "Read replica has been disconnected from master. Restarting database.",
"EventCategories": [],
"Date": "2024-04-23T19:59:20.830000+00:00",
"S ourceArn": "arn:aws:rds: ap-northeast-2:01234567890: db:example-documentdb3"
}
{
"SourceIdentifier": "example-documentdb",
"SourceType" : "db-instance"
"Message": "Read replica has been disconnected from master. Restarting database.",
"EventCategories": [],
"Date" : "2024-04-23T19:59:22.047000+00:00",
"S ourceArn" : "arn: aws: rds: ap-northeast-2:01234567890:db: example-documentdb"
}
{
"SourceIdentifier": "example-documentdb2",
"SourceType" : "db-instance"
"Message": "DB instance restarted",
"EventCategories": [
"availability"
"Date": "2024-04-23T19:59: 25. 460000+00: 00",
"SourceArn": "arn:aws:rds: ap-northeast-2:01234567890:db: example-documentdb2"
}
{
"SourceIdentifier": "example-documentdb3",
"SourceType" : "db-instance"
"Message": "DB instance restarted",
"EventCategories": [
"availability"
"Date": "2024-04-23719:59:33. 382000+00:00"
"SourceArn": "arn:aws:rds: ap-northeast-2:01234567890:db: example-documentdb3"
}
{
"SourceIdentifier": "example-documentdb",
"SourceType": "db-instance"
"Message": "DB instance restarted",
"EventCategories": [
"availability"
"Date": "2024-04-23719:59:33.986000+00:00",
"SourceArn": "arn:aws:rds: ap-northeast-2:01234567890:db: example-documentdb"
}
{
"SourceIdentifier": "example-documentdb",
"SourceType" : "db-cluster",
"Message": "Database cluster engine version has been upgraded.",
"EventCategories": [
"maintenance"
],
"Date": "2024-04-23T20:00:36. 305000+00:00",
"SourceArn" : "arn:aws:rds: ap-northeast-2:01234567890:cluster :example-documentdb"
}
I checked the document DBs of the same engine version in another account
There was a Pending Maintenance called "Bugfix."
When you run a bugfix operation immediately, similar event history and monitoring indicators were recorded
Those DBs were not clustered, they had no downtime.
There was no message saying "Read replica has been disconnected from master. Restarting database"
About the problem
- Whether the "Bugfix" operation is the cause of the application connection failure
- Why didn't I get a notification for a job like "Bug fix"
- What kind of setup is needed
I'm curious about .