Lightsail server down for a couple of hours before "graceful restart"

For two weeks in a row, I have received JetPack messages that my website is down, just a couple of hours before an automatic restart at 0:00 (AH00493: SIGUSR1 received. Doing graceful restart). I am not sure whether this restart is automatic or related to the outage, but the two occurrences were one week apart, so it may be automatic. The error log shows constant errors every few seconds relating to WP-Fastest Cache (which I am following up on separately) until it becomes quiet a few hours before the restart. Then the site is back online again (without any action from my end before, during or after the outage):

Latest error in a long, constant list every few seconds:

[Sun Mar 13 22:35:38.295213 2022] [proxy_fcgi:error] [pid 24517:tid xxx] [client xxxx] AH01071: Got error 'PHP message: PHP Warning: include_once(/bitnami/wordpress/wp-content/plugins/wp-super-cache/wp-cache-phase1.php): failed to open stream: No such file or directory in /bitnami/wordpress/wp-content/advanced-cache.php on line 22PHP message: PHP Warning: include_once(): Failed opening '/bitnami/wordpress/wp-content/plugins/wp-super-cache/wp-cache-phase1.php' for inclusion (include_path='.:/opt/bitnami/php/lib/php') in /bitnami/wordpress/wp-content/advanced-cache.php on line 22'

SERVER DOWN?

[Mon Mar 14 00:00:01.635219 2022] [mpm_event:notice] [pid xx:tid xxxx] AH00493: SIGUSR1 received. Doing graceful restart

[Mon Mar 14 00:00:01.644021 2022] [mpm_event:notice] [pid xxx:tid xxxx] AH00489: Apache/2.4.52 (Unix) OpenSSL/1.1.1d configured -- resuming normal operations

[Mon Mar 14 00:00:01.644034 2022] [core:notice] [pid xxx:tid xxx] AH00094: Command line: '/opt/bitnami/apache/bin/httpd -f /opt/bitnami/apache/conf/httpd.conf'

SERVER BACK UP, errors start again:

[Mon Mar 14 00:00:32.243839 2022] [proxy_fcgi:error] [pid xx:tid xxx] [client xxx:46500] AH01071: Got error 'PHP message: PHP Warning: include_once(/bitnami/wordpress/wp-content/plugins/wp-super-cache/wp-cache-phase1.php): failed to open stream: No such file or directory in /bitnami/wordpress/wp-content/advanced-cache.php on line 22PHP message: PHP Warning: include_once(): Failed opening '/bitnami/wordpress/wp-content/plugins/wp-super-cache/wp-cache-phase1.php' for inclusion (include_path='.:/opt/bitnami/php/lib/php') in /bitnami/wordpress/wp-content/advanced-cache.php on line 22'

The access log became quiet 13 minutes earlier:

192.0.1xxx - - [13/Mar/2022:22:35:57 +0000] "GET /wp-content/uploads/2020/01/IMG_1668.jpg HTTP/1.1" 200 175485

91.17xxx49 - - [13/Mar/2022:22:37:16 +0000] "GET /wp-content/uploads/2021/03/JPG-1-1440x1080.jpg HTTP/1.1" 200 187452

13xxx17.54 - - [13/Mar/2022:22:38:55 +0000] "POST / HTTP/1.1" 301 233

135xxx7.54 - - [13/Mar/2022:22:38:55 +0000] "GET /.env HTTP/1.1" 301 237

SERVER DOWN?

20.xxxx225 - - [14/Mar/2022:00:00:02 +0000] "GET /.env HTTP/1.1" 301 237

35xxx1.102 - - [14/Mar/2022:00:00:02 +0000] "GET / HTTP/1.1" 301 233

216xxx240 - - [14/Mar/2022:00:00:02 +0000] "GET /robots.txt HTTP/1.1" 301 243

The logs for the previous outage (one week prior, but from Sat-Sun instead of from Sun-Mon) are comparable.
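To pin down the quiet window precisely, the pause between consecutive access log entries can be computed. Below is a minimal Python sketch, assuming the standard Apache timestamp format seen in the excerpt above. The function name `largest_gap` is made up for illustration, and the sample lines are copied from the excerpt with the redacted client addresses left as they are.

```python
import re
from datetime import datetime

# Matches the bracketed timestamp of an Apache access log line,
# e.g. [13/Mar/2022:22:38:55 +0000]
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def largest_gap(lines):
    """Return (gap, start, end) for the longest pause between consecutive entries."""
    stamps = []
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z"))
    if len(stamps) < 2:
        return None  # not enough entries to measure a gap
    start, end = max(zip(stamps, stamps[1:]), key=lambda pair: pair[1] - pair[0])
    return end - start, start, end


# Sample lines taken from the access log excerpt above (addresses redacted).
sample = [
    '192.0.1xxx - - [13/Mar/2022:22:35:57 +0000] "GET /wp-content/uploads/2020/01/IMG_1668.jpg HTTP/1.1" 200 175485',
    '91.17xxx49 - - [13/Mar/2022:22:37:16 +0000] "GET /wp-content/uploads/2021/03/JPG-1-1440x1080.jpg HTTP/1.1" 200 187452',
    '13xxx17.54 - - [13/Mar/2022:22:38:55 +0000] "POST / HTTP/1.1" 301 233',
    '135xxx7.54 - - [13/Mar/2022:22:38:55 +0000] "GET /.env HTTP/1.1" 301 237',
    '20.xxxx225 - - [14/Mar/2022:00:00:02 +0000] "GET /.env HTTP/1.1" 301 237',
    '35xxx1.102 - - [14/Mar/2022:00:00:02 +0000] "GET / HTTP/1.1" 301 233',
    '216xxx240 - - [14/Mar/2022:00:00:02 +0000] "GET /robots.txt HTTP/1.1" 301 243',
]

result = largest_gap(sample)
if result:
    gap, start, end = result
    print(f"Longest quiet period: {gap} (from {start} to {end})")
```

On the sample lines this reports a pause of 1:21:07, from 22:38:55 to 00:00:02 (both +0000). Running the same function over the full access log for the previous week would show whether that outage has a comparable window.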

Hope that someone can point me to the source of the downtime prior to the 0:00 reboot. PS: I was not able to observe the outage myself, as I was asleep both times, so I am basing my information on JetPack reporting that the site was down. Please note that JetPack first reports the site down at 23:49 and back up only at 01:01. I'm not sure all the time zones are in sync. Could 01:01 in JetPack be 0:01 for the server, and 23:49 be 22:49? That would make more sense...
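The time zone question can be checked with a small conversion. This is only a sketch under stated assumptions: that the JetPack emails show local time at UTC+1, that the alerts belong to the night of 13 to 14 March 2022, and that the server logs are in UTC (the access log entries carry +0000). The helper name `to_server_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: JetPack emails show local time at UTC+1, while the
# access log is in UTC (+0000). Change the offset if yours differs.
JETPACK_TZ = timezone(timedelta(hours=1))


def to_server_time(year, month, day, hour, minute, source_tz=JETPACK_TZ):
    """Convert a wall-clock time in source_tz to UTC, the server log time zone."""
    local = datetime(year, month, day, hour, minute, tzinfo=source_tz)
    return local.astimezone(timezone.utc)


# Assumed dates: the night of 13 to 14 March 2022, matching the logs.
down = to_server_time(2022, 3, 13, 23, 49)
up = to_server_time(2022, 3, 14, 1, 1)
print(down.strftime("%Y-%m-%d %H:%M"), "->", up.strftime("%Y-%m-%d %H:%M"))
```

If that one-hour offset holds, the first alert falls at 22:49 server time, roughly ten minutes after the last access log entry at 22:38:55, and the recovery notice at 00:01 falls just after the graceful restart at 00:00:01.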

Kind regards, Paul

The JetPack message: Your site appears to be down .... did not load when Jetpack Monitor last checked on it.

What's happening? Your site is responding intermittently, or extremely slowly. This can indicate an overloaded, under-powered, or misconfigured server. Your site is probably loading for some users, but not for everyone.

Error reference: 134320534/intermittent

What should you do now? Start by visiting your site to see if you're able to load it. Jetpack Monitor may have just recorded a momentary glitch that's since been resolved, and you can ignore this email.

If you're unable to load your site, check your host's control panel or contact their support team: they'll have more detail about what is happening. Be sure to share the error information above with them.

Later, at 0:54: Your site still appears to be down. conveybeauty.com still did not load when Jetpack Monitor last checked on it. It's been offline for 1 hour.

What is happening? Your site is responding intermittently, or extremely slowly. This can indicate an overloaded, under-powered, or misconfigured server. Your site is probably loading for some users, but not for everyone.

Error reference: 134320534/intermittent

What should you do now? If you haven't visited your site recently, give it a try and see if you're able to load it.

If you're still unable to view it, now would be a great time to get in touch with your host's support team and share the error information above with them.

Paul
asked 2 years ago, 896 views
2 Answers
2
Accepted Answer
answered 2 years ago
  • I will look into load balancers, good suggestion in any case. But I'm still worried about the source of the outage. Why the reboot at 0:00 exactly and why the outage/unresponsiveness just hours before? Some of this may be regular behavior but I'm not aware of it. I just want to exclude any more structural sources.

1

Hello @Paul,

Is there a way for you to see how much traffic you get before your instance "shuts down" (it might not shut down, but become unresponsive)? When an instance uses too many resources, it will consume burst capacity, and once it exhausts that, it might become unresponsive. So, I'm thinking you might need a bigger instance to handle traffic during peak hours. See https://lightsail.aws.amazon.com/ls/docs/en_us/articles/amazon-lightsail-viewing-instance-burst-capacity for further details.
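The burst capacity history around the outage can also be pulled programmatically rather than read off the console graph. A sketch, assuming boto3 is installed and AWS credentials are configured; the helper names are made up for illustration, and `my-wordpress-instance` in the usage note is a placeholder for your own instance name.

```python
from datetime import datetime, timedelta, timezone


def build_metric_request(instance_name, end_time, hours=6):
    """Build the arguments for Lightsail's GetInstanceMetricData call."""
    return {
        "instanceName": instance_name,
        "metricName": "BurstCapacityPercentage",
        "period": 300,  # seconds per data point
        "startTime": end_time - timedelta(hours=hours),
        "endTime": end_time,
        "unit": "Percent",
        "statistics": ["Minimum"],
    }


def fetch_burst_capacity(instance_name, end_time, hours=6):
    """Return (timestamp, minimum burst capacity %) pairs, oldest first."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("lightsail")
    response = client.get_instance_metric_data(
        **build_metric_request(instance_name, end_time, hours)
    )
    points = sorted(response["metricData"], key=lambda p: p["timestamp"])
    return [(p["timestamp"], p["minimum"]) for p in points]
```

For example, `fetch_burst_capacity("my-wordpress-instance", datetime(2022, 3, 14, 0, 30, tzinfo=timezone.utc))` would cover the six hours up to shortly after the restart. If the minimum stays well above zero through the quiet period, burst capacity exhaustion is unlikely to be the cause.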

Regards

AWS
answered 2 years ago
  • Good point, I should have mentioned this: I have more than enough resources. I'm on the 20 USD/month plan with 1000-1500 users a day, which tapers off strongly after 21h. So there are at most 10 simultaneous users at the time of the outage, and my resource utilization is constantly very low anyway when I look at Metrics in Lightsail.
