How does the Greengrass V2 component recover step's ERRORED -> RUNNING transition work?

According to this documentation, the run lifecycle step can contain a long-running script, and the component stays in the RUNNING state for as long as the script runs. If the script exits with an error code, the component enters the ERRORED state. I am trying to understand how to write the recover step's script so that the component gets back to RUNNING. The documentation says the recover script times out after 60 seconds by default. Does the component return to RUNNING if the recover script exits with a success code before the timeout? If so, how does Greengrass know which process to monitor for future error or finish events?

I have been experimenting with a dummy component to test this, but I don't fully understand what I'm seeing. What I tried:

In the run step, I have a long-running Python script, python3 -u my_script.py, which starts a server and listens on a host port. In the recover step, I check whether anything is listening on that port and, if not, restart the script. I then kill the run step's process to trigger a recover.
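
For illustration, the "is anything listening on the port" check in the recover step could look something like the minimal Python sketch below. The original recover script is not shown, so the function name is_port_listening, the host, and the timeout are all hypothetical choices, not what was actually used.

```python
import socket


def is_port_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper: a successful connect is taken to mean that
    the server started by my_script.py is up.
    """
    try:
        # create_connection raises OSError (e.g. connection refused,
        # timeout) when nothing accepts connections on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A recover script would call this with whatever host and port my_script.py binds to and act only when it returns False.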

What I noticed is that the component remains in the ERRORED state for 60 seconds and then goes into RUNNING. The timeout doesn't stop my Python script, though; it keeps running, and Greengrass somehow knows to monitor the new process created by the recover step (if I kill it, recover is triggered again and creates a new process).

What is surprising is that even if I start the Python script in the background with python3 -u my_script.py &, the component still stays in ERRORED for the full 60 seconds before returning to RUNNING, even though the recover script would have finished much earlier. Greengrass still somehow knows to monitor the new script started by the recover step. But if I start multiple background processes in the recover step, which one will it monitor?

Sourish
asked a month ago
1 Answer
Accepted Answer

Hi,

Greengrass retries an errored component twice; if it still fails, the component enters the BROKEN state and stops running. Greengrass always retries, whether or not a recover step is defined. The recover step defines what the component should do after it errors and before the retry. The default 60-second timeout on the recover step exists to cut off a recover script that runs too long, so that the retry is not blocked.
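
The sequence described in this answer can be modeled roughly as the toy Python state machine below. It is only a sketch of the behavior as stated here (run, error, recover, retry, at most two retries), not Greengrass's actual implementation. The names simulate and MAX_RETRIES are made up for illustration, and treating a zero exit code as FINISHED is an assumption of the model.

```python
from typing import Callable, List

# Per the answer above: an errored component is retried twice.
MAX_RETRIES = 2


def simulate(run: Callable[[], int], recover: Callable[[], None]) -> List[str]:
    """Return the states visited by a toy component lifecycle.

    run() stands in for the run step and returns its exit code;
    recover() stands in for the recover step. Both are hypothetical
    callables supplied by the caller.
    """
    states: List[str] = []
    retries = 0
    while True:
        states.append("RUNNING")
        if run() == 0:
            # Assumption of this model: a clean exit ends the lifecycle.
            states.append("FINISHED")
            return states
        states.append("ERRORED")
        if retries == MAX_RETRIES:
            states.append("BROKEN")
            return states
        # Recover runs after the error and before the retry; in the real
        # system this step is bounded by the recover timeout.
        recover()
        retries += 1
```

In this model, a run step that always fails visits RUNNING and ERRORED three times and then ends in BROKEN, with recover called twice. The point is that the retry re-executes the run step itself, so the recover step does not need to restart the script.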

AWS
yitingb
answered a month ago
  • Ah I see, this makes complete sense! So basically the run script is re-run after the recover script finishes executing.
