How does Greengrass V2 Component Recover Step ERRORED -> RUNNING Transition Work?

Question

As per [this](https://docs.aws.amazon.com/greengrass/v2/developerguide/component-recipe-reference.html) documentation, the `run` lifecycle step can run a long-running script, and the component remains in the `RUNNING` state as long as that script runs. If the script exits with an error code, the component enters the `ERRORED` state. I am trying to understand how to write the script in the `recover` step so that the component returns to `RUNNING`. The documentation says that the recover script times out after 60 seconds by default. Does the component go back to `RUNNING` if the recover script exits with a success code before it times out? And if so, how does Greengrass know which process to monitor for future error or exit events?

I have been experimenting with a dummy component to test this, but I don't fully understand the behavior. Here is what I tried:

In the `run` step, I start a long-running Python script with `python3 -u my_script.py`. It starts a server that listens on a host port. In the `recover` step, I check whether anything is listening on that port and, if nothing is, restart the script. Then I kill the process started by the `run` step to trigger a recover.
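A minimal sketch of the recover check described above, written in Python since `my_script.py` is Python. The host and port values, and the helper names `is_listening` and `recover`, are hypothetical; the question does not give them. It uses only the standard library: `socket.create_connection` to probe the port and `subprocess.Popen` to start the script without waiting for it.

```python
import socket
import subprocess
import sys

# Hypothetical values: the question does not say which host/port my_script.py uses.
HOST = "127.0.0.1"
PORT = 8080


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused or timed out: nothing is serving on the port.
        return False


def recover(host: str = HOST, port: int = PORT) -> None:
    """Restart my_script.py only if nothing is listening on the port."""
    if is_listening(host, port):
        return
    # Same command as the `run` step; Popen returns immediately (like `&`),
    # so this function does not wait for the server to exit.
    subprocess.Popen([sys.executable, "-u", "my_script.py"])
```

Because `Popen` returns right away, this check by itself says nothing about which process Greengrass tracks afterwards; it only reproduces the restart logic from the question.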

What I noticed is that the component stays in the `ERRORED` state for 60 seconds and then goes to `RUNNING`. The timeout doesn't stop my Python script, though; it keeps running. Greengrass also somehow knows to monitor this new process created by the `recover` step: if I kill it, the recover step is triggered again and creates another process.

What is surprising is that even if I start the Python script in the background with `python3 -u my_script.py &`, the component still stays in `ERRORED` for the full 60 seconds before going back to `RUNNING`, even though the recover script should have finished much earlier. And Greengrass still seems to monitor the new process started by the `recover` step to determine the component state. But if I start multiple processes in the background in the `recover` step, which one will it monitor?
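One plausible explanation for the full 60-second wait, offered as an assumption rather than documented Greengrass behavior: if the nucleus captures a step's stdout/stderr through pipes and waits for those streams to close, a background job that inherits them keeps the step open until the job exits or the timeout fires. The sketch below demonstrates that general pipe behavior with Python's `subprocess` and a POSIX shell; `time_step` is a hypothetical helper and `sleep` stands in for `my_script.py`.

```python
import subprocess
import time


def time_step(command: str) -> float:
    """Run `command` in a shell with stdout/stderr captured through pipes and
    return the seconds elapsed until both pipes hit EOF and the shell exited.

    Assumption for illustration only: the supervisor waits for the step's
    output streams to close, as subprocess.run(..., stdout=PIPE) does here.
    """
    start = time.monotonic()
    subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return time.monotonic() - start


if __name__ == "__main__":
    # The shell itself exits at once in both cases; only the pipe handling differs.
    held = time_step("sleep 2 &")                      # job inherits the pipes
    released = time_step("sleep 2 >/dev/null 2>&1 &")  # job gets its own streams
    print(f"inherited pipes: {held:.1f}s, redirected: {released:.1f}s")
```

If that assumption holds for Greengrass, redirecting the background job's output would let the recover script finish early, at the cost of losing that output from the component log; it still would not answer which process Greengrass monitors afterwards.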

Accepted Answer

Hi,

Greengrass retries an errored component twice; after that, the component enters the `BROKEN` state and stops running. Greengrass always retries, whether or not a `recover` step is defined. The `recover` step lets you define what the component should do after it errors and before it is retried. The default 60-second `timeout` on the `recover` step limits how long the recover script can run; when it times out, the retry is unblocked.
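A toy Python model of the retry policy as described above, useful for reasoning about the sequence of states. It is not the nucleus implementation: `supervise`, its callbacks, and the choice to skip `recover` after the final failure (since no retry follows) are assumptions of this sketch, and the recover timeout is only noted in a comment.

```python
from enum import Enum
from typing import Callable, List, Tuple


class State(Enum):
    RUNNING = "RUNNING"
    ERRORED = "ERRORED"
    BROKEN = "BROKEN"


def supervise(
    run_step: Callable[[int], bool],
    recover_step: Callable[[], None],
    max_retries: int = 2,
) -> Tuple[State, List[str]]:
    """Toy model of the retry policy described in the answer (not Greengrass code).

    run_step(attempt) returns True if the `run` step stays up and False if it
    exits with an error code. recover_step runs after an error, before a retry.
    Returns the final state and the sequence of events.
    """
    history: List[str] = []
    for attempt in range(max_retries + 1):  # first try plus max_retries retries
        if run_step(attempt):
            history.append(State.RUNNING.value)
            return State.RUNNING, history
        history.append(State.ERRORED.value)
        if attempt == max_retries:
            break  # no retry left, so this sketch skips recover here (assumption)
        # In Greengrass the recover script is bounded by its timeout (60 s by
        # default); once it finishes or times out, the retry proceeds.
        recover_step()
        history.append("recover")
    history.append(State.BROKEN.value)
    return State.BROKEN, history


if __name__ == "__main__":
    _, events = supervise(lambda attempt: False, lambda: None)
    print(events)  # ['ERRORED', 'recover', 'ERRORED', 'recover', 'ERRORED', 'BROKEN']
```

A component whose `run` step succeeds on the first retry would instead produce `['ERRORED', 'recover', 'RUNNING']`.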
