How does the ERRORED -> RUNNING transition work in the Greengrass V2 component recover step?

As per this documentation, the run lifecycle step can contain a long-running script, and the component remains in the RUNNING state for as long as the script runs. If the script exits with an error code, the component enters the ERRORED state. I am trying to understand how to write the script in the recover step so that the component gets back to RUNNING. The documentation says that the recover script times out after 60s by default. Does the component go back into the RUNNING state if the recover script exits with a success code before it times out? And if so, how does Greengrass know which process to monitor for error/finish events in the future?

I have been experimenting with a dummy component to test this, but I don't completely understand the behavior I see. What I tried:

In the run step, I have a long-running Python script, python3 -u my_script.py. It starts a server and listens on a host port. In the recover step, I set it up so that if nothing is listening on the port, it restarts the script. Then I kill the process started by the run step to trigger a recover.
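
For illustration, the "is anything listening on the port" check from the recover step could be sketched in Python roughly as follows. This is only a sketch: the function name is_port_listening, the use of a plain TCP connect, and the 1-second timeout are assumptions, not taken from my actual component.

```python
import socket


def is_port_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper: name, signature, and timeout are illustrative only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value on failure,
        # so no exception handling is needed for a refused connection.
        return sock.connect_ex((host, port)) == 0
```

A recover script would call this with the host and port the server uses and restart the server only when it returns False.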

What I noticed is that the component remains in the ERRORED state for 60 seconds and then goes into the RUNNING state. The timeout doesn't stop my Python script, though; it keeps running, and Greengrass somehow knows to monitor the new process created by the recover step (if I kill it, the recover step is triggered again and creates a new process).

What is surprising is that even if I start the Python script in the background using python3 -u my_script.py &, the component still remains in the ERRORED state for the full 60 seconds before going back into the RUNNING state, even though the recover script would have finished much earlier. Greengrass still somehow knows to monitor the new script started by the recover step for the component state. But what if I start multiple processes in the background in the recover step? Which one will it monitor?

Sourish
Asked 2 months ago, 164 views
1 Answer
Accepted Answer

Hi,

Greengrass retries an errored component twice; if it keeps failing, the component enters the BROKEN state and stops running. Greengrass always retries, whether or not a recover step is defined. The recover step defines what you want the component to do after it has errored and before the retry. The default 60s timeout in the recover step exists to time out the recover script, so that a recover script that does not finish cannot block the retry.
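
The sequence described above can be sketched as a toy Python model. This is not Greengrass code: the simulate function, the MAX_ERRORS constant, and the way errors are counted (the initial run plus two retries) are assumptions made only to illustrate the order of events.

```python
from typing import Callable, List

# Assumption: the initial run plus the two retries mentioned above.
MAX_ERRORS = 3


def simulate(run_results: List[bool],
             recover: Callable[[], None] = lambda: None) -> List[str]:
    """Return the component states visited, in order.

    run_results[i] is True if the i-th attempt of the run step keeps running,
    and False if it exits with an error. Hypothetical model, not the nucleus.
    """
    states: List[str] = []
    errors = 0
    for stays_up in run_results:
        states.append("RUNNING")
        if stays_up:
            return states
        states.append("ERRORED")
        errors += 1
        if errors >= MAX_ERRORS:
            states.append("BROKEN")
            return states
        # The recover step runs after the error and before the retry.
        # In Greengrass it is bounded by the recover timeout (60s by default).
        recover()
    return states


print(simulate([False, True]))
# prints ['RUNNING', 'ERRORED', 'RUNNING']
```

In this model the run step itself is what gets started again after recover returns, so recover does not need to launch the long-running process.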

AWS
yitingb
Answered a month ago
  • Ah I see, this makes complete sense! So basically the run script is re-run after the recover script finishes executing.
