Rollback docker component with AWS Greengrass V2: follow up question #2

Sorry for continuing to post follow-up questions like this, but I feel that comments inside a thread are not really seen or answered.

I have been trying to implement rollback correctly for the Docker component I am deploying through Greengrass to a fleet of RPis. According to the answers so far in this question and this question, I need to use Startup and Shutdown in the Lifecycle section for rollbacks to work. I have also been advised to use Greengrass IPC in the Python script inside the deployed Docker image to report a RUNNING state. I read in the developer guide that, to use IPC inside the Docker container, I need to add several mappings and install awsiot through pip inside the image.

Next, I was advised to look at https://docs.aws.amazon.com/greengrass/v2/developerguide/ipc-component-lifecycle.html to see how to interact with the component lifecycle through IPC. The explanation there is quite minimal, and I am struggling to find real Python examples. However, I understand that I need to add aws.greengrass#PauseComponent and aws.greengrass#ResumeComponent to the "operations" section of my component recipe. My component recipe looks like this. Can you tell me whether it is correct (especially how I added the policy at the end; I really have no idea)?

{
  "RecipeFormatVersion": "2020-01-25",
  "ComponentName": "com.example.MyPrivateDockerComponent",
  "ComponentVersion": "2.0.0",
  "ComponentType": "aws.greengrass.generic",
  "ComponentDescription": "A component that runs a Docker container from a private Amazon ECR image.",
  "ComponentPublisher": "Amazon",
  "ComponentDependencies": {
    "aws.greengrass.DockerApplicationManager": {
      "VersionRequirement": ">=2.0.0 <2.1.0",
      "DependencyType": "HARD"
    },
    "aws.greengrass.TokenExchangeService": {
      "VersionRequirement": ">=2.0.0 <2.1.0",
      "DependencyType": "HARD"
    }
  },
  "Manifests": [
    {
      "Platform": {
        "os": "all"
      },
      "Lifecycle": {
        "Startup": "docker run --privileged -v $AWS_GG_NUCLEUS_DOMAIN_SOCKET_FILEPATH_FOR_COMPONENT:$AWS_GG_NUCLEUS_DOMAIN_SOCKET_FILEPATH_FOR_COMPONENT -e SVCUID -e AWS_GG_NUCLEUS_DOMAIN_SOCKET_FILEPATH_FOR_COMPONENT -e SVCUID -e AWS_GG_NUCLEUS_DOMAIN_SOCKET_FILEPATH_FOR_COMPONENT --rm 242944196659.dkr.ecr.eu-central-1.amazonaws.com/test_repo:0.0.1",
        "Shutdown": "docker stop $(docker ps -a -q --filter ancestor=242944196659.dkr.ecr.eu-central-1.amazonaws.com/test_repo:0.0.1)"
      },
      "Artifacts": [
        {
          "Uri": "docker:242944196659.dkr.ecr.eu-central-1.amazonaws.com/test_repo:fileerror",
          "Unarchive": "NONE",
          "Permission": {
            "Read": "OWNER",
            "Execute": "NONE"
          }
        }
      ]
    }
  ],
  "Lifecycle": {}

  "accessControl": {
    "aws.greengrass.ipc.lifecycle": {
      "com.example.MyPrivateDockerComponent:lifecycle:1": {
        "policyDescription": "Allows access to pause/resume all components.",
        "operations": [
          "aws.greengrass#PauseComponent",
          "aws.greengrass#ResumeComponent"
        ],
        "resources": [
          "*"
        ]
      }
    }
  }
}

Once the component recipe is correct, I need to implement status reporting in the Python script that runs in my Docker container. Let's assume I am running a simple script that keeps printing to the console, as shown below. Where and how do I report a RUNNING/ERRORED state through UpdateState with the awsiot SDK? Could you fill in the missing code?

import time

if __name__ == "__main__":
    while True:
        print("Hello World!")
        time.sleep(5)

Thank you for your help.

  • I think all the advice you got so far was correct, but maybe not collectively all pulling in the same direction. A question however: how long after installation do you still want a failure to cause a deployment rollback? In my opinion, rollback is the right behaviour if the installation fails or the Docker container fails to start. But if the Docker container fails N seconds/minutes/hours after successfully starting, then the UNHEALTHY state you had in your original question is probably the desired behaviour.

  • Yes, you're right. I just want to trigger a rollback if the installation fails or if the Docker container or the script fails to start. So what would be the best way to go about this? I am really just deploying a Docker container with some dependencies and running a Python script in it. I want the deployment to roll back if a new deployment fails to get going. If the script stops at some time t after deployment, I am happy for the device state to move to UNHEALTHY. @Greg_B

1 Answer
Hello,

The deployment is considered successful when all the components that we check are in their desired state (running/finished). If you use a Run script, Greengrass does not wait for the script, but if the script exits with an error before Greengrass reports the deployment status, the core device rolls back (assuming you have a rollback configuration). If you want to perform checks before confirming that the component is healthy, use Startup. With Startup, Greengrass waits until the script returns and therefore captures any failures or timeouts. With Startup, however, your code needs to run in the background (Startup scripts must exit) and use UpdateState to report failures after it has started.

You can find the AWS IoT Device SDK for Python v2 UpdateState operation here: https://aws.github.io/aws-iot-device-sdk-python-v2/awsiot/greengrasscoreipc.html#awsiot.greengrasscoreipc.client.UpdateStateOperation

Unfortunately, we do not have any example code for using this lifecycle IPC service to update the component state on the core device. You do not need to add any authorization policy statement for this feature.
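
For illustration only, and not an official AWS example: a minimal sketch of how the script from the question might report its state through UpdateState. It assumes the awsiotsdk package (which provides the awsiot module) is installed in the image and that the IPC socket and SVCUID are passed into the container as in the recipe above. The helper names make_reporter, run and hello are hypothetical, and the strings "RUNNING" and "ERRORED" are the values of ReportedLifecycleState in the SDK.

```python
import time


def make_reporter():
    """Connect to Greengrass IPC and return a function that reports a state.

    Assumption: awsiotsdk is installed and the container can reach the
    nucleus IPC socket. The import is lazy so the rest of the module can be
    loaded on a machine without the SDK.
    """
    import awsiot.greengrasscoreipc
    from awsiot.greengrasscoreipc.model import UpdateStateRequest

    ipc_client = awsiot.greengrasscoreipc.connect()

    def report(state):
        # state is "RUNNING" or "ERRORED" (see ReportedLifecycleState).
        operation = ipc_client.new_update_state()
        operation.activate(UpdateStateRequest(state=state))
        operation.get_response().result(timeout=10)

    return report


def run(report, work, max_iterations=None):
    """Report RUNNING once, then call work() repeatedly.

    If work() raises, report ERRORED and re-raise so the process exits
    with a non-zero status.
    """
    report("RUNNING")
    count = 0
    try:
        while max_iterations is None or count < max_iterations:
            work()
            count += 1
    except Exception:
        report("ERRORED")
        raise


def hello():
    # The workload from the question.
    print("Hello World!")
    time.sleep(5)
```

In the container's entry point this could be started with run(make_reporter(), hello). The max_iterations argument is only there so the loop can be tried outside Greengrass with a stand-in report function.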

Please note that the PauseComponent and ResumeComponent operations pause and resume a component's processes on the core device. You should add the authorization policy statement only if you want to use this functionality. Additionally, these operations can't pause containerized processes, such as Docker containers. To pause and resume a Docker container, you can use the "docker pause" and "docker unpause" commands.
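
If such a policy is needed at some point, the fragment below sketches where it would sit in the recipe. As far as I know, Greengrass reads authorization policies from ComponentConfiguration.DefaultConfiguration.accessControl rather than from a top-level accessControl key. The policy body is copied unchanged from the question. Note also that the empty top-level "Lifecycle": {} entry in the question is not followed by a comma, so the recipe as posted is not valid JSON.

```json
{
  "ComponentConfiguration": {
    "DefaultConfiguration": {
      "accessControl": {
        "aws.greengrass.ipc.lifecycle": {
          "com.example.MyPrivateDockerComponent:lifecycle:1": {
            "policyDescription": "Allows access to pause/resume all components.",
            "operations": [
              "aws.greengrass#PauseComponent",
              "aws.greengrass#ResumeComponent"
            ],
            "resources": [
              "*"
            ]
          }
        }
      }
    }
  }
}
```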

AWS
SUPPORT ENGINEER
Rahul_P
answered a year ago
  • Thank you. What do you mean by a rollback config? Do you mean this part of the deployment JSON?

        "deploymentPolicies": {
            "failureHandlingPolicy": "ROLLBACK",
            "componentUpdatePolicy": {
                "timeoutInSeconds": 60,
                "action": "NOTIFY_COMPONENTS"
            }
        },
    
