Hi Phil.
What's the recommended pattern for implementing self-healing components that can recover from core service connectivity issues?
Can you please elaborate on why your components get into a state that requires a restart? What are they trying to do?
DNS is just another reason that your device could go offline. Ideally your components would expect loss of connectivity, regardless of the cause, and should not enter an unrecoverable state.
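One way to avoid an unrecoverable state is to treat every connection failure as temporary and retry with a capped, jittered backoff. The following is a minimal sketch of that idea, not the Greengrass SDK's own reconnect logic. The names `run_with_reconnect`, `connect`, and `work` are hypothetical placeholders for whatever your component uses to open its connection and do its job.

```python
import random
import time


def run_with_reconnect(connect, work, base_delay=1.0, max_delay=60.0,
                       max_attempts=None, sleep=time.sleep):
    """Call connect(), then work(conn); on connection errors, back off and retry.

    connect and work are hypothetical callables supplied by the component.
    max_attempts=None retries forever, which suits a long-running component.
    """
    attempt = 0
    delay = base_delay
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            conn = connect()
            delay = base_delay  # connected, so reset the backoff window
            return work(conn)
        except OSError:
            # OSError covers DNS failures (socket.gaierror), refused or
            # reset connections (ConnectionError), and socket timeouts.
            sleep(random.uniform(0, delay))    # jitter avoids synchronized retries
            delay = min(delay * 2, max_delay)  # exponential growth, capped
    raise ConnectionError(f"gave up after {attempt} attempts")
```

Because the loop only ever sleeps and retries, a DNS outage of any length leaves the component waiting rather than stuck, and it resumes on its own once name resolution works again.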
In AWS IoT Greengrass v2, there are several approaches to handle component restarts when MQTT connectivity resumes after DNS failures:
Restarting Components
- Using the Greengrass CLI: You can create a watchdog component that uses the Greengrass CLI to restart specific components when issues are detected:
sudo /greengrass/v2/bin/greengrass-cli component restart --names "YourComponentName"
This is particularly useful during development and testing.
- IPC Service for Component Management: For a programmatic approach, use the Greengrass CLI IPC service (available in Greengrass nucleus v2.6.0+) to restart components. Your watchdog component can call the RestartComponent operation, which returns a status of either SUCCEEDED or FAILED.
- Configuration-based Restart Triggers: Some components support configuration parameters specifically for triggering restarts. For example, the EMQX MQTT broker component has a restartIdentifier configuration option that, when changed, forces the broker to restart.
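The CLI restart from the first option can be wrapped in a small helper that a watchdog component calls. This is a sketch under assumptions: it assumes the default install path shown in the command above, that the Greengrass CLI component is deployed, and that the calling process is allowed to use sudo. The function names are hypothetical.

```python
import subprocess

# Default install path, taken from the CLI command shown earlier.
GREENGRASS_CLI = "/greengrass/v2/bin/greengrass-cli"


def build_restart_command(component_name, cli_path=GREENGRASS_CLI):
    """Return the argv list for restarting one component via the CLI."""
    return ["sudo", cli_path, "component", "restart", "--names", component_name]


def restart_component(component_name, run=subprocess.run):
    """Restart a component; return True if the CLI exited with status 0.

    The run parameter exists so the call can be replaced in tests.
    """
    try:
        result = run(
            build_restart_command(component_name),
            capture_output=True,
            text=True,
            timeout=30,  # assumed limit so a hung CLI cannot block the watchdog
        )
    except (OSError, subprocess.TimeoutExpired):
        # CLI missing, not executable, or did not finish in time.
        return False
    return result.returncode == 0
```

Passing the command as a list rather than a shell string avoids quoting problems if a component name ever contains unusual characters.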
Self-healing Component Patterns
To implement a self-healing watchdog component:
- Create a monitoring component that:
- Periodically checks MQTT connectivity status
- Logs DNS resolution attempts
- Detects when connections are stuck or failing
- Takes corrective action by restarting affected components
- Use lifecycle configuration:
- The bootstrap lifecycle can be used for one-time setup operations
- The run lifecycle can implement your monitoring logic
- Use environment variables in the Setenv configuration to force component restarts when needed
- Force component restarts by:
- Updating component versions (even with no other changes)
- Resetting component configurations to default settings
- Changing environment variables in the component's configuration
For components that terminate unexpectedly, note that Greengrass v2 doesn't automatically restart them like Greengrass v1 did. You may need to implement a custom solution where a monitoring component detects termination and triggers a restart through the CLI or IPC service.
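The monitoring logic described above can be sketched as a small state machine that declares the device offline after several consecutive failed checks and fires a callback once when connectivity returns. This is an illustrative sketch, not an AWS-provided pattern: `ConnectivityWatchdog`, `dns_resolves`, the threshold of three failures, and the 30-second interval are all assumptions, and the recovery callback stands in for whichever restart mechanism you choose.

```python
import socket
import time


def dns_resolves(hostname, port=8883):
    """Return True if hostname currently resolves (8883 is the MQTT over TLS port)."""
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


class ConnectivityWatchdog:
    """Track connectivity and call on_recovered after an outage ends."""

    def __init__(self, check, on_recovered, failures_before_offline=3):
        self.check = check                # callable returning True when healthy
        self.on_recovered = on_recovered  # callable run once per recovery
        self.failures_before_offline = failures_before_offline
        self.consecutive_failures = 0
        self.offline = False

    def tick(self):
        """Run one health check and update the offline/online state."""
        if self.check():
            was_offline = self.offline
            self.consecutive_failures = 0
            self.offline = False
            if was_offline:
                self.on_recovered()  # connectivity is back: take corrective action
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failures_before_offline:
            self.offline = True  # sustained failure, not a single blip


def run_forever(watchdog, interval_seconds=30):
    """Poll until the process is stopped by the component lifecycle."""
    while True:
        watchdog.tick()
        time.sleep(interval_seconds)
```

Requiring several consecutive failures keeps a single dropped lookup from triggering restarts, and firing the callback only on the offline-to-online transition avoids restarting healthy components on every poll.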
By implementing these patterns, you can create a robust self-healing system that automatically recovers from DNS and MQTT connectivity issues without manual intervention.
Sources
How do I redeploy the same deployment config in Greengrass v2 | AWS re:Post
Manage local deployments and components - AWS IoT Greengrass
IOT Greengrass component executing on every restart. | AWS re:Post
MQTT 5 broker (EMQX) - AWS IoT Greengrass
