I want to follow best practices when I implement AWS Lambda-backed custom resources with AWS CloudFormation.
Resolution
When you implement Lambda-backed custom resources with CloudFormation, use the following best practices.
Build your custom resources to report, log, and handle failure
Exceptions can cause your function code to exit without sending a response. CloudFormation requires an HTTPS response to confirm whether the operation is a success or a failure. An unreported exception causes CloudFormation to wait until the operation times out before it starts a stack rollback. If the exception recurs during rollback, then CloudFormation waits again for a timeout before it ends in a rollback failure. During this time, you can't use your stack.
To avoid timeout issues, include the following elements in the code that you create for your Lambda function:
- Logic to handle exceptions
- The ability to log failures for troubleshooting scenarios
- The ability to respond to CloudFormation with an HTTPS response to confirm that an operation failed
- A dead-letter queue that lets you capture and deal with incomplete runs
- A cfn-response module to send a response to CloudFormation
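The elements above can be sketched as a single handler. This is a minimal illustration rather than a definitive implementation: it uses only the Python standard library in place of the cfn-response module, and the names provision, build_response, and send_response, along with the work and respond parameters, are hypothetical and not part of any AWS API. The dead-letter queue is function configuration, so it isn't shown.

```python
import json
import logging
import urllib.request

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def build_response(event, status, physical_id, reason="", data=None):
    """Assemble the JSON document that CloudFormation expects."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason,
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def send_response(event, body):
    """PUT the response to the presigned URL supplied in the request."""
    payload = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=payload,
        method="PUT",
        headers={"Content-Type": ""},
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        logger.info("Response delivered, HTTP status %s", reply.status)


def provision(event):
    """Hypothetical placeholder for the real work; returns (physical_id, data)."""
    raise NotImplementedError("replace with your provisioning logic")


def handler(event, context, work=provision, respond=send_response):
    # Fall back to a known ID so that a failed Create still reports one.
    physical_id = event.get("PhysicalResourceId") or context.log_stream_name
    try:
        physical_id, data = work(event)
        body = build_response(event, "SUCCESS", physical_id, data=data)
    except Exception as error:
        # Log the full traceback, then report the failure instead of exiting silently.
        logger.exception("Custom resource operation failed")
        reason = f"{type(error).__name__}: {error}"
        body = build_response(event, "FAILED", physical_id, reason=reason)
    respond(event, body)
```

The work and respond parameters exist only so that the handler can be exercised without network access. Lambda calls the handler with just the event and context arguments.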
Set reasonable timeout periods, and report when they're about to be exceeded
If an operation doesn't complete within the function's timeout period, then Lambda stops the function before it can send a response to CloudFormation.
To avoid this issue, take the following actions:
- Set the timeout value for your Lambda functions high enough to handle variations in processing time and network conditions.
- Set a timer in your function to respond to CloudFormation with an error when a function is about to time out.
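One possible way to implement such a timer is sketched below. It relies on the get_remaining_time_in_millis method of the Lambda context object and on threading.Timer from the standard library. The helper names start_timeout_guard and run_with_guard, the report callback, and the 2-second safety margin are assumptions chosen for illustration.

```python
import threading


def start_timeout_guard(context, on_timeout, safety_margin_ms=2000):
    """Schedule on_timeout shortly before Lambda would stop the invocation."""
    remaining_ms = context.get_remaining_time_in_millis()
    delay_seconds = max(remaining_ms - safety_margin_ms, 0) / 1000.0
    timer = threading.Timer(delay_seconds, on_timeout)
    timer.daemon = True
    timer.start()
    return timer


def run_with_guard(context, work, report):
    """Run work() and report exactly one result, even if the timer fires first."""
    lock = threading.Lock()
    reported = []

    def report_once(status, reason=""):
        # Only the first caller gets to report; later calls are ignored.
        with lock:
            if reported:
                return
            reported.append(status)
        report(status, reason)

    guard = start_timeout_guard(
        context,
        lambda: report_once("FAILED", "Function is about to time out"),
    )
    try:
        work()
        report_once("SUCCESS")
    except Exception as error:
        report_once("FAILED", str(error))
    finally:
        guard.cancel()
    return reported[0]
```

In a real handler, report would send the HTTPS response to CloudFormation. The once-only wrapper prevents a late SUCCESS from following a FAILED response that the timer already sent.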
Build around Create, Update, and Delete events
Depending on the stack action, CloudFormation sends your function a Create, Update, or Delete event. Because each event is handled differently, make sure that there are no unintended behaviors when your function receives one of the three event types.
For more information, see Custom resource request types.
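As a rough sketch, a function might route each event type to its own code path and reject anything unexpected. The RequestType, ResourceProperties, OldResourceProperties, and PhysicalResourceId fields come from the CloudFormation request. The create_resource, update_resource, and delete_resource functions are hypothetical stand-ins for real provisioning logic.

```python
def create_resource(properties):
    """Hypothetical: create the resource and return its physical ID."""
    return f"demo-{properties['Name']}"


def update_resource(physical_id, properties, old_properties):
    """Hypothetical: update in place and keep the same physical ID."""
    return physical_id


def delete_resource(physical_id):
    """Hypothetical: delete the resource identified by physical_id."""
    return physical_id


def dispatch(event):
    """Route a custom resource request to the matching operation."""
    request_type = event["RequestType"]
    properties = event.get("ResourceProperties", {})
    if request_type == "Create":
        return create_resource(properties)
    if request_type == "Update":
        return update_resource(
            event["PhysicalResourceId"],
            properties,
            event.get("OldResourceProperties", {}),
        )
    if request_type == "Delete":
        return delete_resource(event["PhysicalResourceId"])
    # Fail loudly so that the caller can report FAILED to CloudFormation.
    raise ValueError(f"Unsupported RequestType: {request_type!r}")
```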
Understand how CloudFormation identifies and replaces resources
When an update replaces a physical resource, CloudFormation compares the PhysicalResourceId that your Lambda function returns to the previous PhysicalResourceId. If the IDs differ, then CloudFormation assumes that the resource is replaced with a new physical resource.
However, to allow for potential rollbacks, CloudFormation doesn't remove the old resource. When the stack update completes, CloudFormation sends a Delete event request with the old physical ID as an identifier. If the stack update fails and a rollback occurs, then CloudFormation sends the new physical ID in the Delete event.
Use PhysicalResourceId to uniquely identify resources so that when your function receives a Delete event, the function deletes only the correct resources during a replacement.
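The replacement flow can be illustrated with a small in-memory model. In this sketch the RESOURCES dictionary stands in for a real backend, and the rule that a change to the Name property forces a replacement is an assumption made for the example. The point is that update returns a new ID without touching the old resource, and delete removes only the ID it is given.

```python
import uuid

RESOURCES = {}  # stand-in for a real backend: physical ID -> properties


def create(properties):
    """Create a resource under a fresh, unique physical ID."""
    physical_id = f"demo-{uuid.uuid4().hex[:12]}"
    RESOURCES[physical_id] = dict(properties)
    return physical_id


def update(physical_id, properties, old_properties):
    """Update in place, or create a replacement when "Name" changes."""
    if properties.get("Name") != old_properties.get("Name"):
        # A new ID tells CloudFormation that the resource was replaced.
        # The old resource is left alone until its Delete event arrives.
        return create(properties)
    RESOURCES[physical_id] = dict(properties)
    return physical_id


def delete(physical_id):
    """Delete only the resource named in the Delete event."""
    RESOURCES.pop(physical_id, None)
```

If the update succeeds, the later Delete event carries the old ID and leaves the replacement intact. If the update rolls back, the Delete event carries the new ID and the original resource survives.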
Design your functions with idempotency
You can repeat an idempotent function numerous times with the same inputs, and the result is the same as doing it only once. Idempotency makes sure that retries, updates, and rollbacks don't create duplicate resources or introduce errors.
For example, suppose that CloudFormation invokes your function to create a resource, but doesn't receive a response that the resource is successfully created. If CloudFormation invokes the function again and the function isn't idempotent, then the function creates a second resource. The first resource can then become orphaned.
Implement your handlers to correctly handle rollbacks
When a stack operation fails, CloudFormation tries to return all resources to their previous state. This action results in different behaviors depending on whether the update caused a resource replacement.
To make sure that CloudFormation can complete rollbacks, take the following actions:
Related information
Create custom provisioning logic with custom resources