SFN: How to interrupt an execution with an external signal

1

I'm designing a state machine for a contact center. One specific flow has two possible paths of resolution. As soon as one is done, there is no need (or desire) to wait for the other path to complete. For the sake of simplicity, the following example follows the same pattern:

  1. A potential client creates a new user on the platform.
  2. The platform sends a welcoming email, asking the client to complete the sign-in process by logging in and providing their billing information.
  3. Simultaneously, an internal ticket for contacting this user is created (SQS).
  4. Customer support associates will actively poll for tickets, calling the given user to ask for their billing information.

The expected behavior: when either the client or the associate provides the missing information, the process is completed, and the state machine should mark the execution as complete. When a client actively logs in and fills out their information, **a trigger is sent to the state machine**, removing the ticket from the queue and marking the execution as complete.
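To make the "trigger" concrete, here is a minimal sketch of what the platform could call when the client submits their billing information. It assumes the state machine is paused on a task that uses the `.waitForTaskToken` pattern and that the task token was stored somewhere the platform can read it. The function name, the payload shape, and the way the client is passed in are my own assumptions for illustration, not part of my current setup.

```python
import json


def signal_client_completed(sfn_client, task_token, billing_info):
    """Resume a task paused with .waitForTaskToken by reporting success.

    sfn_client is expected to be a boto3 Step Functions client, for example
    boto3.client("stepfunctions"). It is passed in so the function can be
    exercised without AWS credentials.
    """
    # The output must be a JSON string; it becomes the output of the paused task.
    output = json.dumps({"completedBy": "client", "billingInfo": billing_info})
    return sfn_client.send_task_success(taskToken=task_token, output=output)
```

This only resumes the branch that owns the token. It does not, by itself, stop whatever the other path is doing, which is the part I'm struggling with.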

I haven't found a similar use case in the documentation. The closest one is the manual approval process, but in this case, the approval is "optional" (?). For the same reason, I've tried 4 approaches with no luck:

A) Straight flow: an asynchronous call to StartExecution, followed by a synchronous call to SendMessage (wait for callback). Problem: there is no way to interrupt the wait if the client, after receiving the email, logs in and finalizes the process. (Image: straight flow)

B) Parallel execution, calling StartExecution and SendMessage asynchronously. The intent was that as soon as either task finished, the Parallel state would be considered finished and marked as complete. Problem: a Parallel state requires all of its branches to finish before it is marked as finished. (Image: parallel flow, with the right branch still waiting before the execution was stopped manually)

C) Fail states to signal successful execution, interrupting the Parallel state. I found this option on Stack Overflow (https://stackoverflow.com/a/75945146/19666650). Problem: after implementing it, I found that the output of the Parallel state was the error itself, and I couldn't pass on the output of either the left or the right branch. (Image: using fail states)

D) Cross signaling using DeleteMessage/SendTaskSuccess between branches. I stored the ReceiptHandle and TaskToken in a temporary table, allowing each branch to "mark as done" its sibling branch. For example, if the "left" branch finished, it would remove the SQS message and call SendTaskSuccess for the "right" branch. This way, both branches would be considered done, and the Parallel state would have both outputs. Problem: proxying the queues in order to store the ReceiptHandle creates data races between parallel Lambda functions. (Image: cross signaling)

I've tried every trick I could think of, but I'm running out of options. I'm not even sure whether there's a better AWS service for what I need, such as EventBridge. Maybe I should explore modeling the flow in EventBridge instead of Step Functions. I'm fairly new to the AWS environment. If you have any suggestions or ideas, I'd love to hear them!

Thanks for taking the time to read this.

2 Answers
1

Once you have sent the message to the queue, you can't expect to be able to delete it from within the state machine. Some other process is polling the queue, and it will receive the message and handle it. You can potentially use the receipt handle to delete the message, but only while it is being processed. In that case you can run into race conditions, and the process that is currently handling the message will probably finish processing it anyway.

I would probably go with the last option, i.e., the parallel approach in which one branch terminates the other, with some safeguards: store some state in DDB that each branch can check to decide whether to continue or stop its execution.
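As a rough sketch of such a safeguard, a conditional write lets exactly one branch "claim" the completion, and the other branch learns that it lost and can stop. The table name, the `execution_id` key, and the `finished_by` attribute below are hypothetical names chosen for illustration; the client is assumed to be a boto3 DynamoDB client passed in by the caller.

```python
def claim_completion(dynamodb_client, table_name, execution_id, finished_by):
    """Return True if this caller is the first to mark the execution as done.

    dynamodb_client is expected to be boto3.client("dynamodb"). The table is
    assumed to use a string partition key named "execution_id" (hypothetical).
    """
    try:
        dynamodb_client.put_item(
            TableName=table_name,
            Item={
                "execution_id": {"S": execution_id},
                "finished_by": {"S": finished_by},
            },
            # The write succeeds only if no item with this key exists yet,
            # so two branches racing here cannot both win.
            ConditionExpression="attribute_not_exists(execution_id)",
        )
        return True
    except dynamodb_client.exceptions.ConditionalCheckFailedException:
        # The sibling branch already claimed completion.
        return False
```

Each branch would call this before doing its final work (for example, before signaling the sibling or processing the ticket) and skip that work when it returns `False`.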

Uri (AWS, EXPERT)
answered 10 months ago
  • Thanks for your input! An AWS Architect confirmed to me that there are no means of interrupting a SFN. I'll be implementing a similar solution to what you've described. Once I've confirmed it works, I'll be sharing my findings here.

  • We just released a new blog post that can help with your use case. The idea is to use a Parallel state, with each branch doing one of the operations. When either branch finishes, it raises an exception, which you can then check after the Parallel state.

0
Accepted Answer

The blog post provided by Uri in the second comment of the previous answer is exactly what I was looking for. I'm reposting it as an answer so that I can tag this question as solved.

Here is the link: https://aws.amazon.com/blogs/compute/implementing-patterns-that-exit-early-out-of-a-parallel-state-in-aws-step-functions/
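For anyone skimming, this is a minimal sketch of the general shape as I understand it from the comment above: a Parallel state whose branches each end in a Fail state, with a Catch on the Parallel state that turns the "error" into a normal continuation. It is not the exact definition from the blog post. The error names, state names, the `$.user` input field, and both function parameters are placeholders I made up for illustration.

```python
import json


def build_definition(queue_url, email_function_name):
    """Build an Amazon States Language definition that exits a Parallel state early."""

    def branch(wait_state, resource, parameters, error_name):
        # Each branch waits for a callback, then ends in a Fail state.
        # A failing branch stops the whole Parallel state, including its sibling.
        return {
            "StartAt": wait_state,
            "States": {
                wait_state: {
                    "Type": "Task",
                    "Resource": resource,
                    "Parameters": parameters,
                    "Next": error_name,
                },
                error_name: {
                    "Type": "Fail",
                    "Error": error_name,
                    "Cause": "Branch finished first",
                },
            },
        }

    payload = {"taskToken.$": "$$.Task.Token", "user.$": "$.user"}
    client_branch = branch(
        "WaitForClient",
        "arn:aws:states:::lambda:invoke.waitForTaskToken",
        {"FunctionName": email_function_name, "Payload": payload},
        "ClientFinished",
    )
    associate_branch = branch(
        "WaitForAssociate",
        "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
        {"QueueUrl": queue_url, "MessageBody": payload},
        "AssociateFinished",
    )
    return {
        "StartAt": "Race",
        "States": {
            "Race": {
                "Type": "Parallel",
                "Branches": [client_branch, associate_branch],
                # The expected "errors" are caught and treated as completion.
                # ResultPath keeps the original input and adds the error info.
                "Catch": [
                    {
                        "ErrorEquals": ["ClientFinished", "AssociateFinished"],
                        "ResultPath": "$.exit",
                        "Next": "Completed",
                    }
                ],
                "Next": "Completed",
            },
            "Completed": {"Type": "Succeed"},
        },
    }


if __name__ == "__main__":
    # Placeholder values; replace with a real queue URL and function name.
    print(json.dumps(build_definition("QUEUE_URL", "FUNCTION_NAME"), indent=2))
```

After the Catch, `$.exit.Error` tells you which branch finished first, so a Choice state could route on it if the two outcomes need different handling.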

answered 8 months ago
