Architectural Guidance and Optimization Insights for High-Throughput IoT Workload on AWS with SQS and Kinesis


I'm currently designing an IoT system on AWS that has to handle around 25 million messages per day. The services I intend to use are:

  1. AWS IoT Core: Ingesting data from IoT devices.
  2. Rules Engine: Processing data through predefined rules.
  3. AWS Kinesis: Directing processed data to a Lambda function.
  4. Lambda: Enqueuing data in an SQS queue.
  5. Node Consumers: Retrieving messages from SQS for persistence in MongoDB.
  6. Firehose to S3: Utilizing Kinesis Firehose for long-term storage in S3.

To provide context, the SQS queue serves to decouple the ingestion and processing layers in the architecture. I'm curious to understand if there's a genuine need for SQS or if the mentioned services are robust enough without it.

Specific Questions:

  1. Optimization Suggestions: Given the role of SQS in decoupling, are there optimizations or adjustments to enhance the efficiency, particularly in the interaction between Kinesis, Lambda, and SQS?

  2. Scalability Considerations: How well does this architecture scale with a daily message volume of 25 million, and are there recommended practices or AWS service limits to consider, especially with SQS acting as a decoupling mechanism?

  3. Reliability of SQS: Given its role in decoupling, how reliable is SQS for ensuring data persistence, and are there any reliability-related best practices to follow?

  4. Need for SQS: Considering the mentioned services, is there a genuine need for SQS, or are AWS IoT Core, Kinesis, and Lambda sufficient to handle decoupling without introducing an additional service?

  5. Alternative Architectures: If SQS is found to be unnecessary, are there alternative AWS services or architectural patterns that could better accommodate the high message volume while maintaining reliability and efficiency?

Your expertise and advice on the need for SQS and optimizing this architecture for high message volume, with a focus on the decoupling role, would be greatly appreciated. Thank you!

2 Answers

Thank you for sharing the details of your IoT architecture. Here are my thoughts on optimizing it for high message volume while maintaining reliability:

Optimization Suggestions:

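  • Enable enhanced fan-out on Kinesis so the Lambda consumer gets its own dedicated read throughput (2 MB/s per shard) instead of sharing it with other consumers of the stream. To process each shard in parallel, raise the parallelization factor on the Lambda event source mapping.
  • Tune the Lambda function to avoid throttling: allocate sufficient memory and make sure enough concurrency is available.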
  • Use multiple smaller SQS queues with Lambda consumers instead of one large queue for better scalability.
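
For the Kinesis → Lambda → SQS hop, one easy win is to send to SQS in batches rather than one message per call, since SendMessageBatch accepts up to 10 entries. Below is a minimal sketch of that idea, not a definitive implementation. The function name forward_records and the injected sqs_client are my own choices for illustration; the client is assumed to be a boto3 SQS client, and the returned batchItemFailures shape only has an effect if ReportBatchItemFailures is enabled on the event source mapping.

```python
import base64

SQS_BATCH_LIMIT = 10  # SendMessageBatch accepts at most 10 entries per call


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def forward_records(event, sqs_client, queue_url):
    """Forward the Kinesis records in a Lambda event to SQS in batches.

    `sqs_client` is assumed to behave like boto3.client("sqs").
    Returns the partial-batch response shape for Kinesis event sources.
    """
    records = event.get("Records", [])
    failed_sequence_numbers = []
    for batch in chunked(records, SQS_BATCH_LIMIT):
        entries = [
            {
                "Id": str(i),  # only has to be unique within this batch
                "MessageBody": base64.b64decode(r["kinesis"]["data"]).decode("utf-8"),
            }
            for i, r in enumerate(batch)
        ]
        response = sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        # Map each failed entry back to the Kinesis record it came from.
        for failure in response.get("Failed", []):
            record = batch[int(failure["Id"])]
            failed_sequence_numbers.append(record["kinesis"]["sequenceNumber"])
    return {
        "batchItemFailures": [
            {"itemIdentifier": seq} for seq in failed_sequence_numbers
        ]
    }
```

In a real function the Lambda handler would simply call forward_records(event, boto3.client("sqs"), queue_url), with the queue URL read from configuration. Create the client once outside the handler so it is reused across invocations.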

Scalability Considerations:

  • The architecture can scale to handle 25 million messages daily. Monitor throughput on each service.
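  • Watch for bottlenecks at Lambda and SQS. SQS standard queues scale on their own, so what you need to scale is the Lambda function and the consumer fleet, for example with auto scaling driven by queue depth.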
  • Kinesis shards and Lambda concurrency may need to increase based on load.
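
To put numbers on this: 25 million messages per day is roughly 289 messages per second on average, which is modest for all of these services. The sketch below turns that into a rough Kinesis shard estimate using the per-shard write limits (1,000 records/s and about 1 MB/s). The average message size and the peak-to-average factor are assumptions I made up for illustration; substitute your own measurements.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60
SHARD_WRITE_RECORDS_PER_SEC = 1_000    # Kinesis per-shard write limit (records)
SHARD_WRITE_BYTES_PER_SEC = 1_000_000  # Kinesis per-shard write limit (~1 MB/s)


def estimate_shards(messages_per_day, avg_message_bytes, peak_factor):
    """Return (average msg/s, peak msg/s, shards needed at peak).

    `avg_message_bytes` and `peak_factor` are workload assumptions,
    not values taken from the question.
    """
    avg_rate = messages_per_day / SECONDS_PER_DAY
    peak_rate = avg_rate * peak_factor
    shards_by_records = peak_rate / SHARD_WRITE_RECORDS_PER_SEC
    shards_by_bytes = peak_rate * avg_message_bytes / SHARD_WRITE_BYTES_PER_SEC
    shards = max(1, math.ceil(max(shards_by_records, shards_by_bytes)))
    return avg_rate, peak_rate, shards


# Assumed: 1 KiB messages, peaks at 5x the daily average.
avg_rate, peak_rate, shards = estimate_shards(25_000_000, 1024, 5)
print(f"average {avg_rate:.0f} msg/s, peak {peak_rate:.0f} msg/s, shards {shards}")
```

Under those assumptions a handful of shards is enough, so the design is unlikely to be limited by service quotas; the MongoDB write path is the more likely place to look for a bottleneck.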

Reliability of SQS:

  • SQS provides high reliability for decoupled processing. Messages are persisted redundantly.
  • Configure visibility timeout and dead letter queue to handle failed processing.
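
As a rough illustration of the second point, the sketch below builds the queue attributes that set a visibility timeout and attach a dead-letter queue through a redrive policy. The helper name, the example ARN, and the chosen values (60 seconds, 5 receives) are hypothetical; the attribute names VisibilityTimeout and RedrivePolicy are the ones SQS uses.

```python
import json


def main_queue_attributes(dlq_arn, visibility_timeout_s, max_receive_count):
    """Build SQS queue attributes with a visibility timeout and a DLQ.

    SQS expects attribute values as strings; RedrivePolicy is a JSON string.
    """
    if not 0 <= visibility_timeout_s <= 43_200:  # SQS allows 0 s to 12 h
        raise ValueError("visibility timeout must be between 0 and 43200 seconds")
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    return {
        "VisibilityTimeout": str(visibility_timeout_s),
        "RedrivePolicy": json.dumps(
            {
                "deadLetterTargetArn": dlq_arn,
                # After this many failed receives the message moves to the DLQ.
                "maxReceiveCount": str(max_receive_count),
            }
        ),
    }


# Hypothetical ARN and values, for illustration only.
attrs = main_queue_attributes(
    "arn:aws:sqs:us-east-1:123456789012:iot-ingest-dlq", 60, 5
)
```

The resulting dictionary can be passed as Attributes to create_queue or set_queue_attributes on a boto3 SQS client. Pick a visibility timeout comfortably longer than the time a consumer needs to persist one batch, otherwise messages reappear while they are still being processed.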

Need for SQS:

  • SQS adds value by decoupling ingestion from processing. This prevents data loss if consumers go down.
  • However, Kinesis Data Streams is itself a durable buffer (records are retained for 24 hours by default), so consumers can fall behind and catch up without SQS. SQS may be redundant.
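
The "no data loss if consumers go down" property only holds if the consumer deletes a message after it has been persisted, not before. Here is a minimal sketch of one polling pass, written in Python for brevity even though your consumers are in Node; the same ReceiveMessage and DeleteMessageBatch actions are available there. The function name drain_once and the persist callback (standing in for the MongoDB write) are hypothetical.

```python
def drain_once(sqs_client, queue_url, persist):
    """Receive up to 10 messages, persist each, delete only the successes.

    `sqs_client` is assumed to behave like boto3.client("sqs");
    `persist` is a caller-supplied function that raises on failure.
    Returns the number of messages persisted and deleted.
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # the maximum SQS returns per call
        WaitTimeSeconds=20,      # long polling, avoids empty responses
    )
    done = []
    for index, message in enumerate(response.get("Messages", [])):
        try:
            persist(message["Body"])
        except Exception:
            # Leave the message on the queue. It becomes visible again after
            # the visibility timeout and is retried or sent to the DLQ.
            continue
        done.append({"Id": str(index), "ReceiptHandle": message["ReceiptHandle"]})
    if done:
        sqs_client.delete_message_batch(QueueUrl=queue_url, Entries=done)
    return len(done)
```

Because a message can be delivered more than once on a standard queue, the persistence step should be idempotent, for example by upserting on a device ID plus timestamp key.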

Alternatives:

  • Consider using Kinesis Data Firehose for delivery to S3 then analytics/destinations for processing.
  • Or Kafka Streams could provide distributed processing without the need for an SQS queue.

I don't have more details, but take a look at this article: 7 patterns for IoT data ingestion and visualization - How to decide what works best for your use case. Your use case sounds similar to Pattern 5, but I leave it to you to decide.

AWS
answered 5 months ago

Hi,

This very recent post will answer many of your questions about performance and throughput: https://aws.amazon.com/blogs/compute/introducing-faster-polling-scale-up-for-aws-lambda-functions-configured-with-amazon-sqs/

And this pair of blog posts answers your other questions about reliability, security, etc., with guidance from the Well-Architected Framework (part 2 is linked here): https://aws.amazon.com/blogs/compute/implementing-aws-well-architected-best-practices-for-amazon-sqs-part-2/

Best,

Didier

AWS
EXPERT
answered 5 months ago
