
Kinesis Data Stream to Kinesis Data Firehose


Hi,

I have seen a lot of examples where data records are sent from KDS to KDF even when no real-time processing is required. Why can't we ingest data directly into KDF and store the records in a data lake, then use Redshift for analytics, for example? Please see the link below for an architecture example - I don't understand the need for KDS here.

https://aws.amazon.com/blogs/business-intelligence/how-medhosts-cardiac-risk-prediction-successfully-leveraged-aws-analytic-services/

Thank you

Asked 7 months ago · Viewed 1714 times
1 Answer
1
Accepted Answer

The use of Amazon Kinesis Data Streams (KDS) in the architecture described in the blog post is likely due to the following reasons:

Real-time Processing: While the blog post does not explicitly mention real-time processing requirements, the use of KDS suggests that there may be a need for low-latency, real-time processing of the data. KDS is designed to handle real-time streaming data and can be integrated with other AWS services like AWS Lambda for event-driven processing.

Decoupling Data Ingestion and Analytics: By using KDS as an intermediary between the data source and the analytics pipeline, the architecture decouples the data ingestion and the data processing/analytics components. This allows for more flexibility and scalability, as the data ingestion and analytics can be scaled independently based on the workload.

Ordered, At-Least-Once Delivery: KDS preserves record ordering within a shard and delivers each record at least once. Because retries can produce duplicates, consumers should be idempotent; exactly-once processing has to be built on top of the stream, for example by deduplicating on sequence numbers. This matters for use cases where data integrity and consistency are critical.
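To make the at-least-once point concrete, here is a minimal sketch (names and record shapes are illustrative, not from the blog post) of a consumer that deduplicates redelivered records by their KDS sequence number:

```python
# Illustrative sketch: KDS delivery is at-least-once, so a consumer can
# track sequence numbers it has already handled and skip redeliveries.
def process_records(records, seen, handler):
    """Apply handler once per unique sequence number."""
    for rec in records:
        seq = rec["SequenceNumber"]
        if seq in seen:
            continue  # duplicate redelivery; skip it
        handler(rec["Data"])
        seen.add(seq)

out = []
seen = set()
batch = [
    {"SequenceNumber": "1", "Data": b"a"},
    {"SequenceNumber": "1", "Data": b"a"},  # redelivered duplicate
    {"SequenceNumber": "2", "Data": b"b"},
]
process_records(batch, seen, out.append)
# out == [b"a", b"b"]
```

In a real Lambda consumer the `seen` set would need to live in durable storage (for example DynamoDB) to survive across invocations.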

Durability and Scalability: KDS provides durable storage and scalable throughput, which can be important for handling large volumes of streaming data. This can help ensure that the data is not lost and can be processed at the required scale.
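As a sketch of the ingestion side, the following builds the parameters for a Kinesis `PutRecord` call with boto3. The stream name, payload fields, and partition key are hypothetical; the actual API call (commented out) requires AWS credentials:

```python
import json

def build_kds_put_record(stream_name, payload, partition_key):
    """Build the kwargs for boto3 kinesis.put_record().

    The partition key determines which shard receives the record,
    so a high-cardinality key spreads load evenly across shards.
    """
    return {
        "StreamName": stream_name,
        "Data": json.dumps(payload).encode("utf-8"),
        "PartitionKey": partition_key,
    }

record = build_kds_put_record(
    "patient-telemetry",                       # hypothetical stream name
    {"patient_id": "p-123", "heart_rate": 72}, # hypothetical payload
    "p-123",
)
# With credentials configured you would then call:
#   boto3.client("kinesis").put_record(**record)
```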

Instead of directly ingesting the data into Amazon Kinesis Data Firehose (KDF) or Amazon Redshift, the architecture in the blog post uses KDS as an intermediary layer. This approach can provide the following benefits:

Flexibility: By using KDS, the architecture can easily integrate with other real-time processing or analytics services, such as AWS Lambda, Amazon Kinesis Data Analytics, or Amazon Kinesis Data Firehose.
Scalability: KDS can handle high-throughput, real-time data streams, which may be difficult to achieve with direct ingestion into KDF or Redshift.
Reliability: KDS provides durability and fault tolerance, ensuring that the data is not lost in the event of failures or other issues.
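The "KDS as source for KDF" wiring described above can be sketched with the Firehose `create_delivery_stream` parameters below. All ARNs, names, and buffering values are placeholders, not taken from the blog post:

```python
# Hedged sketch: parameters for a Firehose delivery stream that reads from
# an existing KDS stream and buffers records into S3 (the data lake).
firehose_config = {
    "DeliveryStreamName": "telemetry-to-s3",
    # "KinesisStreamAsSource" (vs "DirectPut") tells KDF to consume from KDS.
    "DeliveryStreamType": "KinesisStreamAsSource",
    "KinesisStreamSourceConfiguration": {
        "KinesisStreamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/patient-telemetry",
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-read-kds",
    },
    "ExtendedS3DestinationConfiguration": {
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-write-s3",
        "BucketARN": "arn:aws:s3:::telemetry-data-lake",
        # Firehose flushes to S3 when either buffering limit is reached.
        "BufferingHints": {"IntervalInSeconds": 300, "SizeInMBs": 64},
    },
}
# With credentials configured:
#   boto3.client("firehose").create_delivery_stream(**firehose_config)
```

Firehose's buffering is what makes it near-real-time rather than real-time, which is exactly why latency-sensitive consumers read from KDS directly while KDF handles delivery to the lake.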

However, it's important to note that the specific use case and requirements of the application will ultimately determine the most appropriate architecture. The decision to use KDS, KDF, or Redshift (or a combination of these services) should be based on factors such as the volume and velocity of the data, the need for real-time processing, the desired level of data durability and reliability, and the overall cost and operational complexity.

AWS
Answered 7 months ago
Expert reviewed 5 months ago
  • Thank you so much for the details. I really appreciate the time you took to elaborate in such a great way.
