
Kinesis Data Stream to Kinesis Data Firehose

1

Hi,

I have seen a lot of examples where data records are sent from KDS to KDF even when NO real-time processing is required. Why can't we ingest data directly into KDF and store the records in a data lake, then use Redshift for analytics, just as an example? Please see the link below for an architecture example - I don't understand the need for KDS here.

https://aws.amazon.com/blogs/business-intelligence/how-medhosts-cardiac-risk-prediction-successfully-leveraged-aws-analytic-services/

Thank you

1 Answer
1
Accepted Answer

The use of Amazon Kinesis Data Streams (KDS) in the architecture described in the blog post is likely due to the following reasons:

Real-time Processing: While the blog post does not explicitly mention real-time processing requirements, the use of KDS suggests that there may be a need for low-latency, real-time processing of the data. KDS is designed to handle real-time streaming data and can be integrated with other AWS services like AWS Lambda for event-driven processing.
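As a rough illustration of that event-driven path, a Lambda function subscribed to a Kinesis data stream receives batches of records per shard and can react to them with low latency. This is only a sketch, not part of the blog's architecture; the payload format and the `process()` step are assumptions.

```python
import base64
import json

def process(payload):
    # Hypothetical real-time step: scoring, enrichment, alerting, etc.
    print("processing:", payload)

def lambda_handler(event, context):
    """Sketch of a Lambda consumer attached to a Kinesis Data Stream.

    Lambda polls the shards and invokes this handler with a batch of
    records; each payload arrives base64-encoded under kinesis.data.
    """
    for record in event["Records"]:
        data = base64.b64decode(record["kinesis"]["data"])
        process(json.loads(data))
    return {"processed": len(event["Records"])}
```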

Decoupling Data Ingestion and Analytics: By using KDS as an intermediary between the data source and the analytics pipeline, the architecture decouples the data ingestion and the data processing/analytics components. This allows for more flexibility and scalability, as the data ingestion and analytics can be scaled independently based on the workload.
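Concretely, Firehose itself can be attached as just one consumer of the stream, while other consumers (Lambda, Kinesis Data Analytics applications, custom readers) process the same data independently. Below is a hedged boto3 sketch of wiring Firehose to an existing stream; the ARNs, names, and buffering values are placeholders, not values from the blog post.

```python
import boto3

firehose = boto3.client("firehose")

# Placeholder ARNs/names for illustration only.
STREAM_ARN = "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream"
ROLE_ARN = "arn:aws:iam::123456789012:role/example-firehose-role"
BUCKET_ARN = "arn:aws:s3:::example-data-lake-bucket"

# Attach Firehose as one consumer of the existing Kinesis data stream;
# other consumers can read the same stream in parallel.
firehose.create_delivery_stream(
    DeliveryStreamName="example-to-s3",
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": STREAM_ARN,
        "RoleARN": ROLE_ARN,
    },
    S3DestinationConfiguration={
        "RoleARN": ROLE_ARN,
        "BucketARN": BUCKET_ARN,
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 300},
    },
)
```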

Delivery Semantics and Replay: KDS stores records durably for a configurable retention period (24 hours by default, extendable up to 365 days) and lets consumers re-read the stream from any point within that window. Delivery to consumers is at-least-once rather than exactly-once, so records can be redelivered after failures or retries; consumers for which data integrity and consistency are critical typically deduplicate on the record sequence number or make their processing idempotent (see the sketch below).
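One common deduplication pattern (not something shown in the blog post) is a conditional write keyed on the shard-scoped sequence number. The sketch below assumes a hypothetical DynamoDB table named `processed-records` with `seq` as its partition key.

```python
import base64
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def already_processed(sequence_number: str) -> bool:
    """Record the sequence number with a conditional write; a failed
    condition means this record was already handled (a duplicate)."""
    try:
        dynamodb.put_item(
            TableName="processed-records",       # hypothetical dedup table
            Item={"seq": {"S": sequence_number}},
            ConditionExpression="attribute_not_exists(seq)",
        )
        return False
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return True
        raise

def lambda_handler(event, context):
    for record in event["Records"]:
        seq = record["kinesis"]["sequenceNumber"]
        if already_processed(seq):
            continue  # duplicate delivery; skip
        payload = base64.b64decode(record["kinesis"]["data"])
        # ... handle payload once from the application's point of view
```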

Durability and Scalability: KDS provides durable storage and scalable throughput, which can be important for handling large volumes of streaming data. This can help ensure that the data is not lost and can be processed at the required scale.
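Throughput on KDS scales with the number of shards (each shard supports roughly 1 MB/s or 1,000 records/s of ingest), and the shard count can be adjusted as load grows. A sketch, assuming a provisioned-mode stream with the placeholder name `example-stream`:

```python
import boto3

kinesis = boto3.client("kinesis")

# Check the current shard count, then scale the stream up.
summary = kinesis.describe_stream_summary(StreamName="example-stream")
current = summary["StreamDescriptionSummary"]["OpenShardCount"]
print(f"current open shards: {current}")

# Double the capacity; each shard adds ~1 MB/s / 1,000 records/s of ingest.
kinesis.update_shard_count(
    StreamName="example-stream",
    TargetShardCount=current * 2,
    ScalingType="UNIFORM_SCALING",
)
```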

Instead of directly ingesting the data into Amazon Kinesis Data Firehose (KDF) or Amazon Redshift, the architecture in the blog post uses KDS as an intermediary layer. This approach can provide the following benefits:

Flexibility: By using KDS, the architecture can easily integrate with other real-time processing or analytics services, such as AWS Lambda, Amazon Kinesis Data Analytics, or Amazon Kinesis Data Firehose.
Scalability: KDS can handle high-throughput, real-time data streams, which may be difficult to achieve with a direct ingestion into KDF or Redshift.
Reliability: KDS provides durability and fault tolerance, ensuring that the data is not lost in the event of failures or other issues.
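To make the trade-off concrete, the producer-side code differs very little between the two options; what changes is everything that can sit behind the stream. A sketch of both ingestion paths with boto3 (stream and delivery stream names are placeholders):

```python
import json
import boto3

record = json.dumps({"device_id": "d-001", "value": 42}).encode()

# Option A: write to a Kinesis data stream; Firehose, Lambda, and other
# consumers can all read this record independently and replay it later.
boto3.client("kinesis").put_record(
    StreamName="example-stream",             # placeholder name
    Data=record,
    PartitionKey="d-001",
)

# Option B: write directly to a Firehose delivery stream; the record goes
# straight to the configured destination (e.g. S3) with no other consumers.
boto3.client("firehose").put_record(
    DeliveryStreamName="example-direct-put", # placeholder name
    Record={"Data": record},
)
```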

However, it's important to note that the specific use case and requirements of the application will ultimately determine the most appropriate architecture. The decision to use KDS, KDF, or Redshift (or a combination of these services) should be based on factors such as the volume and velocity of the data, the need for real-time processing, the desired level of data durability and reliability, and the overall cost and operational complexity.

AWS
Answered 6 months ago
Expert
Reviewed 5 months ago
  • Thank you so much for the details. I really appreciate the time you took to elaborate in such a great way.
