Two different schemas being streamed into a data lake via Kinesis Stream + Firehose: should each have its own stream?


I have an architecture where two entities post data to an API Gateway. Each follows its own (different) JSON schema, and API GW then pushes the data into a Kinesis Data Stream / Firehose. Should I create a separate stream + Firehose for each schema? I understand that I could stream both into the same Kinesis Stream / Firehose and use a Lambda to parse each data point and decide how to write it to S3, but I am afraid of Lambda concurrency issues should the velocity of the data spike. What is the best practice in this context?
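For context, here is a minimal sketch of the single-stream approach described in the question: one Lambda consumes the shared Kinesis Data Stream, inspects each record, and routes it to a schema-specific S3 prefix. The bucket name, prefixes, and the `entity_type` discriminator field are hypothetical placeholders, not anything from the original setup.

```python
import base64
import json
import uuid

import boto3

s3 = boto3.client("s3")

BUCKET = "my-datalake-bucket"          # placeholder bucket name
PREFIXES = {
    "entity_a": "raw/entity_a/",       # placeholder prefix per schema
    "entity_b": "raw/entity_b/",
}


def handler(event, context):
    for record in event["Records"]:
        # Kinesis delivers each payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))

        # Route on a discriminator field; in practice you might have to
        # infer the schema from the payload's shape instead.
        prefix = PREFIXES.get(payload.get("entity_type"), "raw/unknown/")

        s3.put_object(
            Bucket=BUCKET,
            Key=f"{prefix}{uuid.uuid4()}.json",
            Body=json.dumps(payload).encode("utf-8"),
        )
```

This is the design the question is wary of: every spike in either entity's traffic drives concurrency for the same routing Lambda, and the routing logic must change whenever either schema evolves.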

1 Answer

Mixing the entities is problematic. I recommend a separate Kinesis Data Stream (with its own Lambda consumer) or Kinesis Data Firehose for each schema. Splitting them into separate streams makes it easier to apply the Glue Schema Registry to validate each JSON schema, and to write consumers that simply read their expected JSON from a single Kinesis Data Stream. Kinesis Data Firehose is a reliable ETL service that can load your entities into various sinks. Keeping them separate also has downstream benefits, such as easier configuration of AWS Glue schemas/tables for querying with Athena or for machine learning.
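As a rough illustration of the separate-streams approach, here is a sketch of a producer (for example, the backend behind the API Gateway integration) writing each entity's records to its own Kinesis Data Stream, so downstream Firehose delivery streams, Glue tables, and consumers never have to disambiguate mixed schemas. The stream names and payload fields are hypothetical placeholders.

```python
import json

import boto3

kinesis = boto3.client("kinesis")

# Placeholder stream names; one Data Stream (and Firehose) per schema.
STREAM_BY_ENTITY = {
    "entity_a": "entity-a-stream",
    "entity_b": "entity-b-stream",
}


def publish(entity_type: str, payload: dict) -> None:
    """Write a record to the stream dedicated to this entity's schema."""
    kinesis.put_record(
        StreamName=STREAM_BY_ENTITY[entity_type],
        Data=json.dumps(payload).encode("utf-8"),
        PartitionKey=str(payload.get("id", entity_type)),
    )


# Example usage with hypothetical payloads:
publish("entity_a", {"id": "a-123", "temperature": 21.5})
publish("entity_b", {"id": "b-456", "status": "ACTIVE"})
```

With this layout, each stream and its Firehose can scale, evolve, and be validated independently, and no shared routing Lambda sits in the hot path.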

Here's a blog on Schema Registry: https://aws.amazon.com/blogs/big-data/evolve-json-schemas-in-amazon-msk-and-amazon-kinesis-data-streams-with-the-aws-glue-schema-registry/

AWS
Jason_W
Answered 1 year ago
