Two different schemas being streamed into a data lake via Kinesis Stream + Firehose: should each have its own stream?


I have an architecture where two entities post data to an API Gateway. Each follows its own (different) JSON schema, and API Gateway then pushes the data into a Kinesis Data Stream / Firehose. Should I create a separate stream + Firehose for each schema? I understand that I could stream both into the same Kinesis Data Stream / Firehose and use a Lambda to parse each data point and decide how to write it to S3, but I am worried about Lambda concurrency issues if the velocity of the data spikes. What is the best practice in this context?
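
For illustration, here is a rough sketch of the single-stream approach I'm describing: one Lambda consumes the shared Kinesis Data Stream, inspects each record, and routes it to a schema-specific S3 prefix. The bucket name and the `entity_type` discriminator field are placeholders, not part of my actual schemas:

```python
import base64
import json
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "my-datalake-bucket"  # placeholder bucket name


def handler(event, context):
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded in the Lambda event
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))

        # Route by a field that distinguishes the two schemas (assumed here)
        entity = payload.get("entity_type", "unknown")
        prefix = "entity-a" if entity == "a" else "entity-b"

        s3.put_object(
            Bucket=BUCKET,
            Key=f"{prefix}/{uuid.uuid4()}.json",
            Body=json.dumps(payload).encode("utf-8"),
        )
```
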

1 Answer

Mixing the entities is problematic. I recommend a separate Kinesis Data Stream / Lambda or Kinesis Data Firehose for each schema. Splitting them into separate streams makes it easier to apply the Glue Schema Registry to validate each JSON schema, and lets you write consumers that simply read their expected JSON from a single Kinesis Data Stream. Kinesis Data Firehose is a reliable ETL service that can load your entities into various sinks. Keeping them separate also has downstream benefits, such as easier configuration of AWS Glue schemas/tables for querying with Athena or for machine learning.
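
As a rough sketch of what the separate-stream setup could look like with boto3, the snippet below creates one Firehose delivery stream per schema, each reading from its own Kinesis Data Stream and writing to its own S3 prefix. All ARNs, stream names, and prefixes are placeholders:

```python
import boto3

firehose = boto3.client("firehose")

# Placeholder source streams, one per schema
STREAMS = {
    "entity-a": "arn:aws:kinesis:us-east-1:123456789012:stream/entity-a-stream",
    "entity-b": "arn:aws:kinesis:us-east-1:123456789012:stream/entity-b-stream",
}
FIREHOSE_ROLE_ARN = "arn:aws:iam::123456789012:role/firehose-delivery-role"  # placeholder
BUCKET_ARN = "arn:aws:s3:::my-datalake-bucket"  # placeholder

for name, stream_arn in STREAMS.items():
    firehose.create_delivery_stream(
        DeliveryStreamName=f"{name}-firehose",
        DeliveryStreamType="KinesisStreamAsSource",
        KinesisStreamSourceConfiguration={
            "KinesisStreamARN": stream_arn,
            "RoleARN": FIREHOSE_ROLE_ARN,
        },
        ExtendedS3DestinationConfiguration={
            "RoleARN": FIREHOSE_ROLE_ARN,
            "BucketARN": BUCKET_ARN,
            # One prefix per schema keeps Glue tables and Athena queries clean
            "Prefix": f"{name}/",
        },
    )
```
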

Here's a blog on Schema Registry: https://aws.amazon.com/blogs/big-data/evolve-json-schemas-in-amazon-msk-and-amazon-kinesis-data-streams-with-the-aws-glue-schema-registry/

AWS
Jason_W
Answered 1 year ago
