Two different schemas being streamed into a data lake via Kinesis stream + Firehose: should each have its own stream?


I have an architecture where two entities post data to an API Gateway. Each follows its own (different) JSON schema, and API Gateway then pushes the data into a Kinesis Data Stream / Firehose. Should I create a separate stream + Firehose for each schema? I understand that I could stream both into the same Kinesis Data Stream / Firehose and use a Lambda to parse each data point and determine where to write it in S3, but I am afraid of Lambda concurrency issues should the velocity of the data spike. What is the best practice in this context?
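For context, a minimal sketch of the single-stream alternative described above: one Lambda consuming the shared stream, inspecting each record, and fanning it out to a per-schema S3 prefix. The bucket name, prefixes, and the "entity_type" discriminator field are assumptions, not part of the original question.

```python
import base64
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "my-datalake-bucket"  # assumed bucket name
PREFIXES = {"entity_a": "raw/entity_a/", "entity_b": "raw/entity_b/"}  # assumed prefixes

def handler(event, context):
    """Route each Kinesis record to an S3 prefix based on its schema."""
    for record in event["Records"]:
        # Kinesis record payloads arrive base64-encoded in the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        doc = json.loads(payload)
        # Decide which schema this record follows (hypothetical discriminator field).
        prefix = PREFIXES.get(doc.get("entity_type"), "raw/unknown/")
        key = prefix + record["kinesis"]["sequenceNumber"] + ".json"
        s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
```

This is the approach the question is worried about: every record from both entities passes through the same Lambda, so a spike in either entity's traffic consumes shared concurrency.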

1 Answer

Mixing the entities is problematic. I recommend a separate Kinesis Data Stream + Lambda, or Kinesis Data Firehose, for each schema. Splitting them into separate streams lets you more easily apply Schema Registry to validate their JSON schemas and write consumers that simply read their expected JSON from a Kinesis Data Stream. Kinesis Data Firehose is a reliable ETL service that can load your entities into various sinks. Keeping them separate also has downstream benefits, such as easier configuration of AWS Glue schemas/tables for querying with Athena or machine learning.

Here's a blog on Schema Registry: https://aws.amazon.com/blogs/big-data/evolve-json-schemas-in-amazon-msk-and-amazon-kinesis-data-streams-with-the-aws-glue-schema-registry/
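A minimal provisioning sketch of the recommended layout, assuming boto3: one Kinesis Data Stream plus one Firehose delivery stream per schema, each landing in its own S3 prefix. The account ID, region, IAM role, bucket, and entity names are placeholders.

```python
import boto3

kinesis = boto3.client("kinesis")
firehose = boto3.client("firehose")

ACCOUNT_ID = "123456789012"  # placeholder
REGION = "us-east-1"         # placeholder
ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/firehose-delivery-role"  # assumed role
BUCKET_ARN = "arn:aws:s3:::my-datalake-bucket"                        # assumed bucket

for entity in ("entity_a", "entity_b"):
    stream_name = f"{entity}-stream"

    # One Kinesis Data Stream per schema.
    kinesis.create_stream(StreamName=stream_name, ShardCount=1)
    kinesis.get_waiter("stream_exists").wait(StreamName=stream_name)

    # One Firehose delivery stream per schema, reading from that stream
    # and writing to its own S3 prefix.
    firehose.create_delivery_stream(
        DeliveryStreamName=f"{entity}-to-s3",
        DeliveryStreamType="KinesisStreamAsSource",
        KinesisStreamSourceConfiguration={
            "KinesisStreamARN": f"arn:aws:kinesis:{REGION}:{ACCOUNT_ID}:stream/{stream_name}",
            "RoleARN": ROLE_ARN,
        },
        ExtendedS3DestinationConfiguration={
            "RoleARN": ROLE_ARN,
            "BucketARN": BUCKET_ARN,
            # Separate prefixes keep each schema queryable as its own Glue table.
            "Prefix": f"raw/{entity}/",
        },
    )
```

With this layout there is no routing Lambda in the hot path, and each schema's prefix can be crawled into its own Glue table for Athena.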

AWS
Jason_W
Answered a year ago
