Hello,
I understand you are asking which AWS services are best suited to implement this shared data model while maintaining data integrity and providing high availability for both readers and consumers.
To keep the shared data model up to date in real time, you can explore streaming services such as Amazon Kinesis Data Streams or Amazon Managed Streaming for Apache Kafka (Amazon MSK) to capture data as it changes.
Addressing your questions below:
What would be the best format to store the shared payload?
=> Parquet is generally recommended for analytical workloads. It is a columnar storage format, so queries that need only a few columns can retrieve them efficiently without reading entire records.
However, JSON or other formats might be suitable depending on your specific needs.
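As a rough illustration of the columnar idea, here is a minimal Python sketch. It assumes the pyarrow library is installed for the Parquet part; the function names and the sample fields are made up for this example and are not part of any AWS service.

```python
from __future__ import annotations

from typing import Any


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records (JSON style) into a column-oriented dict.

    Assumes every record has the same keys as the first one.
    """
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def write_parquet(rows: list[dict[str, Any]], path: str) -> None:
    """Write the records to a Parquet file (assumes pyarrow is installed)."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table(rows_to_columns(rows))
    pq.write_table(table, path)


def read_columns(path: str, columns: list[str]) -> dict[str, list[Any]]:
    """Read back only the requested columns from the Parquet file."""
    import pyarrow.parquet as pq

    return pq.read_table(path, columns=columns).to_pydict()
```

A consumer that only needs one field could call read_columns("shared.parquet", ["region"]) and skip the other columns, which is the main advantage over a row-oriented format such as JSON.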
What is the best option to share data? Is it to notify the other lines of business when data changes?
=> Services like Amazon SNS or Amazon SQS, integrated with Amazon CloudWatch Events (now Amazon EventBridge), can alert interested parties about data changes.
- SNS - https://aws.amazon.com/sns/
- SQS - https://aws.amazon.com/sqs/
- CloudWatch Events (now Amazon EventBridge) - https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-cwe-now-eb.html
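To give an idea of what a change notification could look like, here is a minimal Python sketch using the boto3 SNS client. The event shape, the function names, and the topic ARN are assumptions made for illustration; only sns.publish and its parameters come from boto3.

```python
import json


def build_change_event(entity: str, record_id: str, changed_fields: list) -> dict:
    """Describe a change in the shared model (hypothetical event shape)."""
    return {
        "entity": entity,
        "record_id": record_id,
        "changed_fields": sorted(changed_fields),
    }


def build_publish_request(topic_arn: str, event: dict) -> dict:
    """Build the keyword arguments for the SNS publish call."""
    return {
        "TopicArn": topic_arn,
        "Subject": f"{event['entity']} changed",
        "Message": json.dumps(event),
        # Message attributes let each subscriber apply a subscription
        # filter policy and receive only the entities it cares about.
        "MessageAttributes": {
            "entity": {"DataType": "String", "StringValue": event["entity"]},
        },
    }


def publish_change(topic_arn: str, event: dict) -> str:
    """Publish the event (assumes boto3 and AWS credentials are configured)."""
    import boto3

    sns = boto3.client("sns")
    response = sns.publish(**build_publish_request(topic_arn, event))
    return response["MessageId"]
```

Each line of business could subscribe its own SQS queue to the topic, so a slow consumer does not hold up the others.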
Or let them grab the data that they are interested in? How can we perform data mapping to adapt the data to each line of business? At the shared model level? We also want to hide some data from one line of business to another, so that not all of them can see the same data or the entire data set.
=> To achieve the above, you can use AWS Glue streaming jobs to write the data into AWS Glue Data Catalog tables. With AWS Lake Formation, you can then create data filters that provide column-level, row-level, and cell-level security to restrict data access for different lines of business.
- Glue Data Catalog - https://docs.aws.amazon.com/glue/latest/dg/catalog-and-crawler.html
- Streaming data source - https://docs.aws.amazon.com/glue/latest/dg/edit-jobs-source-streaming.html
- Lake Formation data filters - https://docs.aws.amazon.com/lake-formation/latest/dg/data-filters-about.html
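As a sketch of how such a data filter could be defined from Python, the example below builds the TableData structure for the boto3 Lake Formation call create_data_cells_filter. The database, table, column names, and row expression used in the usage note are hypothetical, as are the helper function names.

```python
def build_data_filter(
    catalog_id: str,
    database: str,
    table: str,
    filter_name: str,
    row_expression: str,
    excluded_columns: list,
) -> dict:
    """Build the TableData argument for create_data_cells_filter.

    The row expression limits which rows are visible; the column wildcard
    exposes every column except the excluded ones.
    """
    return {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": filter_name,
        "RowFilter": {"FilterExpression": row_expression},
        "ColumnWildcard": {"ExcludedColumnNames": list(excluded_columns)},
    }


def create_data_filter(table_data: dict) -> None:
    """Create the filter (assumes boto3 and Lake Formation admin rights)."""
    import boto3

    lakeformation = boto3.client("lakeformation")
    lakeformation.create_data_cells_filter(TableData=table_data)
```

For example, build_data_filter("123456789012", "shared_model", "customers", "billing_view", "region = 'EU'", ["ssn"]) would describe a view for one line of business that sees only EU rows and never the ssn column. The filter restricts access only after SELECT is granted on it to that line of business's principal.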
How can we perform data mapping to tailor the data to the requirements of each line of business? Should the mapping be done at the shared model level?
=> AWS Glue ETL jobs can transform the shared data model to meet the specific requirements of each line of business. Once the Glue ETL job has run, you can use Amazon Athena or Amazon QuickSight to build analytical reports.
- Glue jobs - https://docs.aws.amazon.com/glue/latest/dg/etl-jobs-section.html
- https://docs.aws.amazon.com/glue/latest/dg/author-job-glue.html
- Athena - https://docs.aws.amazon.com/athena/latest/ug/what-is.html
- QuickSight - https://docs.aws.amazon.com/quicksight/latest/user/welcome.html
- Lake Formation - https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html
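To make the per-line-of-business mapping more concrete, here is a small local Python stand-in. It is not Glue code: it only mimics the (source, source type, target, target type) tuples that the Glue ApplyMapping transform accepts, and the field names and the CASTS table are assumptions for this example.

```python
from __future__ import annotations

from typing import Any

# Supported target types for this sketch (an assumption, not Glue's full list).
CASTS = {"string": str, "int": int, "double": float}


def apply_mapping(
    record: dict[str, Any],
    mappings: list[tuple[str, str, str, str]],
) -> dict[str, Any]:
    """Rename and cast fields; fields without a mapping are dropped."""
    mapped: dict[str, Any] = {}
    for source, _source_type, target, target_type in mappings:
        if source in record:
            mapped[target] = CASTS[target_type](record[source])
    return mapped


# One mapping list per line of business, all applied to the same shared record.
BILLING_MAPPING = [
    ("cust_id", "string", "customer_id", "int"),
    ("balance", "string", "amount_due", "double"),
]
MARKETING_MAPPING = [
    ("cust_id", "string", "customer_id", "int"),
    ("full_name", "string", "display_name", "string"),
]
```

Keeping one mapping list per line of business means the shared model stays unchanged and each consumer gets its own shape. In an actual Glue job, the same tuples could be passed to ApplyMapping.apply(frame=..., mappings=...).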
Additional Considerations:
For complex architectures, consider involving an AWS Solutions Architect. AWS Support is available to answer further questions about specific services.
Here are some helpful blog posts for reference:
- https://aws.amazon.com/blogs/big-data/use-an-event-driven-architecture-to-build-a-data-mesh-on-aws/
- https://aws.amazon.com/blogs/big-data/securely-share-your-data-across-aws-accounts-using-aws-lake-formation/
- https://aws.amazon.com/blogs/big-data/patterns-for-enterprise-data-sharing-at-scale/
Thank you!