What's the best way to send real-time data to Amazon Redshift?


I need an analysis tool for a product that sends logs or data about advertisements on the product's website. The product sends around 100,000 events per minute or more. All of the data is important for the analysis, so I can't afford data loss. What is the best way to send this data to Amazon Redshift, considering factors such as performance efficiency, data consistency, and cost optimization?

AWS
asked 3 years ago · 1513 views
3 Answers
Accepted Answer

If the data is streamed through Amazon Kinesis Data Streams (KDS), choose one of the following options:

  1. Kinesis Data Streams --> Lambda using the Redshift Data API --> Redshift
  2. Kinesis Data Streams --> Kinesis Firehose --> Redshift
  3. Kinesis Data Streams --> Kinesis Firehose --> Amazon S3 (partitioned), queried with Redshift Spectrum (run an AWS Glue crawler periodically)

With all of these options, the data can usually be queried shortly after it's received, although you might occasionally have to wait a considerable amount of time before new data becomes queryable.
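Option 1 can be sketched as a Lambda handler that batches the Kinesis records of each invocation into a single multi-row INSERT issued through the Redshift Data API. The table name, column list, and workgroup name below are placeholders, not values from the answer:

```python
import base64
import json

# Hypothetical table and columns for illustration.
TABLE = "ad_events"
COLUMNS = ("event_id", "event_type", "ts")

def build_insert(records):
    """Build one multi-row INSERT from decoded Kinesis records, so a
    single Redshift Data API call covers the whole Lambda batch."""
    rows = []
    for rec in records:
        payload = json.loads(base64.b64decode(rec["kinesis"]["data"]))
        values = ", ".join(
            "'%s'" % str(payload[c]).replace("'", "''") for c in COLUMNS
        )
        rows.append("(%s)" % values)
    return "INSERT INTO %s (%s) VALUES %s" % (
        TABLE, ", ".join(COLUMNS), ", ".join(rows)
    )

def handler(event, context):
    import boto3  # available in the Lambda runtime
    client = boto3.client("redshift-data")
    # execute_statement is asynchronous; track the returned Id if you
    # need to confirm completion before processing the next batch.
    return client.execute_statement(
        WorkgroupName="my-workgroup",  # or ClusterIdentifier=... for provisioned
        Database="dev",
        Sql=build_insert(event["Records"]),
    )
```

Batching many records into one statement keeps the per-event Data API overhead low at 100,000+ events per minute.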

For a more cost-effective approach, do the following: First, write the data to Amazon S3 through Kinesis Data Streams --> Kinesis Firehose --> S3 --> Lambda --> S3 (converted to an optimized format such as Parquet or ORC). Then, run the AWS Glue crawler at periodic intervals (for example, every hour) to refresh the AWS Glue Data Catalog. Finally, query the data from Redshift with Redshift Spectrum using the AWS Glue Data Catalog.
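The Spectrum side of this approach can be sketched with the DDL below; the Glue database name, table name, and IAM role ARN are placeholders for whatever the crawler created in your account:

```sql
-- Assumes a Glue database 'ad_logs' populated by the periodic crawler,
-- pointing at the Parquet files written by the Firehose/Lambda step.
CREATE EXTERNAL SCHEMA spectrum_logs
FROM DATA CATALOG
DATABASE 'ad_logs'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole';

-- Query the Parquet data in place; partition pruning and columnar
-- format keep the amount of S3 data scanned (and billed) low.
SELECT event_type, COUNT(*)
FROM spectrum_logs.events
GROUP BY event_type;
```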

AWS
Kunal_G
answered 3 years ago

An alternative approach is to use the Redshift federated query feature to access, analyze, and join real-time data from an operational or transactional database, such as Amazon Aurora or Amazon RDS for PostgreSQL and MySQL, with your data warehouse and data lake datasets. For more details, refer to the documentation: https://docs.aws.amazon.com/redshift/latest/dg/federated-overview.html.
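A federated query setup can be sketched as follows; the endpoint, database, schema, and ARNs below are placeholders, assuming an Aurora PostgreSQL instance holds the live operational data:

```sql
-- Expose the live Aurora PostgreSQL schema inside Redshift.
CREATE EXTERNAL SCHEMA live_ads
FROM POSTGRES
DATABASE 'adsdb' SCHEMA 'public'
URI 'aurora-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyFederatedRole'
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123456789012:secret:aurora-creds';

-- Join live operational rows with warehouse history in one query.
SELECT w.campaign_id, w.total_spend, l.status
FROM warehouse.campaign_totals w
JOIN live_ads.campaigns l ON l.id = w.campaign_id;
```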

AWS
sudhig
answered 3 years ago

You can use the Amazon Redshift streaming ingestion capability to update your analytics databases in near-real time. Amazon Redshift streaming ingestion simplifies data pipelines by letting you create materialized views directly on top of data streams. With this capability, you can use SQL (Structured Query Language) to connect to and directly ingest data from streams, such as Amazon Kinesis Data Streams or Amazon Managed Streaming for Apache Kafka (Amazon MSK), pulling the data straight into Amazon Redshift.
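A minimal streaming ingestion setup looks roughly like this; the stream name, view name, and IAM role ARN are placeholders (the role must allow reading from the Kinesis stream):

```sql
-- Map the Kinesis account into a Redshift schema.
CREATE EXTERNAL SCHEMA kds
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/MyStreamingRole';

-- The materialized view lands stream records as they arrive;
-- AUTO REFRESH makes new records queryable within seconds.
CREATE MATERIALIZED VIEW ad_events_mv AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       JSON_PARSE(kinesis_data) AS payload
FROM kds."ad_events";
```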

https://aws.amazon.com/cn/blogs/big-data/real-time-analytics-with-amazon-redshift-streaming-ingestion/

https://aws.amazon.com/cn/blogs/big-data/near-real-time-analytics-using-amazon-redshift-streaming-ingestion-with-amazon-kinesis-data-streams-and-amazon-dynamodb/

AWS
answered 7 months ago
