How to navigate the data offering?


In a new pipeline we are adding to our product, we collect data from several sources using something similar to keywords. The collected data is mainly text with associated metadata. After collection, the data passes through a filtering stage and is then inserted into a database, where it is combined with user data and with feedback data that records whether it was successful in a few real-world tests.

I would like to know what people think would be a good database for the two stages (after data collection and after filtering). The collected data has to be stored and has to be queryable by its metadata. We are also thinking about adding embeddings to make some filtering easier at a later stage. After filtering, the data passes through a transformation layer, so I can store it in structured form. For each record I have associated features (embeddings, ...), metadata, and feedback data. The setup should be able to cover a recommendation-system use case over time. At the moment I am thinking of PostgreSQL.

The data also has a short life cycle of use (about 1 month) before it is replaced with new data. I could get away with keeping it in less available storage that is used only for training new models. I want to store the input texts, features, metadata, outputs, and feedback permanently. The feedback in particular is sparse: we do not get feedback for every output.
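To make the entities above concrete, here is a minimal sketch of how the input texts, metadata, features, outputs, and sparse feedback could relate to each other. It is only an illustration under assumptions: SQLite stands in for PostgreSQL so that it runs anywhere, all table and column names are hypothetical, and the rows are made-up placeholder values.

```python
import json
import sqlite3

# SQLite is only a stand-in so the sketch is self-contained.
# Every table and column name below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (
    id         INTEGER PRIMARY KEY,
    input_text TEXT NOT NULL,
    source     TEXT NOT NULL,   -- one metadata field promoted to a column for querying
    metadata   TEXT NOT NULL,   -- remaining metadata, JSON-encoded
    embedding  TEXT             -- JSON-encoded list of floats, optional
);
CREATE TABLE output (
    id          INTEGER PRIMARY KEY,
    item_id     INTEGER NOT NULL REFERENCES item(id),
    output_text TEXT NOT NULL
);
CREATE TABLE feedback (
    id         INTEGER PRIMARY KEY,
    output_id  INTEGER NOT NULL REFERENCES output(id),
    successful INTEGER NOT NULL -- 1 = succeeded in the real-world test, 0 = did not
);
""")

# Placeholder rows: one item, two outputs, feedback for only one of them.
conn.execute(
    "INSERT INTO item (id, input_text, source, metadata, embedding) VALUES (?, ?, ?, ?, ?)",
    (1, "example text", "source_a", json.dumps({"keyword": "demo"}), json.dumps([0.1, 0.2])),
)
conn.executemany(
    "INSERT INTO output (id, item_id, output_text) VALUES (?, ?, ?)",
    [(1, 1, "output A"), (2, 1, "output B")],
)
conn.execute("INSERT INTO feedback (output_id, successful) VALUES (?, ?)", (1, 1))

# Query by metadata.
by_source = conn.execute(
    "SELECT id, input_text FROM item WHERE source = ?", ("source_a",)
).fetchall()

# A LEFT JOIN keeps outputs that never received feedback (successful is NULL).
rows = conn.execute("""
    SELECT o.id, o.output_text, f.successful
    FROM output AS o
    LEFT JOIN feedback AS f ON f.output_id = o.id
    ORDER BY o.id
""").fetchall()

print(by_source)
print(rows)
```

Keeping feedback in its own table means an output without feedback simply has no matching row, which is how I would expect the sparsity to show up in whatever database is chosen.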

My concrete questions are:

  1. What databases seem most suitable for the (a) collected data, (b) filtered data, and (c) permanent data?
  2. What data model is suitable for recommendation systems?

I am not storing the collected data at the moment, but this will be necessary for optimizing the pipeline over time. The filtered data is roughly 50 MB per day, so storing it for 1 month comes to roughly 1,500 MB (about 1.5 GB). The collected data should be 20 to 30 times larger, i.e. roughly 1 to 1.5 GB per day, or 30 to 45 GB per month. Thank you very much for your help!

Nicolay
asked 3 months ago, 132 views
No Answers
