Exploring Embeddings and LLMs: Amazon Titan Embedding V2, Claude 3.5 and OpenSearch Serverless
This article demonstrates how to integrate Amazon Titan Embedding V2 and Claude 3.5 with Amazon OpenSearch Serverless to build scalable workflows for semantic search and text generation. It offers practical guidance on creating pipelines for real-world applications, such as recommendation systems and conversational AI, by combining embeddings with large language models to enhance precision and efficiency in information retrieval and response generation.
Introduction
This article explores how to leverage embeddings and large language models (LLMs) using AWS services, focusing on Amazon Titan Embedding V2 for embedding generation and Claude 3.5 for advanced text generation. We will integrate these models with Amazon OpenSearch Serverless, a scalable solution for vector indexing and retrieval without the need to manage infrastructure.
By combining these technologies, this article demonstrates how to create efficient pipelines for semantic search, content recommendation, and advanced chatbot systems. We will go through an example implementation that covers the entire workflow, from generating embeddings to retrieving information and generating responses using Claude 3.5.
1. Embeddings with Amazon Titan Embedding V2 and OpenSearch Serverless
Embeddings are vector representations of textual data, converting text into high-dimensional vectors that capture the semantic meaning. Amazon Titan Embedding V2 provides a powerful tool for generating these embeddings, which can be stored and queried using OpenSearch Serverless. This process allows for more meaningful queries, matching results based on semantic similarities rather than simple keyword matches.
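As a quick illustration of how embeddings capture meaning, here is a minimal sketch that compares two sentences by the cosine similarity of their Titan Embedding V2 vectors. The model ID matches the one used later in this article; the example sentences are made up for illustration.

import json
import boto3
import math

# Bedrock runtime client used to invoke the embedding model
bedrock = boto3.client('bedrock-runtime')

def embed(text):
    # Generate an embedding vector for the given text with Titan Embedding V2
    response = bedrock.invoke_model(
        body=json.dumps({"inputText": text}),
        modelId='amazon.titan-embed-text-v2:0',
        accept='application/json',
        contentType='application/json'
    )
    return json.loads(response['body'].read())['embedding']

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of the vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Illustrative sentences: semantically related texts score higher than unrelated ones
v1 = embed("How do I train a neural network?")
v2 = embed("Steps for fitting a deep learning model")
print(cosine_similarity(v1, v2))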
Use Cases:
- Semantic Search: By using embeddings, systems can retrieve information based on the meaning of the text. This is particularly useful for dealing with complex data, such as technical documents, scientific papers, or product descriptions.
- Recommendation Systems: Indexing and querying product, service, or content embeddings allow for the creation of robust recommendation systems that suggest items based on deep semantic similarities.
2. Implementation Example: Configuring an Information Retrieval Workflow with Embeddings and Claude 3.5
In this section, we will configure a full information retrieval pipeline that uses Amazon Titan Embedding V2 to generate embeddings and Amazon OpenSearch Serverless to store and query those embeddings. We will then use the Claude 3.5 model to answer questions based on the most relevant documents retrieved.
Before proceeding, ensure you create an Amazon S3 bucket where the documents.csv file will be stored. This bucket will be used by the Lambda function to read and process the dataset. Make sure that your Lambda function has the necessary permissions to access the S3 bucket and read the file. Below is the structure of the documents.csv file:
| Title | Category | Content |
| --- | --- | --- |
| Introduction to Machine Learning | Machine Learning | This document explains the basics of supervised and unsupervised learning. |
| Advanced Neural Networks | Neural Networks | Overview of neural networks, backpropagation, and advanced architectures. |
| AI in Healthcare | Healthcare | The role of AI in diagnostics, predictive analytics, and patient care. |
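If you prefer to create the bucket and upload the sample file from a script, a minimal sketch is shown below. The bucket name is a placeholder, and the call assumes the default us-east-1 Region (other Regions require a CreateBucketConfiguration).

import boto3

s3 = boto3.client('s3')

# Placeholder value; replace with your own bucket name
bucket_name = 'your-bucket-name'

# Create the bucket (this form works in us-east-1; other Regions need a LocationConstraint)
s3.create_bucket(Bucket=bucket_name)

# Upload the documents.csv file described above
s3.upload_file('documents.csv', bucket_name, 'documents.csv')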
Note: In production, it is recommended to separate responsibilities, such as embedding generation, OpenSearch storage, and querying, into different Lambda functions or microservices. This ensures better scalability and fault tolerance.
Step 1: Data Preparation and Embedding Generation
We begin by loading a dataset of documents or descriptions, transforming these texts into embeddings using Amazon Titan V2. Each document will be represented by a high-dimensional vector.
import json
import boto3
import csv

s3 = boto3.client('s3')

# Function to load the CSV file from S3
def load_csv_from_s3(bucket_name, file_key):
    response = s3.get_object(Bucket=bucket_name, Key=file_key)
    lines = response['Body'].read().decode('utf-8').splitlines()
    reader = csv.DictReader(lines)
    return list(reader)

# Function to generate embeddings using Amazon Titan Embedding V2
def generate_embedding(text):
    client = boto3.client('bedrock-runtime')
    body = json.dumps({"inputText": text})
    response = client.invoke_model(
        body=body,
        modelId='amazon.titan-embed-text-v2:0',
        accept='application/json',
        contentType='application/json'
    )
    response_body = json.loads(response['body'].read())
    return response_body['embedding']

# Lambda handler function
def lambda_handler(event, context):
    # Load the CSV from S3
    bucket_name = 'your-bucket-name'
    file_key = 'documents.csv'
    documents = load_csv_from_s3(bucket_name, file_key)

    # Combine the CSV columns into a single text field and add its embedding to each document
    for document in documents:
        full_content = f"{document['Title']} - {document['Category']}: {document['Content']}"
        document['full_content'] = full_content
        document['embedding'] = generate_embedding(full_content)

    # Return the documents with their generated embeddings
    return {
        'statusCode': 200,
        'body': json.dumps(documents)
    }
Step 2: Indexing Embeddings in OpenSearch Serverless
After generating the embeddings, the next step is to store them in OpenSearch Serverless. We create an index that stores both the original text and its corresponding embeddings.
from opensearchpy import OpenSearch, RequestsHttpConnection

# Configure the OpenSearch Serverless client
client = OpenSearch(
    hosts=[{'host': 'YOUR_OPENSEARCH_ENDPOINT', 'port': 443}],
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)

# Create the index to store the embeddings
index_body = {
    "settings": {"index": {"knn": True}},  # Enable k-NN search on this index
    "mappings": {
        "properties": {
            "full_content": {"type": "text"},
            "embedding": {"type": "knn_vector", "dimension": 1024}  # Embedding dimension for Titan Embedding V2
        }
    }
}
client.indices.create(index='documents-index', body=index_body)

# Insert the documents produced in Step 1 into the index
for document in documents:
    client.index(
        index='documents-index',
        body={
            "full_content": document['full_content'],
            "embedding": document['embedding']
        }
    )
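Note that Amazon OpenSearch Serverless requires SigV4-signed requests. A minimal sketch of the client configuration using the AWSV4SignerAuth helper from opensearch-py is shown below; the endpoint and Region values are placeholders, and 'aoss' is the service name for OpenSearch Serverless.

import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

# Placeholder values; replace with your collection endpoint and Region
host = 'YOUR_OPENSEARCH_ENDPOINT'
region = 'us-east-1'

# Sign requests with the caller's IAM credentials for the OpenSearch Serverless ('aoss') service
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, region, 'aoss')

client = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)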
Step 3: Querying OpenSearch Serverless Using Semantic Similarity
With the index configured and the embeddings stored, we can now perform semantic queries to find the most relevant documents.
# Function to retrieve similar documents based on a query
def retrieve_similar_documents(query):
    query_embedding = generate_embedding(query)
    response = client.search(
        index='documents-index',
        body={
            "query": {
                "knn": {
                    "embedding": {
                        "vector": query_embedding,
                        "k": 3  # Return the top 3 most relevant documents
                    }
                }
            }
        }
    )
    return [item['_source']['full_content'] for item in response['hits']['hits']]

# Example user query
user_query = "What are the best supervised learning methods?"
relevant_documents = retrieve_similar_documents(user_query)
Step 4: Integrating with Claude 3.5 for Response Generation
Once we retrieve the most relevant documents, we can pass this information along with the user query to the Claude 3.5 model, which will generate a detailed and contextualized response.
# Function to generate a response using Claude 3.5 with the retrieved documents as context
def generate_response_with_claude(query, context):
    client = boto3.client('bedrock-runtime')
    prompt = f"Context: {context}\n\nQuestion: {query}\n\nProvide a detailed answer."
    # Claude models on Bedrock use the Anthropic Messages API request format
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": prompt}
        ]
    })
    response = client.invoke_model(
        body=body,
        modelId='anthropic.claude-3-5-sonnet-20240620-v1:0',
        accept='application/json',
        contentType='application/json'
    )
    response_body = json.loads(response['body'].read())
    return response_body['content'][0]['text']

# Concatenate the relevant documents into a single context string
context = "\n".join(relevant_documents)

# Generate the response using Claude 3.5
response = generate_response_with_claude(user_query, context)
print(response)
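To expose this retrieval-plus-generation flow as a single endpoint, in line with the earlier note about separating responsibilities, a minimal query-time Lambda handler could look like the sketch below. It reuses the functions defined in Steps 3 and 4, and the 'query' field in the event is an assumption about how the function is invoked.

# Sketch: a query-time Lambda handler that ties retrieval (Step 3) and generation (Step 4) together
def lambda_handler(event, context):
    user_query = event['query']  # Assumes the caller passes a 'query' field in the event
    relevant_documents = retrieve_similar_documents(user_query)
    answer = generate_response_with_claude(user_query, "\n".join(relevant_documents))
    return {
        'statusCode': 200,
        'body': json.dumps({'answer': answer})
    }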
3. Performance and Cost Management
One of the advantages of OpenSearch Serverless is its automatic scaling, which adjusts resources according to demand. This allows for efficient cost management, as the infrastructure automatically scales based on the number of queries or data processing required.
For Amazon OpenSearch Serverless, pricing details vary based on usage, including storage, query execution, and data ingestion. You can check the full pricing structure on the Amazon OpenSearch Serverless pricing page.
Similarly, for Amazon Bedrock, pricing depends on the services used, such as model inference and customization. Detailed information is available on the Amazon Bedrock pricing page.
Conclusion
By integrating Amazon Titan Embedding V2 with Claude 3.5 through Amazon OpenSearch Serverless, we can create a highly efficient pipeline that combines semantic information retrieval with contextualized text generation. This workflow is especially useful in AI applications such as Q&A systems, virtual assistants, and recommendation engines, where precision and flexibility are crucial. Additionally, AWS’s serverless infrastructure helps optimize costs while providing scalability and robustness.
Take a look at Knowledge Bases for Amazon Bedrock (https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html), which abstracts most of this workflow.