Introduction to Aurora ML

5 minute read
Content level: Intermediate

Simplify PostgreSQL, pgvector, and generative AI applications with Amazon Aurora, Amazon Bedrock, and Aurora ML

PostgreSQL is a very popular open-source relational database that supports the pgvector extension, which enables powerful similarity searches for generative AI applications.

Large Language Models (LLMs) are an integral part of generative AI applications, and Amazon Bedrock is your gateway to these models. Bedrock is a fully managed service that allows you to use many different LLMs from vendors like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon.

Amazon Aurora PostgreSQL, a fully managed database service that provides high performance and availability at a global scale, has a feature called Aurora ML. This feature simplifies the interaction between Amazon Aurora PostgreSQL and Amazon Bedrock by exposing LLMs with SQL.

This article provides an introduction to the power of Aurora ML for creating generative AI applications in PostgreSQL.

Similarity Searches

To get started, let's look at a basic SQL statement that shows how a traditional text search works.

SELECT *
FROM sales
WHERE product LIKE '%apple%';

This will retrieve products like 'apple sauce' or 'apple pie', but it won't include results such as 'pear' or 'strawberry' that are similar in meaning to 'apple' but dissimilar in terms of characters. This simple SQL query doesn't capture the meaning of the search term; it only matches on the characters.
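To make the limitation concrete, here is a minimal Python sketch of the same character-based matching. The product list is made up for illustration and the function name like_match is hypothetical; it only mimics what LIKE '%apple%' does and is not how PostgreSQL implements it.

```python
# Hypothetical product names, used only for illustration.
products = ["apple sauce", "apple pie", "pear", "strawberry"]

def like_match(values, term):
    """Mimic SQL's LIKE '%term%': keep values that contain the characters in term."""
    return [v for v in values if term in v]

matches = like_match(products, "apple")
print(matches)  # ['apple sauce', 'apple pie']
```

The other fruits never appear because nothing in this comparison knows that a pear is related to an apple.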

But what if you want to search for text that is similar in meaning to 'apple' to find other fruits that are similar? Or what if you want to search for the meaning of a phrase or a group of words? This is where generative AI comes in.

Generative AI

Generative AI applications handle this by creating vector embeddings of strings and storing them as vectors in a database so that similarity searches can be performed. If you are a database guy like me, then you are probably wondering: what is a vector embedding, how is it created, how do I query the data, what is Aurora ML, and where can I learn more?

Q&A

Q: What is a vector embedding?

A: A vector embedding is a numeric representation of a string stored in a new datatype called a VECTOR. In PostgreSQL, this new type is made available with an extension called pgvector, and a VECTOR is basically an ARRAY with a fixed length. The pgvector extension also provides aggregates, functions, operators, and indexes to perform searches of VECTOR data.

An example VECTOR:

[0.7265625, -0.0703125, 0.34765625, ..., 0.91015625]
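As a rough illustration, pgvector accepts a vector as text in square brackets, so a Python list of floats can be turned into that form before it is sent to the database. The helper name to_vector_literal and the three-element sample are assumptions for illustration only; real embeddings are much longer.

```python
def to_vector_literal(values):
    """Format a sequence of numbers as pgvector's text form, e.g. '[0.5,-0.25,1.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

# A made-up 3-dimensional vector; it is not output from a real model.
sample = [0.7265625, -0.0703125, 0.34765625]
literal = to_vector_literal(sample)
print(literal)  # [0.7265625,-0.0703125,0.34765625]
```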

Q: How do you create a vector embedding?

A: It is done with a Large Language Model (LLM) like Amazon Titan, which is very easy to use with Amazon Bedrock.

Here is an example using Python to call the Amazon Titan model to create an embedding.

import json
import sys

import boto3

# Runtime client used to invoke models in the chosen region.
bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name='us-west-2')

def get_embedding(prompt_data):
    # Send the JSON request body to the Amazon Titan text embeddings model.
    response = bedrock_runtime.invoke_model(
        body=prompt_data,
        modelId='amazon.titan-embed-text-v1',
        accept='application/json',
        contentType='application/json',
    )
    # The response body is a stream; parse it and return the 'embedding' list.
    response_body = json.loads(response['body'].read())
    return response_body.get('embedding')

if __name__ == '__main__':
    # The text to embed is passed as the first command-line argument.
    prompt_data = json.dumps({"inputText": sys.argv[1]})
    print(get_embedding(prompt_data))

It can be called like this, which returns the vector embedding of the "hello world" string using the Amazon Titan LLM.

python3.11 embed.py "hello world"

Q: How do you query vector embeddings in PostgreSQL?

A: Using the pgvector extension, you can query them with new operators, but the usage is slightly different from the typical operators (=, >, >=, <, <=, etc.) found in the WHERE clause of a SQL query. Because the query is for the most similar vector embeddings, the pgvector operator is located in the ORDER BY clause of a query.

As an example, let's say there is a movies table that has a VECTOR column called movie_embedding, and that data was generated with an LLM using the movie summaries, actors, reviews, etc. Next, you search for movies with "action comedy aliens future".

Step 1 is to create a vector embedding of this search criteria with an LLM.

Step 2 is to substitute that VECTOR as the value of variable :var1 in the following query.

SELECT *
FROM movies m
ORDER BY m.movie_embedding <=> :var1
LIMIT 5;

The pgvector operator here is <=>, which is used to find the top 5 most similar records. This particular operator uses the "cosine distance" to compare the similarity of the movie_embedding column with the variable :var1. The pgvector extension provides other operators for similarity comparisons and also indexes to improve performance.
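For intuition about cosine distance, here is a small sketch in plain Python that computes it and ranks rows the way ORDER BY ... LIMIT does. The function names, the movie titles, and the two-dimensional embeddings are all made up for illustration. In the database, pgvector does this work itself and can use an index.

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(rows, query, k=5):
    """In-memory analogue of ORDER BY embedding <=> query LIMIT k."""
    return sorted(rows, key=lambda row: cosine_distance(row[1], query))[:k]

# Made-up titles and 2-dimensional embeddings, for illustration only.
movies = [
    ("Movie A", [1.0, 0.0]),
    ("Movie B", [0.0, 1.0]),
    ("Movie C", [1.0, 1.0]),
]
query = [1.0, 0.1]

for title, _ in top_k(movies, query, k=2):
    print(title)  # Movie A, then Movie C
```

Vectors pointing in the same direction have a distance near 0, and vectors at right angles have a distance of 1, which is why sorting in ascending order returns the most similar rows first.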

Q: What is Aurora ML?

A: It is a convenient and secure method to pass data stored in Aurora PostgreSQL to LLMs for generating vector embeddings with SQL.

Here is a simple example which is just like the earlier Python example. It uses the same Amazon Titan LLM but instead of Python, it is using SQL.

SELECT aws_bedrock.invoke_model_get_embeddings(
  model_id := 'amazon.titan-embed-text-v1', 
  content_type := 'application/json', 
  json_key := 'embedding', 
  model_input := '{ "inputText": "hello world"}');

The output from this SQL query is a VECTOR that can be used to store embeddings in a table and also for creating embeddings for similarity searches.
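If the text to embed comes from an application instead of a literal, the same SQL can be issued from Python. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, and it assumes a DB-API cursor that uses %s placeholders (as psycopg2 does) connected to an Aurora PostgreSQL database where Aurora ML is already set up. Building model_input with json.dumps keeps any quotes in the text properly escaped.

```python
import json

# Same call as the SQL example, with model_input passed as a bound parameter.
EMBED_SQL = """
SELECT aws_bedrock.invoke_model_get_embeddings(
  model_id := 'amazon.titan-embed-text-v1',
  content_type := 'application/json',
  json_key := 'embedding',
  model_input := %s)
"""

def build_model_input(text):
    """Serialize the request body with json.dumps so quotes in text are escaped."""
    return json.dumps({"inputText": text})

def embed_with_aurora_ml(cursor, text):
    """Run the Aurora ML call on a DB-API cursor and return the first column."""
    cursor.execute(EMBED_SQL, (build_model_input(text),))
    return cursor.fetchone()[0]

print(build_model_input("hello world"))  # {"inputText": "hello world"}
```

With a real connection, embed_with_aurora_ml(conn.cursor(), "hello world") would return whatever the driver produces for the function's result.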

Q: How can I learn more?

A: A hands-on workshop is available that provides detailed, step-by-step instructions on how to start using Amazon Aurora PostgreSQL, Aurora ML, pgvector, and Amazon Bedrock to take your similarity searches to the next level with generative AI. This workshop is available for free and can be executed either at an AWS event or with your own existing AWS account.

https://catalog.workshops.aws/gtm-sql-vector

Summary

Amazon Aurora PostgreSQL with Aurora ML makes it easy for SQL-savvy users, developers, architects, administrators, analysts, and more to use the power of SQL to build and use generative AI applications.

Links:

https://catalog.workshops.aws/gtm-sql-vector

https://aws.amazon.com/bedrock/

https://aws.amazon.com/rds/aurora/

https://aws.amazon.com/rds/aurora/machine-learning/