pgai
A suite of tools to develop RAG, semantic search, and other AI applications more easily with PostgreSQL
A Python library that transforms PostgreSQL into a robust, production-ready retrieval engine for RAG and Agentic applications.
- 🔄 Automatically create and synchronize vector embeddings from PostgreSQL data and S3 documents. Embeddings update automatically as data changes.
- 🤖 Semantic Catalog: Enable natural language to SQL with AI. Automatically generate database descriptions and power text-to-SQL for agentic applications.
- 🔍 Powerful vector and semantic search with pgvector and pgvectorscale.
- 🛡️ Production-ready out-of-the-box: Supports batch processing for efficient embedding generation, with built-in handling for model failures, rate limits, and latency spikes.
- 🐘 Works with any PostgreSQL database, including Timescale Cloud, Amazon RDS, Supabase, and more.
Basic Architecture: The system consists of an application you write, a PostgreSQL database, and stateless vectorizer workers. The application defines a vectorizer configuration to embed data from sources like PostgreSQL or S3. The workers read this configuration, process the queued data into chunked text and embeddings, and write the results back. The application then queries this data to power RAG and semantic search.
The key strength of this architecture lies in its resilience: data modifications made by the application are decoupled from the embedding process, ensuring that failures in the embedding service do not affect the core data operations.
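The decoupling can be illustrated with a toy sketch in plain Python (no pgai involved; all names here are illustrative, not pgai's actual implementation): application writes only append to a work queue, and a separate worker drains the queue, so a failing embedding call leaves the source data intact and the work is simply retried later.

```python
from collections import deque

# Toy stand-ins for the real tables managed by the vectorizer.
source_table = []        # stands in for the wiki table
work_queue = deque()     # stands in for the vectorizer's pending-work queue
embeddings = {}          # stands in for the embedding storage table

def insert_row(row_id, text):
    """Application write: always succeeds, only enqueues embedding work."""
    source_table.append((row_id, text))
    work_queue.append(row_id)

def run_worker(embed):
    """Worker drains the queue; on failure the item is re-queued for retry."""
    for _ in range(len(work_queue)):
        row_id = work_queue.popleft()
        try:
            embeddings[row_id] = embed(dict(source_table)[row_id])
        except RuntimeError:
            work_queue.append(row_id)  # embedding endpoint down: retry later

insert_row(1, "Aristotle was an Ancient Greek philosopher.")

def flaky_embed(text):
    raise RuntimeError("embedding service unavailable")

run_worker(flaky_embed)
# The write survived the outage; the embedding work is still queued.
assert len(source_table) == 1 and len(work_queue) == 1

run_worker(lambda text: [0.1] * 3)  # service recovered; queue drains
assert embeddings[1] == [0.1, 0.1, 0.1]
```

This is the property the real architecture provides: the application's transactions never block on, or fail because of, the embedding service.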
<div align="center"> <img height="400" src="docs/images/pgai_architecture.png" alt="Pgai Architecture: application, database, vectorizer worker"> </div>

Install
First, install the pgai package.
pip install pgai
Then, install the pgai database components. You can do this from the terminal using the pgai CLI, or from your Python application code.
# from the cli
pgai install -d <database-url>
# or from the python package, often done as part of your application setup
import pgai
pgai.install(DB_URL)
If you are not on Timescale Cloud, you will also need to run the pgai vectorizer worker yourself. Install its dependencies via:
pip install "pgai[vectorizer-worker]"
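Once installed, the worker runs as a separate process pointed at the same database. A sketch of the invocation (check `pgai vectorizer worker --help` for the current flags):

```
pgai vectorizer worker -d <database-url>
```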
If you are using the semantic catalog, you will need to run:
pip install "pgai[semantic-catalog]"
Quick Start
This quickstart demonstrates how pgai Vectorizer enables semantic search and RAG over PostgreSQL data by automatically creating and synchronizing embeddings as data changes.
Looking for text-to-SQL? Check out the Semantic Catalog quickstart to transform natural language questions into SQL queries.
The key "secret sauce" of pgai Vectorizer is its declarative approach to embedding generation. Simply define your pipeline and let Vectorizer handle the operational complexity of keeping embeddings in sync, even when embedding endpoints are unreliable. You can define a simple version of the pipeline as follows:
CREATE TABLE IF NOT EXISTS wiki (
    id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    url TEXT NOT NULL,
    title TEXT NOT NULL,
    text TEXT NOT NULL
);
SELECT ai.create_vectorizer(
    'wiki'::regclass,
    loading => ai.loading_column(column_name=>'text'),
    destination => ai.destination_table(target_table=>'wiki_embedding_storage'),
    embedding => ai.embedding_openai(model=>'text-embedding-ada-002', dimensions=>1536)
);
The vectorizer will automatically create embeddings for all the rows in the wiki table, and, more importantly, will keep the embeddings synced with the underlying data as it changes. Think of it almost like declaring an index on the wiki table, but instead of the database managing the index data structure for you, the Vectorizer manages the embeddings.
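Once rows are embedded, they can be searched like any other table. A sketch of a semantic-search query, assuming the generated view is named wiki_embedding and using pgvector's cosine-distance operator `<=>` together with pgai's `ai.openai_embed` helper to embed the query text in-database:

```sql
SELECT title, chunk,
       embedding <=> ai.openai_embed('text-embedding-ada-002', 'natural science thinkers') AS distance
FROM wiki_embedding
ORDER BY distance
LIMIT 3;
```

Lower distance means a closer semantic match; the quickstart below runs queries like this from Python.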
Running the quick start
Prerequisites:
- A PostgreSQL database (docker instructions).
- An OpenAI API key (the quick start uses OpenAI for embeddings, but pgai supports multiple providers).
Create a .env file with the following:
OPENAI_API_KEY=<your-openai-api-key>
DB_URL=<your-database-url>
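The quickstart script reads these values at startup. This is commonly done with the python-dotenv package; for illustration, a minimal stdlib equivalent might look like this (`load_env` is a hypothetical helper, not part of pgai):

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Hypothetical helper for illustration; the real quickstart may use
    python-dotenv instead.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

# Usage (with the .env file from above in the working directory):
#   load_env()
#   db_url = os.environ["DB_URL"]
```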
You can download the full Python code and requirements.txt from the quickstart example and run it in the same directory as the .env file.
curl -O https://raw.githubusercontent.com/timescale/pgai/main/examples/quickstart/main.py
curl -O https://raw.githubusercontent.com/timescale/pgai/main/examples/quickstart/requirements.txt
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py
Sample output:
<details>
<summary>Click to expand sample output</summary>
Search results 1:
[WikiSearchResult(id=7,
url='https://en.wikipedia.org/wiki/Aristotle',
title='Aristotle',
text='Aristotle (; Aristotélēs, ; 384–322\xa0BC) was an '
'Ancient Greek philosopher and polymath. His writings '
'cover a broad range of subjects spanning the natural '
'sciences, philosophy, linguistics, economics, '
'politics, psychology and the arts. As the founder of '
'the Peripatetic school of philosophy in the Lyceum in '
'Athens, he began the wider Aristotelian tradition that '
'followed, which set the groundwork for the development '
'of modern science.\n'
'\n'
"Little is known about Aristotle's life. He was born in "
'the city of Stagira in northern Greece during the '
'Classical period. His father, Nicomachus, died when '
'Aristotle was a child, and he was brought up by a '
"guardian. At 17 or 18 he joined Plato's Academy in "
'Athens and remained there till the age of 37 (). '
'Shortly after Plato died, Aristotle left Athens and, '
'at the request of Philip II of Macedon, tutored his '
'son Alexander the Great beginning in 343 BC. He '
'established a library in the Lyceum which helped him '
'to produce many of his hundreds of books on papyru',
chunk='Aristotle (; Aristotélēs, ; 384–322\xa0BC) was an '
'Ancient Greek philosopher and polymath. His writings '
'cover a broad range of subjects spanning the natural '
'sciences, philosophy, linguistics, economics, '
'politics, psychology and the arts. As the founder of '
'the Peripatetic school of philosophy in the Lyceum in '
'Athens, he began the wider Aristotelian tradition '
'that followed, which set the groundwork for the '
'development of modern science.',
distance=0.22242502364217387)]
Search results 2:
[WikiSearchResult(id=41,
url='https://en.wikipedia.org/wiki/pgai',
title='pgai',
text='pgai is a Python library that turns PostgreSQL into '
'the retrieval engine behind robust, production-ready '
'RAG and Agentic applications. It does this by '
'automatically creating vector embeddings for your data '
'based on the vectorizer you define.',
chunk='pgai is a Python library that turns PostgreSQL into '
'the retrieval engine behind robust, production-ready '
'RAG and Agentic applications. It does this by '
'automatically creating vector embeddings for your '
'data based on the vectorizer you define.',
distance=0.13639101792546204)]
RAG response:
The main thing pgai does right now is generating vector embeddings for data in PostgreSQL databases based on the vectorizer defined by the user, enabling the creation of robust RAG and Agentic applications.
</details>
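The `distance` values in the output are cosine distances (pgvector's `<=>` operator), i.e. 1 minus the cosine similarity of the two vectors: lower means semantically closer. For intuition, a small stdlib sketch:

```python
import math

def cosine_distance(a, b):
    """Cosine distance as computed by pgvector's <=> operator: 1 - cos(theta)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

assert cosine_distance([1.0, 0.0], [1.0, 0.0]) == 0.0  # identical direction
assert cosine_distance([1.0, 0.0], [0.0, 1.0]) == 1.0  # orthogonal vectors
```

So in the sample output, the pgai chunk (distance ≈ 0.136) is a closer match to the query than the Aristotle chunk (distance ≈ 0.222).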
Code walkthrough
Install the pgai database components
Pgai requires a few catalog tables and functions to be installed into the database. This is done using the pgai.install function, which will install the necessary components into the ai schema of the database.
pgai.install(DB_URL)
Create the vectorizer
This defines the vectorizer, which tells the system how to create the embeddings from the text column in the wiki table. The vectorizer creates a view wiki_embedding that we can query for the embeddings (as we'll see below).
async def create_vectorizer(conn: psycopg.AsyncConnection):
    async with conn.cursor() as cur:
        await cur.execute("""
            SELECT ai.create_vectorizer(
                'wiki'::regclass,
                loading => ai.loading_column(column_name=>'text'),
                destination => ai.destination_table(target_table=>'wiki_embedding_storage'),
                embedding => ai.embedding_openai(model=>'text-embedding-ada-002', dimensions=>1536)
            )
        """)