🔷 The AI-Native Search Database

Unifies vector, text, structured and semi-structured data in a single engine, enabling hybrid search and in-database AI workflows.

</div>


</div>

🚀 What is OceanBase seekdb?

OceanBase seekdb is an AI-native search database that unifies relational, vector, text, JSON, and GIS data in a single engine, enabling hybrid search and in-database AI workflows.

🔥 Why OceanBase seekdb?

| Feature | seekdb | OceanBase | Chroma | Milvus | MySQL 9.0 | PostgreSQL<br/>+pgvector | DuckDB | Elasticsearch |
| ------------------------ |:--------------------:|:-------------:|:----------:|:----------:|:-----------------------:|:----------------------------:|:----------:|:-----------------------------------:|
| Embedded | ✅ | ❌ | ✅ | ✅ | ❌<sup>[1]</sup> | ❌ | ✅ | ❌ |
| Single-Node | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Distributed | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ |
| MySQL Compatible | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ |
| Vector Search | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| Full-Text Search | ✅ | ✅ | ✅ | ⚠️ | ✅ | ✅ | ✅ | ✅ |
| Hybrid Search | ✅ | ✅ | ✅ | ✅ | ❌ | ⚠️ | ❌ | ✅ |
| OLTP | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ |
| OLAP | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ⚠️ |
| License | Apache 2.0 | MulanPubL 2.0 | Apache 2.0 | Apache 2.0 | GPL 2.0 | PostgreSQL License | MIT | AGPLv3<br/>+SSPLv1<br/>+Elastic 2.0 |

[1] The embedded server library (libmysqld) was removed in MySQL 8.0.

✅ Supported

❌ Not Supported

⚠️ Limited

✨ Key Features

Build Fast + Hybrid Search + Multi-Model

Build Fast: Go from prototype to production in minutes. Create AI apps in Python, and run VectorDBBench on a machine with 1 CPU core and 2 GB of memory (1C2G).
Hybrid Search: Combine vector search, full-text search, and relational queries in a single statement.
Multi-Model: Support for relational, vector, text, JSON, and GIS data in a single engine.
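Conceptually, a hybrid query runs a vector (semantic) retrieval and a full-text (keyword) retrieval and then merges the two ranked lists into one. This README does not say which fusion method seekdb uses internally. The sketch below uses Reciprocal Rank Fusion (RRF), a common choice, purely to illustrate the idea. The document IDs and rankings are made-up inputs, and `k=60` is the conventional RRF constant, not a seekdb setting.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    where rank is 1-based. Higher fused scores rank first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from the two retrieval paths of one hybrid query.
vector_hits = ["id3", "id1", "id5"]    # nearest neighbours by embedding
fulltext_hits = ["id1", "id2", "id3"]  # best keyword matches (e.g. BM25)

for doc_id, score in reciprocal_rank_fusion([vector_hits, fulltext_hits]):
    print(f"{doc_id}: {score:.5f}")
```

Documents returned by both paths (`id1` and `id3`) rise to the top. Rewarding agreement between semantic and keyword evidence in this way is the main motivation for hybrid search.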

AI Inside + SQL Inside

AI Inside: Run embedding, reranking, LLM inference, and prompt management inside the database, supporting a complete document-in/data-out RAG workflow.
SQL Inside: Powered by the proven OceanBase engine, seekdb delivers real-time writes and queries with full ACID compliance and seamless compatibility with the MySQL ecosystem.
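The document-in/data-out RAG flow can be pictured as three steps: split a document into chunks, retrieve the chunks most relevant to a question, and assemble them into an LLM prompt. seekdb is described as running these steps inside the database. The plain-Python sketch below only mimics the data flow outside it. Word-overlap scoring stands in for real embeddings and reranking, and `chunk_text`, `retrieve`, and `build_prompt` are hypothetical helpers, not pyseekdb APIs.

```python
import re


def chunk_text(text: str, max_words: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of max_words words that overlap by `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def tokens(text: str) -> set[str]:
    """Lower-case word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the question (toy relevance score)."""
    q = tokens(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the retrieved chunks and the question into one LLM prompt."""
    context_block = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {question}"


# Sample input text (made up for this sketch).
document = (
    "seekdb stores vectors and text in one engine. "
    "Hybrid search combines vector similarity with keyword matching. "
    "Transactions follow ACID semantics."
)
question = "How does hybrid search work?"
print(build_prompt(question, retrieve(question, chunk_text(document))))
```

The overlap between chunks keeps a sentence that straddles a chunk boundary partly visible in both neighbours. Real pipelines typically tune chunk size and overlap per corpus.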

🎬 Quick Start

Installation

Choose your platform:

<details> <summary><b>🐍 Python (Recommended for AI/ML)</b></summary>

pip install -U pyseekdb

</details> <details> <summary><b>🐳 Docker (Quick Testing)</b></summary>

docker run -d \
  --name seekdb \
  -p 2881:2881 \
  -p 2886:2886 \
  -v ./data:/var/lib/oceanbase \
  oceanbase/seekdb:latest

Refer to the documentation for this Docker image for details.
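`docker run -d` returns as soon as the container starts, so the server may not yet accept connections. A minimal readiness check is to poll the port until a TCP connection succeeds. This assumes that 2881 is the SQL port clients connect to, as in the server-mode `pyseekdb.Client` example, and that the container runs on the local host. `wait_for_port` is a hypothetical helper. An open port does not guarantee that the database has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True if the port accepted a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError) while nothing listens.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, call `wait_for_port("127.0.0.1", 2881, timeout=120)` after `docker run` and before connecting a client.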

</details> <details> <summary><b>📦 Binary (Standalone)</b></summary>

# Linux
rpm -ivh seekdb-1.x.x.x-xxxxxxx.el8.x86_64.rpm

Please replace the version number with the actual RPM package version.

</details>

🎯 AI Search Example

Build a semantic search system in 5 minutes:

<details> <summary><b>🗄️ 🐍 Python SDK</b></summary>

# Install the SDK first
pip install -U pyseekdb

"""
This example demonstrates the most common operations with embedding functions:
1. Create a client connection
2. Create a collection with embedding function
3. Add data using documents (embeddings auto-generated)
4. Query using query texts (embeddings auto-generated)
5. Print query results

This is a minimal example to get you started quickly with embedding functions.
"""

import pyseekdb
from pyseekdb import DefaultEmbeddingFunction

# ==================== Step 1: Create Client Connection ====================
# You can use embedded mode, server mode, or OceanBase mode
# For this example, we'll use embedded mode (you can switch to server or OceanBase mode below)

# Embedded mode (local seekdb)
client = pyseekdb.Client(
    path="./seekdb.db",
    database="test"
)
# Alternative: Server mode (connecting to a remote seekdb server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     database="test",
#     user="root",
#     password=""
# )

# Alternative: OceanBase mode (connecting to a remote OceanBase server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     tenant="test",  # OceanBase default tenant
#     database="test",
#     user="root",
#     password=""
# )

# ==================== Step 2: Create a Collection with Embedding Function ====================
# A collection is like a table that stores documents with vector embeddings
collection_name = "my_simple_collection"

# Create collection with default embedding function
# The embedding function will automatically convert documents to embeddings
collection = client.create_collection(
    name=collection_name,
    embedding_function=DefaultEmbeddingFunction()  # Uses default model (384 dimensions)
)

print(f"Created collection '{collection_name}' with dimension: {collection.dimension}")
print(f"Embedding function: {collection.embedding_function}")

# ==================== Step 3: Add Data to Collection ====================
# With embedding function, you can add documents directly without providing embeddings
# The embedding function will automatically generate embeddings from documents

documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text"
]

ids = ["id1", "id2", "id3", "id4", "id5"]

# Add data with documents only - embeddings will be auto-generated by embedding function
collection.add(
    ids=ids,
    documents=documents,  # embeddings will be automatically generated
    metadatas=[
        {"category": "AI", "index": 0},
        {"category": "Programming", "index": 1},
        {"category": "Database", "index": 2},
        {"category": "AI", "index": 3},
        {"category": "NLP", "index": 4}
    ]
)

print(f"\nAdded {len(documents)} documents to collection")
print("Note: Embeddings were automatically generated from documents using the embedding function")

# ==================== Step 4: Query the Collection ====================
# With embedding function, you can query using text directly
# The embedding function will automatically convert query text to query vector

# Query using text - query vector will be auto-generated by embedding function
query_text = "artificial intelligence and machine learning"

results = collection.query(
    query_texts=query_text,  # Query text - will be embedded automatically
    n_results=3  # Return top 3 most similar documents
)

print(f"\nQuery: '{query_text}'")
print(f"Query results: {len(results['ids'][0])} items found")

# ==================== Step 5: Print Query Results ====================
