Seekdb
The AI-Native Search Database. Unifies vector, text, structured and semi-structured data in a single engine, enabling hybrid search and in-database AI workflows.
</div><div align="center"> <p> <a href="https://oceanbase.ai"> <img alt="Documentation" height="20" src="https://img.shields.io/badge/OceanBase.ai-4285F4?style=for-the-badge&logo=read-the-docs&logoColor=white" /> </a> <a href="https://www.linkedin.com/company/oceanbase" target="_blank"> <img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff" alt="follow on LinkedIn"> </a> <a href="https://www.youtube.com/@OceanBaseDB"> <img alt="Static Badge" src="https://img.shields.io/badge/YouTube-red?logo=youtube"> </a> <a href="https://deepwiki.com/oceanbase/seekdb"> <img alt="Ask DeepWiki" src="https://deepwiki.com/badge.svg" /> </a> <a href="https://discord.gg/74cF8vbNEs"> <img alt="Join Discord" src="https://img.shields.io/badge/Discord-Join%20Chat-5865F2?logo=discord&style=flat-square" /> </a> <a href="https://pepy.tech/projects/pylibseekdb"> <img height="20" alt="Downloads" src="https://static.pepy.tech/badge/pylibseekdb" /> </a> <a href="https://github.com/oceanbase/seekdb/blob/master/LICENSE"> <img alt="License" src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" /> </a> </p> </div> <div align="center">
English | Chinese
</div>
🚀 What is OceanBase seekdb?
OceanBase seekdb is an AI-native search database that unifies relational, vector, text, JSON and GIS data in a single engine, enabling hybrid search and in-database AI workflows.
🔥 Why OceanBase seekdb?
| Feature | seekdb | OceanBase | Chroma | Milvus | MySQL 9.0 | PostgreSQL<br/>+pgvector | DuckDB | Elasticsearch |
| ------------------------ |:--------------------:|:-------------:|:----------:|:----------:|:-----------------------:|:----------------------------:|:----------:|:-----------------------------------:|
| Embedded | ✅ | ❌ | ✅ | ✅ | ❌<sup>[1]</sup> | ❌ | ✅ | ❌ |
| Single-Node | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Distributed | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ |
| MySQL Compatible | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ |
| Vector Search | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| Full-Text Search | ✅ | ✅ | ✅ | ⚠️ | ✅ | ✅ | ✅ | ✅ |
| Hybrid Search | ✅ | ✅ | ✅ | ✅ | ❌ | ⚠️ | ❌ | ✅ |
| OLTP | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ |
| OLAP | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ⚠️ |
| License | Apache 2.0 | MulanPubL 2.0 | Apache 2.0 | Apache 2.0 | GPL 2.0 | PostgreSQL License | MIT | AGPLv3<br/>+SSPLv1<br/>+Elastic 2.0 |

[1] The embedded capability was removed in MySQL 8.0.
- ✅ Supported
- ❌ Not Supported
- ⚠️ Limited
✨ Key Features
Build fast + Hybrid search + Multi model
- Build Fast: Go from prototype to production in minutes. Create AI apps in Python and run VectorDBBench on a machine with 1 CPU core and 2 GB of memory (1C2G).
- Hybrid Search: Combine vector search, full-text search and relational queries in a single statement.
- Multi-Model: Supports relational, vector, text, JSON and GIS data in a single engine.
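
To make the idea of hybrid search concrete, here is a minimal, library-free sketch of how two ranked result lists (one from vector search, one from full-text search) can be merged into a single ranking. It uses reciprocal rank fusion (RRF), a commonly used fusion technique chosen here purely for illustration; this README does not state which fusion method seekdb uses internally, and the function name, the constant `k=60`, and the sample IDs are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. Documents with higher fused scores come first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector search and a full-text search.
vector_hits = ["id1", "id4", "id3"]
fulltext_hits = ["id4", "id5", "id1"]

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
print(fused)  # ['id4', 'id1', 'id5', 'id3']
```

Documents that rank well in both lists (`id4`, `id1`) rise to the top, which is the behavior a hybrid query is meant to deliver without merging results by hand in application code.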
AI inside + SQL inside
- AI Inside: Run embedding, reranking, LLM inference and prompt management inside the database, supporting a complete document-in/data-out RAG workflow.
- SQL Inside: Powered by the proven OceanBase engine, delivering real-time writes and queries with full ACID compliance, and seamless MySQL ecosystem compatibility.
🎬 Quick Start
Installation
Choose your platform:
<details> <summary><b>🐍 Python (Recommended for AI/ML)</b></summary>

pip install -U pyseekdb
</details>
<details>
<summary><b>🐳 Docker (Quick Testing)</b></summary>
docker run -d \
  --name seekdb \
  -p 2881:2881 \
  -p 2886:2886 \
  -v ./data:/var/lib/oceanbase \
  oceanbase/seekdb:latest
See the documentation for this Docker image for details.
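
After starting the container, it can be handy to wait until the published port accepts connections before running client code. The following is a small sketch using only the Python standard library; `wait_for_port` is a hypothetical helper, not part of pyseekdb, and a successful TCP connection only shows that the port is open, not that the database has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds.

    Retries every `interval` seconds and returns False if no connection
    succeeds before `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the `docker run` command above, a call such as `wait_for_port("127.0.0.1", 2881, timeout=60.0)` would poll the first published port, which is the same port the server-mode client example in this README connects to.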
</details> <details> <summary><b>📦 Binary (Standalone)</b></summary>

# Linux
rpm -ivh seekdb-1.x.x.x-xxxxxxx.el8.x86_64.rpm
Please replace the version number with the actual RPM package version.
</details>

🎯 AI Search Example
Build a semantic search system in 5 minutes:
<details> <summary><b>🗄️ 🐍 Python SDK</b></summary>

# Install the SDK first
pip install -U pyseekdb
"""
This example demonstrates the most common operations with embedding functions:
1. Create a client connection
2. Create a collection with embedding function
3. Add data using documents (embeddings auto-generated)
4. Query using query texts (embeddings auto-generated)
5. Print query results
This is a minimal example to get you started quickly with embedding functions.
"""
import pyseekdb
from pyseekdb import DefaultEmbeddingFunction
# ==================== Step 1: Create Client Connection ====================
# You can use embedded mode, server mode, or OceanBase mode.
# This example uses embedded mode; to use server or OceanBase mode instead,
# replace this call with one of the commented-out alternatives that follow.
# Embedded mode (local seekdb)
client = pyseekdb.Client(
    path="./seekdb.db",
    database="test"
)
# Alternative: Server mode (connecting to a remote seekdb server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     database="test",
#     user="root",
#     password=""
# )
# Alternative: Remote server mode (OceanBase Server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     tenant="test",  # OceanBase default tenant
#     database="test",
#     user="root",
#     password=""
# )
# ==================== Step 2: Create a Collection with Embedding Function ====================
# A collection is like a table that stores documents with vector embeddings
collection_name = "my_simple_collection"
# Create collection with default embedding function
# The embedding function will automatically convert documents to embeddings
collection = client.create_collection(
    name=collection_name,
    # embedding_function=DefaultEmbeddingFunction()  # Uses default model (384 dimensions)
)
print(f"Created collection '{collection_name}' with dimension: {collection.dimension}")
print(f"Embedding function: {collection.embedding_function}")
# ==================== Step 3: Add Data to Collection ====================
# With embedding function, you can add documents directly without providing embeddings
# The embedding function will automatically generate embeddings from documents
documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text"
]
ids = ["id1", "id2", "id3", "id4", "id5"]
# Add data with documents only - embeddings will be auto-generated by embedding function
collection.add(
    ids=ids,
    documents=documents,  # embeddings will be automatically generated
    metadatas=[
        {"category": "AI", "index": 0},
        {"category": "Programming", "index": 1},
        {"category": "Database", "index": 2},
        {"category": "AI", "index": 3},
        {"category": "NLP", "index": 4}
    ]
)
print(f"\nAdded {len(documents)} documents to collection")
print("Note: Embeddings were automatically generated from documents using the embedding function")
# ==================== Step 4: Query the Collection ====================
# With embedding function, you can query using text directly
# The embedding function will automatically convert query text to query vector
# Query using text - query vector will be auto-generated by embedding function
query_text = "artificial intelligence and machine learning"
results = collection.query(
    query_texts=query_text,  # Query text - will be embedded automatically
    n_results=3  # Return top 3 most similar documents
)
print(f"\nQuery: '{query_text}'")
print(f"Query results: {len(results['ids'][0])} items found")
# ==================== Step 5
