Comet
A Vector Store written in Go - Supports hybrid retrieval over BM25, Flat, HNSW, IVF, PQ and IVFPQ Index with Quantization, Metadata Filtering, Reranking, Reciprocal Rank Fusion, Soft Deletes, Index Rebuilds and much much more

A high-performance hybrid vector store written in Go. Comet brings together multiple indexing strategies and search modalities into a unified, hackable package: hybrid retrieval with reciprocal rank fusion, autocut, pre-filtering, semantic search, full-text search, multi-KNN searches, and multi-query operations, all in pure Go.
Understand search internals from the inside out. Built for hackers, not hyperscalers. Tiny enough to fit in your head. Decent enough to blow it.
Choose from:
- Flat (exact), HNSW (graph), IVF (clustering), PQ (quantization), or IVFPQ (hybrid) storage backends
- Full-Text Search: BM25 ranking algorithm with tokenization and normalization
- Metadata Filtering: Fast filtering using Roaring Bitmaps and Bit-Sliced Indexes
- Ranking Programmability: Reciprocal Rank Fusion, Fixed size result sets, Threshold based result sets, Dynamic result sets etc.
- Hybrid Search: Unified interface combining vector, text, and metadata search
Table of Contents
- Overview
- Features
- Installation
- Quick Start
- Architecture
- Core Concepts
- API Reference
- Examples
- Configuration
- API Details
- Use Cases
- Contributing
- License
Overview
Everything you need to understand how vector databases actually work—and build one yourself.
What's inside:
- 5 Vector Storage Types: Flat, HNSW, IVF, PQ, IVFPQ
- 3 Distance Metrics: L2, L2 Squared, Cosine
- Full-Text Search: BM25 ranking with Unicode tokenization
- Metadata Filtering: Roaring bitmaps + Bit-Sliced Indexes
- Hybrid Search: Combine vector + text + metadata with Reciprocal Rank Fusion
- Advanced Search: Multi-KNN queries, multi-query operations, autocut result truncation
- Production Features: Thread-safe, serialization, soft deletes, configurable parameters
Features
Vector Storage
- Flat: Brute-force exact search (100% recall baseline)
- HNSW: Hierarchical navigable small world graphs (95-99% recall, O(log n) search)
- IVF: Inverted file index with k-means clustering (85-95% recall, 10-20x speedup)
- PQ: Product quantization for compression (85-95% recall, 10-500x memory reduction)
- IVFPQ: IVF + PQ combined (85-95% recall, 100x speedup + 500x compression)
Search Modalities
- Vector Search: L2, L2 Squared, and Cosine distance metrics
- Full-Text Search: BM25 ranking with Unicode-aware tokenization
- Metadata Filtering: Boolean queries on structured attributes
- Hybrid Search: Combine all three with configurable fusion strategies
Fusion Strategies
- Weighted Sum: Linear combination with configurable weights
- Reciprocal Rank Fusion (RRF): Scale-independent rank-based fusion
- Max/Min Score: Simple score aggregation
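To make the rank-based strategy concrete, here is a minimal sketch of Reciprocal Rank Fusion over plain slices of document IDs. The function name `rrfFuse`, the input shape, and the constant `k = 60` are assumptions for illustration and are not taken from Comet's API.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges several ranked lists of document IDs using Reciprocal Rank
// Fusion: score(d) = sum over lists of 1 / (k + rank(d)), with 1-based ranks.
// Hypothetical helper for illustration; not part of Comet's API.
func rrfFuse(rankings [][]uint32, k float64) []uint32 {
	scores := make(map[uint32]float64)
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]uint32, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Highest fused score first; ties are broken by ID for determinism.
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}

func main() {
	vectorHits := []uint32{1, 2, 3} // hypothetical vector-search ranking
	textHits := []uint32{3, 1, 4}   // hypothetical BM25 ranking
	fmt.Println(rrfFuse([][]uint32{vectorHits, textHits}, 60)) // [1 3 2 4]
}
```

Because only ranks enter the formula, the raw vector distances and BM25 scores never need to be normalized against each other, which is what "scale-independent" refers to.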
Data Structures (The Good Stuff)
- HNSW Graphs: Multi-layer proximity graphs (a skip-list-like hierarchy) for approximate nearest neighbor search
- Roaring Bitmaps: Compressed bitmaps for metadata filtering (array, bitmap, run-length encoding)
- Bit-Sliced Index (BSI): Efficient numeric range queries without full scans
- Product Quantization Codebooks: Learned k-means centroids for vector compression
- Inverted Indexes: Token-to-document mappings for full-text search
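As an illustration of how product quantization codebooks compress a vector, the sketch below encodes a vector as one centroid index per subvector. The names `pqEncode` and `sqDist` are hypothetical, and the codebooks are hard-coded here rather than learned with k-means. It assumes at least one codebook and a vector length divisible by the number of codebooks.

```go
package main

import "fmt"

// sqDist returns the squared Euclidean distance between equal-length slices.
func sqDist(a, b []float32) float32 {
	var sum float32
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// pqEncode splits vec into len(codebooks) equal subvectors and replaces each
// one with the index of its nearest centroid in the matching codebook.
// Hypothetical helper for illustration; not part of Comet's API.
func pqEncode(vec []float32, codebooks [][][]float32) []uint8 {
	sub := len(vec) / len(codebooks)
	codes := make([]uint8, len(codebooks))
	for i, centroids := range codebooks {
		part := vec[i*sub : (i+1)*sub]
		best := 0
		for c := 1; c < len(centroids); c++ {
			if sqDist(part, centroids[c]) < sqDist(part, centroids[best]) {
				best = c
			}
		}
		codes[i] = uint8(best)
	}
	return codes
}

func main() {
	// Two codebooks with two centroids each (normally learned by k-means).
	codebooks := [][][]float32{
		{{0, 0}, {1, 1}}, // centroids for dimensions 0-1
		{{0, 1}, {1, 0}}, // centroids for dimensions 2-3
	}
	fmt.Println(pqEncode([]float32{0.9, 1.1, 0.1, 0.8}, codebooks)) // [1 0]
}
```

In this toy case a 16-byte vector (four float32 values) shrinks to two 1-byte codes. Real codebooks typically hold up to 256 centroids per subvector so that each code still fits in one byte.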
Other Capabilities
- Quantization: Full precision, half precision, int8 precision
- Soft Deletes: Fast deletion with lazy cleanup
- Serialization: Persist and reload indexes
- Thread-Safe: Concurrent read/write operations
- Autocut: Automatic result truncation based on score gaps
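One simple way to truncate on score gaps is sketched below: keep results until the first jump between consecutive distances exceeds a threshold. The function `autocut` and the fixed `maxGap` rule are assumptions for illustration; Comet's actual heuristic may differ.

```go
package main

import "fmt"

// autocut returns the prefix of scores (sorted ascending, lower = closer)
// that precedes the first jump larger than maxGap.
// Hypothetical helper for illustration; not Comet's implementation.
func autocut(scores []float64, maxGap float64) []float64 {
	for i := 1; i < len(scores); i++ {
		if scores[i]-scores[i-1] > maxGap {
			return scores[:i]
		}
	}
	return scores
}

func main() {
	distances := []float64{0.02, 0.05, 0.08, 0.61, 0.66}
	fmt.Println(autocut(distances, 0.2)) // [0.02 0.05 0.08]
}
```

The jump from 0.08 to 0.61 is treated as the boundary between relevant and irrelevant hits, so the caller gets three results regardless of the requested K.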
Installation
go get github.com/wizenheimer/comet
Quick Start
package main

import (
	"fmt"
	"log"

	"github.com/wizenheimer/comet"
)

func main() {
	// Create a vector store (384-dimensional embeddings with cosine distance)
	index, err := comet.NewFlatIndex(384, comet.Cosine)
	if err != nil {
		log.Fatal(err)
	}

	// Add vectors
	vec1 := make([]float32, 384)
	// ... populate vec1 with your embedding ...
	node := comet.NewVectorNode(vec1)
	index.Add(*node)

	// Search for similar vectors
	query := make([]float32, 384)
	// ... populate query vector ...
	results, err := index.NewSearch().
		WithQuery(query).
		WithK(10).
		Execute()
	if err != nil {
		log.Fatal(err)
	}

	// Process results
	for i, result := range results {
		fmt.Printf("%d. ID=%d, Score=%.4f\n", i+1, result.GetId(), result.GetScore())
	}
}
Output:
1. ID=123, Score=0.0234
2. ID=456, Score=0.0567
3. ID=789, Score=0.0823
...
Architecture
System Architecture
Comet is organized into three main search engines that can work independently or together:
Application Layer
┌─────────────────────────────────────────────────────────────┐
│                      Your Application                       │
│                (Using Comet as a Go Library)                │
└──────────────────────┬──────────────────────────────────────┘
                       │
         ┌─────────────┼─────────────┐
         │             │             │
         ▼             ▼             ▼
      Vector         Text        Metadata
Search Engine Layer
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│   Vector    │ │    Text     │ │  Metadata   │
│   Search    │ │   Search    │ │  Filtering  │
│   Engine    │ │   Engine    │ │   Engine    │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
       │               │               │
       │ Semantic      │ Keywords      │ Filters
       │ Similarity    │ + Relevance   │ + Boolean Logic
       ▼               ▼               ▼
Index Storage Layer
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ HNSW / IVF  │ │ BM25 Index  │ │   Roaring   │
│ / PQ / Flat │ │ (Inverted)  │ │   Bitmaps   │
└─────────────┘ └─────────────┘ └─────────────┘
  Graph/Trees    Token→DocIDs   Compressed Sets
Hybrid Coordinator
 All Three Engines
         │
         ▼
┌─────────────────┐
│  Hybrid Search  │
│   Coordinator   │
│ (Score Fusion)  │
└─────────────────┘
         │
         ▼
 Combined Results
Component Details
Component A: Vector Storage Engine
Manages vector storage and similarity search across multiple index types.
Common Interface:
┌────────────────────────────────────┐
│       VectorIndex Interface        │
│                                    │
│  ├─ Train(vectors)                 │
│  ├─ Add(vector)                    │
│  ├─ Remove(vector)                 │
│  └─ NewSearch()                    │
└────────────────────────────────────┘
Available Implementations:
FlatIndex   → Brute force, 100% recall
HNSWIndex   → Graph-based, O(log n)
IVFIndex    → Clustering, 10-20x faster
PQIndex     → Quantization, 10-500x compression
IVFPQIndex  → Hybrid, best of IVF + PQ
Responsibilities:
- Vector preprocessing (normalization for cosine distance)
- Distance calculations (L2, L2 Squared, Cosine)
- K-nearest neighbor search
- Serialization/deserialization
- Soft delete management with flush mechanism
Performance Characteristics:
- Flat: O(n×d) search, 100% recall
- HNSW: O(M×ef×log n) search, 95-99% recall
- IVF: O(nProbes×n/k×d) search, 85-95% recall
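The O(n×d) cost of the flat index comes from comparing the query against every stored vector. A minimal sketch of that scan with cosine distance follows; `flatSearch`, `cosineDistance`, and the `hit` type are hypothetical names, not Comet's implementation, and all vectors are assumed to have the same length.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosineDistance returns 1 - cos(a, b); 0 means the same direction.
func cosineDistance(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 1 // treat zero vectors as orthogonal to everything
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// hit pairs a vector's position in the store with its distance to the query.
type hit struct {
	ID    int
	Score float64
}

// flatSearch scans all n vectors of dimension d (O(n×d) distance work),
// then sorts the hits and keeps the k closest.
func flatSearch(vectors [][]float32, query []float32, k int) []hit {
	hits := make([]hit, len(vectors))
	for i, v := range vectors {
		hits[i] = hit{ID: i, Score: cosineDistance(v, query)}
	}
	sort.Slice(hits, func(a, b int) bool { return hits[a].Score < hits[b].Score })
	if k < 0 {
		k = 0
	}
	if k < len(hits) {
		hits = hits[:k]
	}
	return hits
}

func main() {
	vectors := [][]float32{{1, 0}, {0, 1}, {1, 1}}
	for _, h := range flatSearch(vectors, []float32{1, 0.1}, 2) {
		fmt.Printf("ID=%d Score=%.4f\n", h.ID, h.Score)
	}
}
```

Every approximate index in the list above trades some of this exhaustive comparison for speed, which is why the flat scan serves as the 100% recall baseline.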
Component B: Text Search Engine
Full-text search using BM25 ranking algorithm.
Inverted Index:
┌────────────────────────────────────┐
│  term → RoaringBitmap(docIDs)      │
│                                    │
│  "machine"  → {1, 5, 12, 45}       │
│  "learning" → {1, 3, 12, 20}       │
│  "neural"   → {3, 20, 45}          │
└────────────────────────────────────┘
Term Frequencies:
┌────────────────────────────────────┐
│  term → {docID: count}             │
│                                    │
│  "machine" → {1: 3, 5: 1, 12: 2}   │
└────────────────────────────────────┘
Document Stats:
┌────────────────────────────────────┐
│  docID → (length, token_count)     │
│                                    │
│  1 → (250 chars, 45 tokens)        │
│  5 → (180 chars, 32 tokens)        │
└────────────────────────────────────┘
Responsibilities:
- Text tokenization (UAX#29 word segmentation)
- Unicode normalization (NFKC)
- Inverted index maintenance
- BM25 score calculation
- Top-K retrieval with heap
Performance Characteristics:
- Add: O(m) where m = tokens
- Search: O(q×d_avg) where q = query terms, d_avg = avg docs per term
- Memory: Compressed inverted index, no original text stored
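The structures above (postings, term frequencies, document lengths) are what a BM25 scorer needs. The sketch below is a toy version using plain maps and whitespace tokenization instead of Roaring bitmaps and UAX#29 segmentation. The type `bm25Index`, its methods, and the parameter values `k1 = 1.2`, `b = 0.75` are assumptions for illustration, and each document ID is assumed to be added only once.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// bm25Index is a toy inverted index; hypothetical, not Comet's implementation.
type bm25Index struct {
	postings map[string]map[int]int // term -> docID -> term frequency
	docLens  map[int]int            // docID -> token count
	totalLen int                    // sum of all document lengths
}

func newBM25Index() *bm25Index {
	return &bm25Index{postings: map[string]map[int]int{}, docLens: map[int]int{}}
}

// add tokenizes text on whitespace and records term frequencies: O(m) tokens.
func (ix *bm25Index) add(docID int, text string) {
	tokens := strings.Fields(strings.ToLower(text))
	ix.docLens[docID] = len(tokens)
	ix.totalLen += len(tokens)
	for _, t := range tokens {
		if ix.postings[t] == nil {
			ix.postings[t] = make(map[int]int)
		}
		ix.postings[t][docID]++
	}
}

// score returns the BM25 score of every document matching a query term.
func (ix *bm25Index) score(query string, k1, b float64) map[int]float64 {
	scores := make(map[int]float64)
	n := float64(len(ix.docLens))
	if n == 0 {
		return scores
	}
	avgLen := float64(ix.totalLen) / n
	for _, t := range strings.Fields(strings.ToLower(query)) {
		docs := ix.postings[t]
		df := float64(len(docs))
		idf := math.Log(1 + (n-df+0.5)/(df+0.5))
		for docID, tf := range docs {
			f := float64(tf)
			norm := f + k1*(1-b+b*float64(ix.docLens[docID])/avgLen)
			scores[docID] += idf * f * (k1 + 1) / norm
		}
	}
	return scores
}

func main() {
	ix := newBM25Index()
	ix.add(1, "machine learning basics")
	ix.add(3, "neural learning")
	ix.add(5, "machine shop tools")
	// Map iteration order is random, so the print order may vary.
	for docID, s := range ix.score("machine learning", 1.2, 0.75) {
		fmt.Printf("doc %d: %.3f\n", docID, s)
	}
}
```

Document 1 matches both query terms and ends up with the highest score. A production scorer would push these scores through a heap to keep only the top K.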
Component C: Metadata Filter Engine
Fast filtering using compressed bitmaps.
Categorical Fields (Roaring Bitmaps):
┌────────────────────────────────────┐
│  field:value → Bitmap(docIDs)      │
│                                    │
│  "category:electronics" → {1,5,12} │
│  "category:books"       → {2,8,15} │
│  "in_stock:true"        → {1,2,5}  │
└────────────────────────────────────┘
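A categorical filter such as `category:electronics AND in_stock:true` reduces to a bitmap intersection. The sketch below uses an uncompressed `[]uint64` bitset as a stand-in for a Roaring bitmap; the `bitmap` type and its helpers are hypothetical and only illustrate the set logic, not the compression.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitmap is an uncompressed bitset standing in for a Roaring bitmap.
type bitmap []uint64

// add sets the bit for id, growing the bitset as needed.
func (b *bitmap) add(id uint32) {
	word := int(id / 64)
	for len(*b) <= word {
		*b = append(*b, 0)
	}
	(*b)[word] |= uint64(1) << (id % 64)
}

// intersect returns the IDs present in both bitmaps (boolean AND).
func intersect(x, y bitmap) bitmap {
	n := len(x)
	if len(y) < n {
		n = len(y)
	}
	out := make(bitmap, n)
	for i := 0; i < n; i++ {
		out[i] = x[i] & y[i]
	}
	return out
}

// ids lists the set bits in ascending order.
func (b bitmap) ids() []uint32 {
	var out []uint32
	for w, word := range b {
		for word != 0 {
			t := bits.TrailingZeros64(word)
			out = append(out, uint32(w*64+t))
			word &= word - 1 // clear the lowest set bit
		}
	}
	return out
}

func main() {
	var electronics, inStock bitmap
	for _, id := range []uint32{1, 5, 12} {
		electronics.add(id)
	}
	for _, id := range []uint32{1, 2, 5} {
		inStock.add(id)
	}
	fmt.Println(intersect(electronics, inStock).ids()) // [1 5]
}
```

The AND runs one machine word at a time, so 64 candidate documents are checked per operation.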
Numeric Fields (Bit-Sliced Index):
┌────────────────────────────────────┐
│  field → BSI (range queries)       │
│                                    │
│  "price"  → [0-1000, 1000-5000]    │
│  "rating" → [0-5 scale]            │
└────────────────────────────────────┘
Document Universe:
┌────────────────────────────────────┐
│ allDocs → Bitmap(all IDs) │
│ │
│ Used for NOT o
