Comet

A vector store written in Go. Supports hybrid retrieval over BM25, Flat, HNSW, IVF, PQ, and IVFPQ indexes, with quantization, metadata filtering, reranking, reciprocal rank fusion, soft deletes, index rebuilds, and more.

A high-performance hybrid vector store written in Go. Comet brings together multiple indexing strategies and search modalities into a unified, hackable package: hybrid retrieval with reciprocal rank fusion, autocut, pre-filtering, semantic search, full-text search, multi-KNN searches, and multi-query operations, all in pure Go.

Understand search internals from the inside out. Built for hackers, not hyperscalers. Tiny enough to fit in your head. Decent enough to blow it.

Choose from:

  • Flat (exact), HNSW (graph), IVF (clustering), PQ (quantization), or IVFPQ (hybrid) storage backends
  • Full-Text Search: BM25 ranking algorithm with tokenization and normalization
  • Metadata Filtering: Fast filtering using Roaring Bitmaps and Bit-Sliced Indexes
  • Ranking Programmability: Reciprocal Rank Fusion, fixed-size result sets, threshold-based result sets, dynamic result sets, etc.
  • Hybrid Search: Unified interface combining vector, text, and metadata search

Overview

Everything you need to understand how vector databases actually work—and build one yourself.

What's inside:

  • 5 Vector Storage Types: Flat, HNSW, IVF, PQ, IVFPQ
  • 3 Distance Metrics: L2, L2 Squared, Cosine
  • Full-Text Search: BM25 ranking with Unicode tokenization
  • Metadata Filtering: Roaring bitmaps + Bit-Sliced Indexes
  • Hybrid Search: Combine vector + text + metadata with Reciprocal Rank Fusion
  • Advanced Search: Multi-KNN queries, multi-query operations, autocut result truncation
  • Production Features: Thread-safe, serialization, soft deletes, configurable parameters

Features

Vector Storage

  • Flat: Brute-force exact search (100% recall baseline)
  • HNSW: Hierarchical navigable small world graphs (95-99% recall, O(log n) search)
  • IVF: Inverted file index with k-means clustering (85-95% recall, 10-20x speedup)
  • PQ: Product quantization for compression (85-95% recall, 10-500x memory reduction)
  • IVFPQ: IVF + PQ combined (85-95% recall, 100x speedup + 500x compression)

Search Modalities

  • Vector Search: L2, L2 Squared, and Cosine distance metrics
  • Full-Text Search: BM25 ranking with Unicode-aware tokenization
  • Metadata Filtering: Boolean queries on structured attributes
  • Hybrid Search: Combine all three with configurable fusion strategies

Fusion Strategies

  • Weighted Sum: Linear combination with configurable weights
  • Reciprocal Rank Fusion (RRF): Scale-independent rank-based fusion
  • Max/Min Score: Simple score aggregation
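Reciprocal Rank Fusion is easy to picture in code. The following is a minimal sketch using only the standard library, not Comet's actual implementation: the `Fused` type, the `rrf` function, and the constant k = 60 (a commonly used default) are assumptions made for illustration. Each document earns 1/(k + rank) from every ranked list it appears in, so only ranks matter and the raw score scales of the vector and text engines never need to be reconciled.

```go
package main

import (
	"fmt"
	"sort"
)

// Fused pairs a document ID with its fused score (hypothetical type).
type Fused struct {
	ID    uint32
	Score float64
}

// rrf fuses ranked lists of document IDs (best result first).
// A document scores sum(1 / (k + rank)) over the lists that contain it,
// with rank starting at 1.
func rrf(rankings [][]uint32, k float64) []Fused {
	scores := make(map[uint32]float64)
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	out := make([]Fused, 0, len(scores))
	for id, s := range scores {
		out = append(out, Fused{ID: id, Score: s})
	}
	// Highest fused score first; ties broken by ID for a stable order.
	sort.Slice(out, func(a, b int) bool {
		if out[a].Score != out[b].Score {
			return out[a].Score > out[b].Score
		}
		return out[a].ID < out[b].ID
	})
	return out
}

func main() {
	vectorHits := []uint32{1, 5, 12} // ranking from vector search
	textHits := []uint32{12, 1, 20}  // ranking from text search
	for _, f := range rrf([][]uint32{vectorHits, textHits}, 60) {
		fmt.Printf("ID=%d score=%.5f\n", f.ID, f.Score)
	}
}
```

Documents 1 and 12 appear in both lists, so they rank ahead of 5 and 20, which appear in only one.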

Data Structures (The Good Stuff)

  • HNSW Graphs: Multi-layer proximity graphs (layered like a skip list) for approximate nearest neighbor search
  • Roaring Bitmaps: Compressed bitmaps for metadata filtering (array, bitmap, run-length encoding)
  • Bit-Sliced Index (BSI): Efficient numeric range queries without full scans
  • Product Quantization Codebooks: Learned k-means centroids for vector compression
  • Inverted Indexes: Token-to-document mappings for full-text search
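To make the product quantization entry concrete, here is a rough sketch of encoding and decoding against codebooks that are assumed to be trained already (the k-means training step is left out). The `encode` and `decode` functions and the `codebooks[subspace][centroid]` layout are hypothetical and chosen for readability; Comet's internal representation may differ.

```go
package main

import (
	"fmt"
	"math"
)

// encode splits vec into len(codebooks) subvectors and replaces each with
// the index of its nearest centroid (squared L2). codebooks[s][c] is
// centroid c of subspace s. This sketch assumes at most 256 centroids per
// subspace and that vec has exactly the summed subvector length.
func encode(vec []float32, codebooks [][][]float32) []uint8 {
	codes := make([]uint8, len(codebooks))
	offset := 0
	for s, centroids := range codebooks {
		subDim := len(centroids[0])
		sub := vec[offset : offset+subDim]
		best, bestDist := 0, float32(math.MaxFloat32)
		for c, centroid := range centroids {
			var d float32
			for i := range sub {
				diff := sub[i] - centroid[i]
				d += diff * diff
			}
			if d < bestDist {
				best, bestDist = c, d
			}
		}
		codes[s] = uint8(best)
		offset += subDim
	}
	return codes
}

// decode rebuilds an approximate vector by concatenating the chosen centroids.
func decode(codes []uint8, codebooks [][][]float32) []float32 {
	var out []float32
	for s, c := range codes {
		out = append(out, codebooks[s][c]...)
	}
	return out
}

func main() {
	// Two subspaces of dimension 2, two centroids each (toy values).
	codebooks := [][][]float32{
		{{0, 0}, {1, 1}},
		{{0, 1}, {1, 0}},
	}
	vec := []float32{0.9, 0.8, 0.1, 0.9}
	codes := encode(vec, codebooks)
	fmt.Println(codes, decode(codes, codebooks))
}
```

The four-float vector is stored as two bytes, which is where the memory reduction comes from; the cost is that `decode` only returns an approximation of the original.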

Other Capabilities

  • Quantization: Full precision, half precision, int8 precision
  • Soft Deletes: Fast deletion with lazy cleanup
  • Serialization: Persist and reload indexes
  • Thread-Safe: Concurrent read/write operations
  • Autocut: Automatic result truncation based on score gaps
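One possible way to read "truncation based on score gaps" is sketched below. This is an assumption about the general idea, not Comet's actual heuristic: the `autocut` function and its `factor` parameter are hypothetical. Given distances sorted best-first, it keeps results up to the first gap that is much larger than the average gap.

```go
package main

import "fmt"

// autocut returns how many leading results to keep. scores must be sorted
// ascending (distances, best first). It cuts before the first gap between
// neighbors that exceeds factor times the mean gap.
func autocut(scores []float64, factor float64) int {
	if len(scores) < 3 {
		return len(scores)
	}
	meanGap := (scores[len(scores)-1] - scores[0]) / float64(len(scores)-1)
	if meanGap <= 0 {
		return len(scores) // all scores equal: nothing to cut
	}
	for i := 1; i < len(scores); i++ {
		if scores[i]-scores[i-1] > factor*meanGap {
			return i
		}
	}
	return len(scores)
}

func main() {
	distances := []float64{0.10, 0.12, 0.13, 0.45, 0.47, 0.50}
	keep := autocut(distances, 2.0)
	fmt.Println(distances[:keep]) // the tight cluster before the jump
}
```

In the example the jump from 0.13 to 0.45 dwarfs the other gaps, so only the first three results survive.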

Installation

go get github.com/wizenheimer/comet

Quick Start

package main

import (
    "fmt"
    "log"

    "github.com/wizenheimer/comet"
)

func main() {
    // Create a vector store (384-dimensional embeddings with cosine distance)
    index, err := comet.NewFlatIndex(384, comet.Cosine)
    if err != nil {
        log.Fatal(err)
    }

    // Add vectors
    vec1 := make([]float32, 384)
    // ... populate vec1 with your embedding ...
    node := comet.NewVectorNode(vec1)
    index.Add(*node)

    // Search for similar vectors
    query := make([]float32, 384)
    // ... populate query vector ...
    results, err := index.NewSearch().
        WithQuery(query).
        WithK(10).
        Execute()

    if err != nil {
        log.Fatal(err)
    }

    // Process results
    for i, result := range results {
        fmt.Printf("%d. ID=%d, Score=%.4f\n", i+1, result.GetId(), result.GetScore())
    }
}

Output:

1. ID=123, Score=0.0234
2. ID=456, Score=0.0567
3. ID=789, Score=0.0823
...

Architecture

System Architecture

Comet is organized into three main search engines that can work independently or together:

Application Layer

┌─────────────────────────────────────────────────────────────┐
│                    Your Application                          │
│            (Using Comet as a Go Library)                     │
└──────────────────────┬──────────────────────────────────────┘
                       │
         ┌─────────────┼─────────────┐
         │             │             │
         ▼             ▼             ▼
    Vector         Text         Metadata

Search Engine Layer

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Vector    │    │    Text     │    │  Metadata   │
│   Search    │    │   Search    │    │  Filtering  │
│   Engine    │    │   Engine    │    │   Engine    │
└──────┬──────┘    └──────┬──────┘    └──────┬──────┘
       │                  │                  │
       │ Semantic         │ Keywords         │ Filters
       │ Similarity       │ + Relevance      │ + Boolean Logic
       ▼                  ▼                  ▼

Index Storage Layer

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ HNSW / IVF  │    │ BM25 Index  │    │  Roaring    │
│ / PQ / Flat │    │ (Inverted)  │    │  Bitmaps    │
└─────────────┘    └─────────────┘    └─────────────┘
   Graph/Trees      Token→DocIDs       Compressed Sets

Hybrid Coordinator

                 All Three Engines
                       │
                       ▼
              ┌─────────────────┐
              │  Hybrid Search   │
              │  Coordinator     │
              │  (Score Fusion)  │
              └─────────────────┘
                       │
                       ▼
              Combined Results

Component Details

Component A: Vector Storage Engine

Manages vector storage and similarity search across multiple index types.

Common Interface:

┌────────────────────────────────────┐
│  VectorIndex Interface             │
│                                    │
│  ├─ Train(vectors)                 │
│  ├─ Add(vector)                    │
│  ├─ Remove(vector)                 │
│  └─ NewSearch()                    │
└────────────────────────────────────┘

Available Implementations:

FlatIndex          → Brute force, 100% recall
HNSWIndex          → Graph-based, O(log n)
IVFIndex           → Clustering, 10-20x faster
PQIndex            → Quantization, 10-500x compression
IVFPQIndex         → Hybrid, best of IVF + PQ

Responsibilities:

  • Vector preprocessing (normalization for cosine distance)
  • Distance calculations (L2, L2 Squared, Cosine)
  • K-nearest neighbor search
  • Serialization/deserialization
  • Soft delete management with flush mechanism

Performance Characteristics:

  • Flat: O(n×d) search, 100% recall
  • HNSW: O(M×ef×log n) search, 95-99% recall
  • IVF: O(nProbes×n/k×d) search where k = number of clusters, 85-95% recall

Component B: Text Search Engine

Full-text search using BM25 ranking algorithm.

Inverted Index:

┌────────────────────────────────────┐
│  term → RoaringBitmap(docIDs)      │
│                                    │
│  "machine"  →  {1, 5, 12, 45}      │
│  "learning" →  {1, 3, 12, 20}      │
│  "neural"   →  {3, 20, 45}         │
└────────────────────────────────────┘

Term Frequencies:

┌────────────────────────────────────┐
│  term → {docID: count}             │
│                                    │
│  "machine" → {1: 3, 5: 1, 12: 2}   │
└────────────────────────────────────┘

Document Stats:

┌────────────────────────────────────┐
│  docID → (length, token_count)     │
│                                    │
│  1  →  (250 chars, 45 tokens)      │
│  5  →  (180 chars, 32 tokens)      │
└────────────────────────────────────┘

Responsibilities:

  • Text tokenization (UAX#29 word segmentation)
  • Unicode normalization (NFKC)
  • Inverted index maintenance
  • BM25 score calculation
  • Top-K retrieval with heap

Performance Characteristics:

  • Add: O(m) where m = tokens
  • Search: O(q×d_avg) where q = query terms, d_avg = avg docs per term
  • Memory: Compressed inverted index, no original text stored
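The structures above (postings, term frequencies, document lengths) are all BM25 needs. Below is a minimal sketch of the scoring step under several stated assumptions: whitespace-and-lowercase tokenization stands in for UAX#29 segmentation and NFKC normalization, plain maps stand in for Roaring bitmaps, and k1 = 1.2 and b = 0.75 are common defaults rather than values taken from Comet. The `bm25Index` type and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// bm25Index keeps term -> docID -> count and per-document token counts.
type bm25Index struct {
	postings map[string]map[int]int
	docLen   map[int]int
	totalLen int
}

func newBM25Index() *bm25Index {
	return &bm25Index{postings: map[string]map[int]int{}, docLen: map[int]int{}}
}

// tokenize is a simplification: lowercase, split on whitespace.
func tokenize(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// add indexes a document in O(m) for m tokens; the text itself is not kept.
func (ix *bm25Index) add(docID int, text string) {
	tokens := tokenize(text)
	ix.docLen[docID] = len(tokens)
	ix.totalLen += len(tokens)
	for _, t := range tokens {
		if ix.postings[t] == nil {
			ix.postings[t] = make(map[int]int)
		}
		ix.postings[t][docID]++
	}
}

// score returns the BM25 score of every document matching a query term.
func (ix *bm25Index) score(query string) map[int]float64 {
	const k1, b = 1.2, 0.75
	scores := make(map[int]float64)
	n := float64(len(ix.docLen))
	if n == 0 {
		return scores
	}
	avgLen := float64(ix.totalLen) / n
	for _, term := range tokenize(query) {
		docs := ix.postings[term]
		df := float64(len(docs))
		idf := math.Log(1 + (n-df+0.5)/(df+0.5))
		for docID, tf := range docs {
			f := float64(tf)
			norm := k1 * (1 - b + b*float64(ix.docLen[docID])/avgLen)
			scores[docID] += idf * f * (k1 + 1) / (f + norm)
		}
	}
	return scores
}

func main() {
	ix := newBM25Index()
	ix.add(1, "machine learning with neural networks")
	ix.add(2, "machine shop tools")
	ix.add(3, "deep learning")
	scores := ix.score("machine learning")
	for id := 1; id <= 3; id++ {
		fmt.Printf("doc %d: %.4f\n", id, scores[id])
	}
}
```

Only documents in the postings of a query term are touched, which matches the O(q×d_avg) search cost listed above. Top-K selection with a heap is omitted for brevity.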

Component C: Metadata Filter Engine

Fast filtering using compressed bitmaps.

Categorical Fields (Roaring Bitmaps):

┌────────────────────────────────────┐
│  field:value → Bitmap(docIDs)      │
│                                    │
│  "category:electronics" → {1,5,12} │
│  "category:books"       → {2,8,15} │
│  "in_stock:true"        → {1,2,5}  │
└────────────────────────────────────┘
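Boolean filters over these field:value bitmaps reduce to bitwise operations. The sketch below uses a plain uncompressed bitset as a stand-in for a Roaring bitmap (an assumption made to stay within the standard library); the `bitset` type and the `and`/`andNot` helpers are hypothetical names. It reuses the IDs from the diagram above.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitset is an uncompressed bitmap: bit i of word w marks docID w*64+i.
type bitset []uint64

func newBitset(ids ...uint32) bitset {
	var b bitset
	for _, id := range ids {
		b = b.set(id)
	}
	return b
}

// set marks id as present, growing the bitset when needed.
func (b bitset) set(id uint32) bitset {
	word := int(id / 64)
	for len(b) <= word {
		b = append(b, 0)
	}
	b[word] |= uint64(1) << (id % 64)
	return b
}

// and returns the documents present in both a and b.
func and(a, b bitset) bitset {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	out := make(bitset, n)
	for i := range out {
		out[i] = a[i] & b[i]
	}
	return out
}

// andNot returns the documents in a that are not in b.
func andNot(a, b bitset) bitset {
	out := make(bitset, len(a))
	copy(out, a)
	for i := 0; i < len(out) && i < len(b); i++ {
		out[i] &^= b[i]
	}
	return out
}

// ids lists the set document IDs in ascending order.
func (b bitset) ids() []uint32 {
	var out []uint32
	for w, word := range b {
		for word != 0 {
			out = append(out, uint32(w*64+bits.TrailingZeros64(word)))
			word &= word - 1 // clear the lowest set bit
		}
	}
	return out
}

func main() {
	electronics := newBitset(1, 5, 12)
	books := newBitset(2, 8, 15)
	inStock := newBitset(1, 2, 5)
	allDocs := newBitset(1, 2, 5, 8, 12, 15)

	fmt.Println(and(electronics, inStock).ids()) // electronics AND in stock
	fmt.Println(andNot(allDocs, books).ids())    // NOT books
}
```

The second query shows why a bitmap of all document IDs is useful: negation is computed relative to that universe.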

Numeric Fields (Bit-Sliced Index):

┌────────────────────────────────────┐
│  field → BSI (range queries)       │
│                                    │
│  "price"  → [0-1000, 1000-5000]    │
│  "rating" → [0-5 scale]            │
└────────────────────────────────────┘
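A bit-sliced index answers range queries by keeping one bitmap per bit position of the value and comparing all documents at once, most significant bit first. The following toy version is a sketch under heavy simplifications: each bitmap is a single `uint64` (so document IDs must be 0..63), values are 16-bit unsigned integers, and the `bsi` type with its `set`, `compare`, and `rangeQuery` methods is hypothetical rather than Comet's API.

```go
package main

import (
	"fmt"
	"math/bits"
)

const valueBits = 16 // values must fit in 16 bits in this sketch

// bsi holds one bitmap per value bit. slices[i] has bit d set when
// document d's value has bit i set. Document IDs are limited to 0..63.
type bsi struct {
	exists uint64
	slices [valueBits]uint64
}

// set stores (or overwrites) the value for docID.
func (b *bsi) set(docID uint, value uint16) {
	mask := uint64(1) << docID
	b.exists |= mask
	for i := 0; i < valueBits; i++ {
		if (value>>uint(i))&1 == 1 {
			b.slices[i] |= mask
		} else {
			b.slices[i] &^= mask
		}
	}
}

// compare returns bitmaps of documents whose value is <, ==, and > c.
// It walks from the most significant bit down, narrowing the set of
// documents that still equal c on all bits seen so far.
func (b *bsi) compare(c uint16) (lt, eq, gt uint64) {
	eq = b.exists
	for i := valueBits - 1; i >= 0; i-- {
		if (c>>uint(i))&1 == 1 {
			lt |= eq &^ b.slices[i] // doc has 0 where c has 1: smaller
			eq &= b.slices[i]
		} else {
			gt |= eq & b.slices[i] // doc has 1 where c has 0: larger
			eq &^= b.slices[i]
		}
	}
	return lt, eq, gt
}

// rangeQuery returns documents with lo <= value <= hi.
func (b *bsi) rangeQuery(lo, hi uint16) uint64 {
	_, eqLo, gtLo := b.compare(lo)
	ltHi, eqHi, _ := b.compare(hi)
	return (gtLo | eqLo) & (ltHi | eqHi)
}

// ids lists the document IDs set in a bitmap, ascending.
func ids(bitmap uint64) []int {
	var out []int
	for bitmap != 0 {
		out = append(out, bits.TrailingZeros64(bitmap))
		bitmap &= bitmap - 1
	}
	return out
}

func main() {
	var price bsi
	price.set(1, 250)
	price.set(2, 1000)
	price.set(5, 4999)
	price.set(8, 12)
	fmt.Println(ids(price.rangeQuery(100, 1000))) // docs priced 100..1000
}
```

The query touches 16 bitmaps regardless of how many documents are indexed, which is the sense in which range queries avoid a full scan.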

Document Universe:

┌────────────────────────────────────┐
│  allDocs → Bitmap(all IDs)         │
│                                    │
│  Used for NOT operations           │
└────────────────────────────────────┘