Vgrep
🧬 vgrep: a privacy-first, fully local semantic search engine that uses vector embeddings to understand meaning, not just keywords. It runs entirely on your machine, indexes your data locally, and lets you search code, documents, or text by semantic similarity: fast, offline, and without sending anything to external services.
Install / Use
νgrεp

Search code by meaning, not just keywords. 100% offline. Zero cloud dependencies.
Installation
curl -fsSL https://vgrep.dev/install.sh | sh
Or with wget:
wget -qO- https://vgrep.dev/install.sh | sh
After installation, initialize vgrep:
vgrep init
vgrep models download
Introduction
νgrεp is a semantic code search tool that uses local LLM embeddings to find code by intent rather than exact text matches. Unlike traditional grep, which searches for literal strings, νgrεp understands the meaning behind your query and finds semantically related code across your entire codebase.
Quick Start:
vgrep init && vgrep serve
vgrep "where is authentication handled?"
Key Features
- Semantic Search: Find code by intent - search "error handling" to find try/catch blocks, Result types, and exception handlers
- 100% Local: All processing happens on your machine using llama.cpp - no API keys, no cloud, your code stays private
- Server Mode: Keep models loaded in memory for instant sub-100ms searches
- File Watcher: Automatically re-index files as they change
- Cross-Platform: Native binaries for Windows, Linux, and macOS
- GPU Acceleration: Optional CUDA, Metal, and Vulkan support for faster embeddings
System Overview
νgrεp uses a client-server architecture optimized for fast repeated searches:
                     USER QUERIES
        "where is auth handled?"
        "database connection logic"
        "error handling patterns"
                          │
                          ▼
                     νgrεp CLIENT
   Search Command │ Index Command │ Watch Command │ Config Editor
                          │  HTTP API
                          ▼
                     νgrεp SERVER
   ├─ Embedding Engine (llama.cpp)
   │    Qwen3-Embedding-0.6B • Always Loaded • Fast
   └─ SQLite Vector Database
        File Hashes • Code Chunks • Embeddings • Metadata
Processing Pipeline
Source Files ──▶ Chunk (512b) ──▶ Embed (LLM) ──▶ Store (SQLite) ──▶ Search (Cosine)

  Source Files: .rs .py .js .ts .go .c
  Chunk:        split into overlapping text chunks
  Embed:        generate 768-dim vectors
  Store:        vector DB with fast retrieval
  Search:       similarity ranking + results
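The Chunk stage can be sketched in a few lines of Rust. This is an illustrative sketch only, not νgrεp's actual implementation; the function name and the 512-character window with 64-character overlap are assumptions for the example.

```rust
/// Illustrative sketch of overlapping chunking (not vgrep's actual code):
/// split text into fixed-size windows that overlap, so a function or
/// sentence cut at one window boundary still appears whole in a neighbor.
fn chunk_text(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < size, "overlap must be smaller than the chunk size");
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}

fn main() {
    // 1000 chars with 512-char windows and 64-char overlap -> 3 chunks
    let text = "x".repeat(1000);
    println!("{} chunks", chunk_text(&text, 512, 64).len());
}
```

Overlap trades a little index size for recall: code that straddles a chunk boundary would otherwise never match as a unit.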
Installation
From Source
# Prerequisites: Rust 1.75+, LLVM/Clang, CMake
git clone https://github.com/CortexLM/vgrep.git
cd vgrep
cargo build --release
# Binary at target/release/vgrep
GPU Acceleration
cargo build --release --features cuda # NVIDIA GPUs
cargo build --release --features metal # Apple Silicon
cargo build --release --features vulkan # Cross-platform GPU
System Requirements
| Component | Minimum | Recommended |
|-----------|---------|-------------|
| RAM | 2 GB | 4+ GB |
| Disk | 1 GB (models) | 2+ GB |
| CPU | 4 cores | 8+ cores |
| GPU | Optional | CUDA/Metal for 10x speedup |
Quick Start
1. Initialize
# Download models and create config (~1GB download)
vgrep init
vgrep models download
2. Start Server
# Keep this running - loads model once for fast searches
vgrep serve
Output:
>>> vgrep server
Server: http://127.0.0.1:7777
Loading embedding model...
Model loaded successfully!
Endpoints:
  • GET  /health  - Health check
  • GET  /status  - Index status
  • POST /search  - Semantic search
  • POST /embed   - Generate embeddings
Press Ctrl+C to stop
3. Index & Watch
# In another terminal - index and auto-update on changes
vgrep watch
Output:
>>> vgrep watcher
Path: /home/user/myproject
Mode: server
Ctrl+C to stop
──────────────────────────────────────────────────
>> Initial indexing...
Phase 1: Reading files...
Read 45 files, 312 chunks
Phase 2: Generating embeddings via server...
Generated 312 embeddings
Phase 3: Storing in database...
Stored 45 files
Indexing complete!
Files: 45 indexed, 12 skipped
Chunks: 312
──────────────────────────────────────────────────
[~] Watching for changes...
[+] indexed auth.rs
[+] indexed db.rs
4. Search
# Semantic search - finds by meaning
vgrep "where is authentication handled?"
vgrep "database connection pooling"
vgrep "error handling for network requests"
Output:
Searching for: where is authentication handled?
1. ./src/auth/middleware.rs (87.3%)
2. ./src/handlers/login.rs (82.1%)
3. ./src/utils/jwt.rs (76.8%)
4. ./src/config/security.rs (71.2%)
✓ Found 4 results in 45ms
Commands
Search
| Command | Description |
|---------|-------------|
| vgrep "query" | Quick semantic search |
| vgrep search "query" -m 20 | Search with max 20 results |
| vgrep search "query" -c | Show code snippets in results |
| vgrep search "query" --sync | Re-index before searching |
Server & Indexing
| Command | Description |
|---------|-------------|
| vgrep serve | Start server (keeps model loaded) |
| vgrep serve -p 8080 | Custom port |
| vgrep index | Manual one-time index |
| vgrep index --force | Force re-index all files |
| vgrep watch | Watch and auto-index on changes |
| vgrep status | Show index statistics |
Configuration
| Command | Description |
|---------|-------------|
| vgrep config | Interactive configuration editor |
| vgrep config show | Display all settings |
| vgrep config set mode local | Set config value |
| vgrep config reset | Reset to defaults |
Models
| Command | Description |
|---------|-------------|
| vgrep init | Initialize vgrep |
| vgrep models download | Download embedding models |
| vgrep models list | Show configured models |
Agent Integrations
νgrεp supports assisted installation for popular coding agents:
vgrep install <agent> # Install integration
vgrep uninstall <agent> # Remove integration
| Agent | Command |
|-------|---------|
| Claude Code | vgrep install claude-code |
| OpenCode | vgrep install opencode |
| Codex | vgrep install codex |
| Factory Droid | vgrep install droid |
Usage with Claude Code
vgrep install claude-code
vgrep serve # Start server
vgrep watch # Index your project
# Claude Code can now use vgrep for semantic search
Usage with Factory Droid
vgrep install droid
# vgrep auto-starts when you begin a Droid session
To uninstall: vgrep uninstall <agent> (e.g., vgrep uninstall droid).
How It Works
Embedding Generation
νgrεp converts code into high-dimensional vectors that capture semantic meaning:
Input: "fn authenticate(user: &str, pass: &str) -> Result<Token>"
   ↓
Tokenize → Qwen3-Embedding → Normalize
   ↓
Output: [0.023, -0.156, 0.891, ..., 0.045]  (768 dimensions)
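The final Normalize step can be illustrated as follows (a minimal sketch, not νgrεp's internals; the function name is made up for the example). Storing unit-length embeddings is a common design choice because cosine similarity then reduces to a plain dot product.

```rust
/// Minimal sketch of L2 normalization (illustrative, not vgrep's code):
/// scale a vector to unit length so that comparing two normalized
/// embeddings by cosine similarity is just a dot product.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

fn main() {
    let mut v = [3.0_f32, 4.0];
    l2_normalize(&mut v);
    println!("{:?}", v); // a 3-4-5 triangle normalizes to [0.6, 0.8]
}
```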
Similarity Search
Queries are embedded and compared using cosine similarity:
$$\text{similarity}(q, d) = \frac{q \cdot d}{|q|\,|d|} = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2}\,\sqrt{\sum_{i=1}^{n} d_i^2}}$$
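Translated directly into code, the formula looks like this (an illustrative sketch, independent of νgrεp's actual implementation):

```rust
/// Cosine similarity between two equal-length vectors: the dot product
/// divided by the product of the vectors' Euclidean norms.
fn cosine_similarity(q: &[f32], d: &[f32]) -> f32 {
    assert_eq!(q.len(), d.len());
    let dot: f32 = q.iter().zip(d).map(|(a, b)| a * b).sum();
    let norm_q = q.iter().map(|a| a * a).sum::<f32>().sqrt();
    let norm_d = d.iter().map(|a| a * a).sum::<f32>().sqrt();
    dot / (norm_q * norm_d)
}

fn main() {
    // vectors pointing the same way score near 1.0, orthogonal ones near 0.0
    println!("{}", cosine_similarity(&[1.0, 2.0], &[2.0, 4.0]));
    println!("{}", cosine_similarity(&[1.0, 0.0], &[0.0, 1.0]));
}
```

The percentages shown in search results (e.g. 87.3%) read like this score scaled to 0-100%, though that mapping is an assumption here.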
