Vgrep
🧬 vgrep: a privacy-first, fully local semantic search engine that uses vector embeddings to understand meaning, not just keywords. It runs entirely on your machine, indexes your data locally, and lets you search code, documents, or text by semantic similarity: fast, offline, and without sending anything to external services.
Install / Use
νgrεp

Search code by meaning, not just keywords. 100% offline. Zero cloud dependencies.
Installation
curl -fsSL https://vgrep.dev/install.sh | sh
Or with wget:
wget -qO- https://vgrep.dev/install.sh | sh
After installation, initialize vgrep:
vgrep init
vgrep models download
Introduction
νgrεp is a semantic code search tool that uses local LLM embeddings to find code by intent rather than exact text matches. Unlike traditional grep, which searches for literal strings, νgrεp understands the meaning behind your query and finds semantically related code across your entire codebase.
Quick Start:
vgrep init && vgrep serve
vgrep "where is authentication handled?"
Key Features
- Semantic Search: Find code by intent - search "error handling" to find try/catch blocks, Result types, and exception handlers
- 100% Local: All processing happens on your machine using llama.cpp - no API keys, no cloud, your code stays private
- Server Mode: Keep models loaded in memory for instant sub-100ms searches
- File Watcher: Automatically re-index files as they change
- Cross-Platform: Native binaries for Windows, Linux, and macOS
- GPU Acceleration: Optional CUDA, Metal, and Vulkan support for faster embeddings
System Overview
νgrεp uses a client-server architecture optimized for fast repeated searches:
                     USER QUERIES
        "where is auth handled?"
        "database connection logic"
        "error handling patterns"
                          │
                          ▼
                     νgrεp CLIENT
   Search Command │ Index Command │ Watch Command │ Config Editor
                          │  HTTP API
                          ▼
                     νgrεp SERVER
   ├─ Embedding Engine (llama.cpp)
   │    Qwen3-Embedding-0.6B • Always Loaded • Fast
   └─ SQLite Vector Database
        File Hashes • Code Chunks • Embeddings • Metadata
Processing Pipeline
Source Files ──▶ Chunk (512b) ──▶ Embed (LLM) ──▶ Store (SQLite) ──▶ Search (Cosine)

  Source Files: .rs .py .js .ts .go .c
  Chunk:        split into overlapping text chunks
  Embed:        generate 768-dim vectors
  Store:        vector DB with fast retrieval
  Search:       similarity ranking + results
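The Chunk stage can be sketched in a few lines of Rust. This is an illustrative sketch only, not νgrεp's actual implementation; the function name and the 512-character window with 64-character overlap are assumptions for the example.

```rust
/// Illustrative sketch of overlapping chunking (not vgrep's actual code):
/// split text into fixed-size windows that overlap, so a function or
/// sentence cut at one window boundary still appears whole in a neighbor.
fn chunk_text(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < size, "overlap must be smaller than the chunk size");
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}

fn main() {
    // 1000 chars with 512-char windows and 64-char overlap -> 3 chunks
    let text = "x".repeat(1000);
    println!("{} chunks", chunk_text(&text, 512, 64).len());
}
```

Overlap trades a little index size for recall: code that straddles a chunk boundary would otherwise never match as a unit.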
Installation
From Source
# Prerequisites: Rust 1.75+, LLVM/Clang, CMake
git clone https://github.com/CortexLM/vgrep.git
cd vgrep
cargo build --release
# Binary at target/release/vgrep
GPU Acceleration
cargo build --release --features cuda # NVIDIA GPUs
cargo build --release --features metal # Apple Silicon
cargo build --release --features vulkan # Cross-platform GPU
System Requirements
| Component | Minimum | Recommended |
|-----------|---------|-------------|
| RAM | 2 GB | 4+ GB |
| Disk | 1 GB (models) | 2+ GB |
| CPU | 4 cores | 8+ cores |
| GPU | Optional | CUDA/Metal for 10x speedup |
Quick Start
1. Initialize
# Download models and create config (~1GB download)
vgrep init
vgrep models download
2. Start Server
# Keep this running - loads model once for fast searches
vgrep serve
Output:
>>> vgrep server
Server: http://127.0.0.1:7777
Loading embedding model...
Model loaded successfully!
Endpoints:
  • GET  /health  - Health check
  • GET  /status  - Index status
  • POST /search  - Semantic search
  • POST /embed   - Generate embeddings
Press Ctrl+C to stop
3. Index & Watch
# In another terminal - index and auto-update on changes
vgrep watch
Output:
>>> vgrep watcher
Path: /home/user/myproject
Mode: server
Ctrl+C to stop
──────────────────────────────────────────────────
>> Initial indexing...
Phase 1: Reading files...
Read 45 files, 312 chunks
Phase 2: Generating embeddings via server...
Generated 312 embeddings
Phase 3: Storing in database...
Stored 45 files
Indexing complete!
Files: 45 indexed, 12 skipped
Chunks: 312
──────────────────────────────────────────────────
[~] Watching for changes...
[+] indexed auth.rs
[+] indexed db.rs
4. Search
# Semantic search - finds by meaning
vgrep "where is authentication handled?"
vgrep "database connection pooling"
vgrep "error handling for network requests"
Output:
Searching for: where is authentication handled?
1. ./src/auth/middleware.rs (87.3%)
2. ./src/handlers/login.rs (82.1%)
3. ./src/utils/jwt.rs (76.8%)
4. ./src/config/security.rs (71.2%)
✓ Found 4 results in 45ms
Commands
Search
| Command | Description |
|---------|-------------|
| vgrep "query" | Quick semantic search |
| vgrep search "query" -m 20 | Search with max 20 results |
| vgrep search "query" -c | Show code snippets in results |
| vgrep search "query" --sync | Re-index before searching |
Server & Indexing
| Command | Description |
|---------|-------------|
| vgrep serve | Start server (keeps model loaded) |
| vgrep serve -p 8080 | Custom port |
| vgrep index | Manual one-time index |
| vgrep index --force | Force re-index all files |
| vgrep watch | Watch and auto-index on changes |
| vgrep status | Show index statistics |
Configuration
| Command | Description |
|---------|-------------|
| vgrep config | Interactive configuration editor |
| vgrep config show | Display all settings |
| vgrep config set mode local | Set config value |
| vgrep config reset | Reset to defaults |
Models
| Command | Description |
|---------|-------------|
| vgrep init | Initialize vgrep |
| vgrep models download | Download embedding models |
| vgrep models list | Show configured models |
Agent Integrations
νgrεp supports assisted installation for popular coding agents:
vgrep install <agent> # Install integration
vgrep uninstall <agent> # Remove integration
| Agent | Command |
|-------|---------|
| Claude Code | vgrep install claude-code |
| OpenCode | vgrep install opencode |
| Codex | vgrep install codex |
| Factory Droid | vgrep install droid |
Usage with Claude Code
vgrep install claude-code
vgrep serve # Start server
vgrep watch # Index your project
# Claude Code can now use vgrep for semantic search
Usage with Factory Droid
vgrep install droid
# vgrep auto-starts when you begin a Droid session
To uninstall: vgrep uninstall <agent> (e.g., vgrep uninstall droid).
How It Works
Embedding Generation
νgrεp converts code into high-dimensional vectors that capture semantic meaning:
Input: "fn authenticate(user: &str, pass: &str) -> Result<Token>"
   ↓
Tokenize → Qwen3-Embedding → Normalize
   ↓
Output: [0.023, -0.156, 0.891, ..., 0.045]  (768 dimensions)
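The final Normalize step can be illustrated as follows (a minimal sketch, not νgrεp's internals; the function name is made up for the example). Storing unit-length embeddings is a common design choice because cosine similarity then reduces to a plain dot product.

```rust
/// Minimal sketch of L2 normalization (illustrative, not vgrep's code):
/// scale a vector to unit length so that comparing two normalized
/// embeddings by cosine similarity is just a dot product.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

fn main() {
    let mut v = [3.0_f32, 4.0];
    l2_normalize(&mut v);
    println!("{:?}", v); // a 3-4-5 triangle normalizes to [0.6, 0.8]
}
```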
Similarity Search
Queries are embedded and compared using cosine similarity:
$$\text{similarity}(q, d) = \frac{q \cdot d}{|q|\,|d|} = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2}\,\sqrt{\sum_{i=1}^{n} d_i^2}}$$
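Translated directly into code, the formula looks like this (an illustrative sketch, independent of νgrεp's actual implementation):

```rust
/// Cosine similarity between two equal-length vectors: the dot product
/// divided by the product of the vectors' Euclidean norms.
fn cosine_similarity(q: &[f32], d: &[f32]) -> f32 {
    assert_eq!(q.len(), d.len());
    let dot: f32 = q.iter().zip(d).map(|(a, b)| a * b).sum();
    let norm_q = q.iter().map(|a| a * a).sum::<f32>().sqrt();
    let norm_d = d.iter().map(|a| a * a).sum::<f32>().sqrt();
    dot / (norm_q * norm_d)
}

fn main() {
    // vectors pointing the same way score near 1.0, orthogonal ones near 0.0
    println!("{}", cosine_similarity(&[1.0, 2.0], &[2.0, 4.0]));
    println!("{}", cosine_similarity(&[1.0, 0.0], &[0.0, 1.0]));
}
```

The percentages shown in search results (e.g. 87.3%) read like this score scaled to 0-100%, though that mapping is an assumption here.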
