
Vgrep

[🧬] vgrep: a privacy-first, fully local semantic search engine that uses vector embeddings to understand meaning, not just keywords. It runs entirely on your machine, indexes your data locally, and lets you search code, documents, or text by semantic similarity: fast, offline, and without sending anything to external services.

Install / Use

/learn @CortexLM/Vgrep
About this skill

- Quality score: 0/100
- Supported platforms: Universal

README

<div align="center">

νgrεp

<pre>
██╗   ██╗ ██████╗ ██████╗ ███████╗██████╗ 
██║   ██║██╔════╝ ██╔══██╗██╔════╝██╔══██╗
██║   ██║██║  ███╗██████╔╝█████╗  ██████╔╝
╚██╗ ██╔╝██║   ██║██╔══██╗██╔══╝  ██╔═══╝ 
 ╚████╔╝ ╚██████╔╝██║  ██║███████╗██║     
  ╚═══╝   ╚═════╝ ╚═╝  ╚═╝╚══════╝╚═╝     
</pre>

Search code by meaning, not just keywords. 100% offline. Zero cloud dependencies.


</div>

Installation

curl -fsSL https://vgrep.dev/install.sh | sh

Or with wget:

wget -qO- https://vgrep.dev/install.sh | sh

After installation, initialize vgrep:

vgrep init
vgrep models download

Introduction

νgrεp is a semantic code search tool that uses local LLM embeddings to find code by intent rather than exact text matches. Unlike traditional grep, which searches for literal strings, νgrεp understands the meaning behind your query and finds semantically related code across your entire codebase.

Quick Start: vgrep init && vgrep serve, then vgrep "where is authentication handled?"

Key Features

  • Semantic Search: Find code by intent - search "error handling" to find try/catch blocks, Result types, and exception handlers
  • 100% Local: All processing happens on your machine using llama.cpp - no API keys, no cloud, your code stays private
  • Server Mode: Keep models loaded in memory for instant sub-100ms searches
  • File Watcher: Automatically re-index files as they change
  • Cross-Platform: Native binaries for Windows, Linux, and macOS
  • GPU Acceleration: Optional CUDA, Metal, and Vulkan support for faster embeddings

System Overview

νgrεp uses a client-server architecture optimized for fast repeated searches:

┌──────────────────────────────────────────────────────────────────────────────┐
│                              USER QUERIES                                    │
│                        "where is auth handled?"                              │
│                        "database connection logic"                           │
│                        "error handling patterns"                             │
└──────────────────────────────────┬───────────────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                            νgrεp CLIENT                                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │   Search    │  │    Index    │  │    Watch    │  │   Config    │          │
│  │  Command    │  │   Command   │  │   Command   │  │   Editor    │          │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘          │
└──────────────────────────────────┬───────────────────────────────────────────┘
                                   │ HTTP API
                                   ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                            νgrεp SERVER                                      │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                    Embedding Engine (llama.cpp)                        │  │
│  │              Qwen3-Embedding-0.6B • Always Loaded • Fast               │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                     SQLite Vector Database                             │  │
│  │         File Hashes • Code Chunks • Embeddings • Metadata              │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────────┘

Processing Pipeline

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  Source  │───▶│  Chunk   │───▶│  Embed   │───▶│  Store   │───▶│  Search  │
│  Files   │    │  (512b)  │    │  (LLM)   │    │ (SQLite) │    │ (Cosine) │
└──────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘
     │               │               │               │               │
     ▼               ▼               ▼               ▼               ▼
  .rs .py        Split into      Generate       Vector DB       Similarity
  .js .ts        overlapping     768-dim        with fast       ranking +
  .go .c         text chunks     vectors        retrieval       results
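The chunking stage above can be sketched in Python. This is an illustrative sketch only, not νgrεp's actual chunker (which is internal Rust code); the 512-character size comes from the "(512b)" label in the pipeline, while the 64-character overlap is an assumption chosen for demonstration:

```python
def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping chunks so that code spanning a
    chunk boundary still appears whole in at least one chunk."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    step = size - overlap
    chunks = [text[i:i + size] for i in range(0, len(text), step)]
    # Drop a trailing fragment already fully covered by the previous chunk
    if len(chunks) > 1 and len(chunks[-1]) <= overlap:
        chunks.pop()
    return chunks

source = "".join(chr(97 + i % 26) for i in range(1000))
chunks = chunk_text(source)
print(len(chunks))  # 3 overlapping chunks for 1000 characters
```

The overlap means a function signature cut at a chunk boundary is still embedded intact in the neighboring chunk, at the cost of indexing slightly more text.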

Installation

From Source

# Prerequisites: Rust 1.75+, LLVM/Clang, CMake
git clone https://github.com/CortexLM/vgrep.git
cd vgrep
cargo build --release

# Binary at target/release/vgrep

GPU Acceleration

cargo build --release --features cuda    # NVIDIA GPUs
cargo build --release --features metal   # Apple Silicon
cargo build --release --features vulkan  # Cross-platform GPU

System Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| RAM | 2 GB | 4+ GB |
| Disk | 1 GB (models) | 2+ GB |
| CPU | 4 cores | 8+ cores |
| GPU | Optional | CUDA/Metal for 10x speedup |


Quick Start

1. Initialize

# Download models and create config (~1GB download)
vgrep init
vgrep models download

2. Start Server

# Keep this running - loads model once for fast searches
vgrep serve

Output:

  >>> vgrep server
  Server: http://127.0.0.1:7777
  
  Loading embedding model...
  Model loaded successfully!
  
  Endpoints:
    • GET  /health   - Health check
    • GET  /status   - Index status
    • POST /search   - Semantic search
    • POST /embed    - Generate embeddings
  
  → Press Ctrl+C to stop
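Any HTTP client can talk to the running server. A minimal Python sketch of a search call follows; the endpoint path and port come from the server banner above, but the JSON field name ("query") and the response shape are assumptions for illustration, not νgrεp's documented schema:

```python
import json
import urllib.request

def build_search_request(query: str, host: str = "http://127.0.0.1:7777"):
    """Build the URL and JSON payload for a POST /search call.
    The "query" field name is an assumption, not a documented schema."""
    payload = json.dumps({"query": query}).encode()
    return f"{host}/search", payload

def search(query: str, host: str = "http://127.0.0.1:7777") -> dict:
    """POST a semantic search to a running `vgrep serve` instance."""
    url, payload = build_search_request(query, host)
    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Requires `vgrep serve` to be running locally:
# results = search("where is authentication handled?")
```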

3. Index & Watch

# In another terminal - index and auto-update on changes
vgrep watch

Output:

  >>> vgrep watcher
  Path: /home/user/myproject
  Mode: server

  Ctrl+C to stop

──────────────────────────────────────────────────

  >> Initial indexing...
  Phase 1: Reading files...
    Read 45 files, 312 chunks
  Phase 2: Generating embeddings via server...
    Generated 312 embeddings
  Phase 3: Storing in database...
    Stored 45 files

  Indexing complete!
    Files: 45 indexed, 12 skipped
    Chunks: 312

──────────────────────────────────────────────────

  [~] Watching for changes...

  [+] indexed auth.rs
  [+] indexed db.rs
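The watcher's job, stripped to its essence, is to detect which indexable files changed and re-embed only those. νgrεp presumably uses native filesystem notifications; a simplified mtime-polling sketch in Python illustrates the same idea (the extension list and polling approach are illustrative assumptions, not νgrεp's implementation):

```python
import os

def snapshot(root: str,
             exts=(".rs", ".py", ".js", ".ts", ".go", ".c")) -> dict:
    """Map each indexable file under root to its last-modified time."""
    mtimes = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                mtimes[path] = os.path.getmtime(path)
    return mtimes

def changed_files(before: dict, after: dict) -> list:
    """Files that are new, or whose mtime changed, since `before`."""
    return [p for p, m in after.items() if before.get(p) != m]
```

A real watcher would loop: take a snapshot, sleep, diff against the previous snapshot, and re-index only the returned paths.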

4. Search

# Semantic search - finds by meaning
vgrep "where is authentication handled?"
vgrep "database connection pooling"
vgrep "error handling for network requests"

Output:

  Searching for: where is authentication handled?

  1. ./src/auth/middleware.rs (87.3%)
  2. ./src/handlers/login.rs (82.1%)
  3. ./src/utils/jwt.rs (76.8%)
  4. ./src/config/security.rs (71.2%)

  → Found 4 results in 45ms

Commands

Search

| Command | Description |
|---------|-------------|
| vgrep "query" | Quick semantic search |
| vgrep search "query" -m 20 | Search with max 20 results |
| vgrep search "query" -c | Show code snippets in results |
| vgrep search "query" --sync | Re-index before searching |

Server & Indexing

| Command | Description |
|---------|-------------|
| vgrep serve | Start server (keeps model loaded) |
| vgrep serve -p 8080 | Custom port |
| vgrep index | Manual one-time index |
| vgrep index --force | Force re-index all files |
| vgrep watch | Watch and auto-index on changes |
| vgrep status | Show index statistics |

Configuration

| Command | Description |
|---------|-------------|
| vgrep config | Interactive configuration editor |
| vgrep config show | Display all settings |
| vgrep config set mode local | Set a config value |
| vgrep config reset | Reset to defaults |

Models

| Command | Description |
|---------|-------------|
| vgrep init | Initialize vgrep |
| vgrep models download | Download embedding models |
| vgrep models list | Show configured models |


Agent Integrations

νgrεp supports assisted installation for popular coding agents:

vgrep install <agent>     # Install integration
vgrep uninstall <agent>   # Remove integration

| Agent | Command |
|-------|---------|
| Claude Code | vgrep install claude-code |
| OpenCode | vgrep install opencode |
| Codex | vgrep install codex |
| Factory Droid | vgrep install droid |

Usage with Claude Code

vgrep install claude-code
vgrep serve   # Start server
vgrep watch   # Index your project
# Claude Code can now use vgrep for semantic search

Usage with Factory Droid

vgrep install droid
# vgrep auto-starts when you begin a Droid session

To uninstall: vgrep uninstall <agent> (e.g., vgrep uninstall droid).


How It Works

Embedding Generation

νgrεp converts code into high-dimensional vectors that capture semantic meaning:

Input:  "fn authenticate(user: &str, pass: &str) -> Result<Token>"
        ↓
        Tokenize → Qwen3-Embedding → Normalize
        ↓
Output: [0.023, -0.156, 0.891, ..., 0.045]  (768 dimensions)
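The Normalize step scales each embedding to unit length, so the cosine similarity computed at search time reduces to a plain dot product. A minimal sketch (a toy 3-dimensional vector stands in for a real 768-dimensional embedding):

```python
import math

def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length; after this, cosine similarity
    between two normalized vectors is just their dot product."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

vec = l2_normalize([3.0, 4.0, 0.0])
print(vec)  # [0.6, 0.8, 0.0]
```

Normalizing once at index time means each search only needs multiplications and additions, not square roots per comparison.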

Similarity Search

Queries are embedded and compared using cosine similarity:

$$\text{similarity}(q, d) = \frac{q \cdot d}{|q| |d|} = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2} \sqrt{\sum_{i=1}^{n} d_i^2}}$$
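The cosine similarity formula translates directly to code. A toy sketch with 3-dimensional vectors standing in for real embeddings (the file names and vector values are hypothetical):

```python
import math

def cosine(q: list[float], d: list[float]) -> float:
    """similarity(q, d) = (q . d) / (|q| |d|)"""
    dot = sum(qi * di for qi, di in zip(q, d))
    nq = math.sqrt(sum(qi * qi for qi in q))
    nd = math.sqrt(sum(di * di for di in d))
    return dot / (nq * nd)

# Hypothetical document embeddings; a query vector pointing the same
# direction as a document scores near 1.0, an orthogonal one near 0.0.
query = [1.0, 0.0, 1.0]
docs = {
    "auth/middleware.rs": [0.9, 0.1, 0.8],
    "utils/strings.rs":   [0.0, 1.0, 0.1],
}
ranked = sorted(docs, key=lambda k: cosine(query, docs[k]), reverse=True)
print(ranked[0])  # auth/middleware.rs
```

Ranking is then just sorting candidate chunks by this score, which is where the percentage figures in the search output come from.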

View on GitHub

- GitHub stars: 128
- Forks: 3
- Category: Development
- Updated: 1 day ago
- Languages: Rust
- Security score: 95/100 (audited Mar 26, 2026; no findings)