
<!-- mcp-name: io.github.srclight/srclight -->

Srclight


Deep code indexing for AI agents. SQLite FTS5 + tree-sitter + embeddings + MCP.

Srclight builds a rich, searchable index of your codebase that AI coding agents can query instantly — replacing dozens of grep/glob calls with precise, structured lookups. It is the most comprehensive code intelligence MCP server available: 29 tools covering symbol search, relationship graphs, git change intelligence, semantic search, build system awareness, and document extraction — capabilities no other single MCP server combines. Fully local and private: your code never leaves your machine.

Why?

AI coding agents (Claude Code, Cursor, etc.) spend 40-60% of their tokens on orientation — searching for files, reading code to understand structure, hunting for callers and callees. Srclight eliminates this waste.

| Without Srclight | With Srclight |
|---|---|
| 8-12 grep rounds to find callers | get_callers("lookup") — one call |
| Read 5 files to understand module | codebase_map() — instant overview |
| "Find code that does X" → 20 greps | semantic_search("dictionary lookup") — one call |
| 15-25 tool calls per bug fix | 5-8 tool calls per bug fix |

Features

  • Minimal dependencies — single SQLite file per repo, no Docker/Redis/vector DB
  • Fully offline — no API calls, works air-gapped (Ollama local embeddings)
  • Incremental — only re-indexes changed files (content hash detection)
  • 12 languages — Python, C, C++, C#, JavaScript, TypeScript, PHP, Dart, Swift, Kotlin, Java, Go
  • 10 document formats — PDF, DOCX, XLSX, HTML, CSV/TSV, email (.eml), images (PNG/JPG/SVG/etc.), plain text, RST, Markdown
  • Optional OCR — PaddleOCR for scanned/image-only PDF pages; pytesseract for images
  • 4 search modes — symbol names, source code (trigram), documentation (stemmed), semantic (embeddings)
  • Hybrid search — RRF fusion of keyword + semantic results for best accuracy
  • Multi-repo workspaces — search across all your repos simultaneously via SQLite ATTACH+UNION
  • MCP server — works with Claude Code, Cursor, and any MCP client
  • CLI — index, search, and inspect from the terminal
  • Auto-reindex — git post-commit/post-checkout hooks keep indexes fresh
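
The incremental re-indexing described above can be sketched as content-hash change detection: a file is re-indexed only when its current hash differs from the one recorded at the last run. This is a minimal illustration of the idea; the function names (hash_file, changed_files) are illustrative, not Srclight's actual API.

```python
import hashlib
from pathlib import Path

def hash_file(path: Path) -> str:
    """Return a hex digest of the file's current content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(paths, stored_hashes):
    """Yield (path, new_hash) for files whose content differs from the stored hash."""
    for path in paths:
        digest = hash_file(path)
        if stored_hashes.get(str(path)) != digest:
            yield path, digest
```

Unchanged files hash to the same digest and are skipped entirely, which is what keeps re-index runs cheap on large repos.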

Requirements

  • Python 3.11+
  • Git (for change intelligence and auto-reindex hooks)
  • Ollama (optional, for semantic search / embeddings) — ollama.com
  • NVIDIA GPU + cupy (optional, for GPU-accelerated vector search)
  • Poppler (optional, for PaddleOCR scanned-PDF support) — apt install poppler-utils / brew install poppler

Quick Start

# Install from PyPI
pip install srclight

# Install from source
git clone https://github.com/srclight/srclight.git
cd srclight
pip install -e .

# Optional: document format support (PDF, DOCX, XLSX, HTML, images)
pip install 'srclight[docs,pdf]'

# Optional: OCR for scanned PDFs (also needs poppler-utils on your system)
pip install 'srclight[pdf,paddleocr]'

# Optional: OCR for images (needs tesseract on your system)
pip install 'srclight[docs,ocr]'

# Optional: GPU-accelerated vector search (requires CUDA 12.x)
pip install 'srclight[gpu]'

# Everything (docs + pdf + ocr + paddleocr + gpu)
pip install 'srclight[all]'

# Index your project
cd /path/to/your/project
srclight index

# Index with embeddings (requires Ollama running)
srclight index --embed qwen3-embedding

# Search
srclight search "lookup"
srclight search --kind function "parse"
srclight symbols src/main.py

# Start MCP server (for Claude Code / Cursor)
srclight serve

Note: srclight index automatically adds .srclight/ to your .gitignore. Index databases and embedding files can be large and should never be committed.

Semantic Search (Embeddings)

Srclight supports embedding-based semantic search for natural language queries like "find code that handles authentication" or "where is the database connection pool".

Setup

# Install Ollama (https://ollama.com)
# Pull an embedding model
ollama pull qwen3-embedding       # Best quality (8B params, needs ~6GB VRAM)
ollama pull nomic-embed-text      # Lighter alternative (137M params)

# Index with embeddings
srclight index --embed qwen3-embedding

# Or index workspace with embeddings
srclight workspace index -w myworkspace --embed qwen3-embedding

How It Works

  1. Each symbol's name + signature + docstring + content is embedded as a float vector
  2. Vectors are stored as BLOBs in symbol_embeddings table (SQLite)
  3. After indexing, a .npy sidecar snapshot is built and loaded to GPU VRAM (cupy) or CPU RAM (numpy) for fast search
  4. semantic_search(query) embeds the query and runs cosine similarity against the GPU-resident matrix (~3ms for 27K vectors on a modern GPU)
  5. hybrid_search(query) combines FTS5 keyword results + embedding results via Reciprocal Rank Fusion (RRF)
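
The RRF fusion in step 5 can be sketched in a few lines: each ranked list contributes 1 / (k + rank) per item, and the fused score is the sum across lists. k=60 here is the constant from the original RRF paper; the value Srclight actually uses is not documented in this README, and the example result lists are made up.

```python
def rrf_fuse(ranked_lists, k=60):
    """Fuse multiple ranked result lists into one list ordered by RRF score."""
    scores = {}
    for results in ranked_lists:
        for rank, item in enumerate(results, start=1):
            # Each list contributes 1/(k + rank) for every item it ranks.
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword = ["parse_entry", "lookup", "tokenize"]    # hypothetical FTS5 results
semantic = ["lookup", "find_word", "parse_entry"]  # hypothetical embedding results
fused = rrf_fuse([keyword, semantic])
# "lookup" wins: ranked near the top of both lists.
```

Items appearing in both lists accumulate score from each, which is why RRF rewards agreement between keyword and semantic results without needing to calibrate their raw scores against each other.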

Embedding Providers

| Provider | Model | Quality | Local? | Notes |
|----------|-------|---------|--------|-------|
| Ollama (default) | qwen3-embedding | Best local | Yes | Needs ~6GB VRAM |
| Ollama | nomic-embed-text | Good | Yes | Lighter, works on 8GB VRAM |
| Voyage AI (API) | voyage-code-3 | Best overall | No | Requires VOYAGE_API_KEY |

# Use Voyage Code 3 (API, highest quality)
VOYAGE_API_KEY=your-key srclight index --embed voyage-code-3

Storage

Embeddings are stored in symbol_embeddings table in .srclight/index.db. After indexing, a .npy sidecar snapshot is built for fast GPU loading:

| File | Purpose |
|------|---------|
| index.db | Write path — per-symbol CRUD during indexing |
| embeddings.npy | Read path — contiguous float32 matrix for GPU/CPU search |
| embeddings_norms.npy | Pre-computed row norms (avoids recomputation per query) |
| embeddings_meta.json | Symbol ID mapping, model info, version for cache invalidation |

For ~27K symbols at 4096 dims (qwen3-embedding), that's ~428 MB on disk, ~450 MB in VRAM. Incremental: only re-embeds symbols whose content changed; sidecar rebuilt after each indexing run.
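
The read path above can be sketched with numpy: load the contiguous matrix and its precomputed row norms, then score a query by cosine similarity without renormalizing the matrix per query. The file names match the table; the function and its in-memory layout are an illustration, not Srclight's internals.

```python
import numpy as np

def cosine_topk(matrix, norms, query, k=5):
    """Return indices of the k matrix rows most similar to the query vector."""
    q = query / np.linalg.norm(query)
    sims = matrix @ q / norms          # row norms precomputed once at index time
    return np.argsort(sims)[::-1][:k]

# matrix = np.load("embeddings.npy")        # (n_symbols, dims) float32
# norms  = np.load("embeddings_norms.npy")  # (n_symbols,) precomputed row norms
```

Swapping numpy for cupy gives the GPU-resident variant with the same code shape, since cupy mirrors numpy's array API.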

Multi-Repo Workspaces

Search across multiple repos simultaneously. Each repo keeps its own .srclight/index.db; at query time, srclight ATTACHes them all and UNIONs across schemas.
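
The ATTACH+UNION pattern can be sketched with Python's sqlite3, using in-memory databases in place of per-repo .srclight/index.db files. The one-column symbols schema here is illustrative only, not Srclight's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")            # stands in for repo1's index.db
conn.execute("CREATE TABLE symbols (name TEXT)")
conn.execute("INSERT INTO symbols VALUES ('lookup')")

# ATTACH a second database; ':memory:' creates a fresh one for the demo.
conn.execute("ATTACH DATABASE ':memory:' AS repo2")
conn.execute("CREATE TABLE repo2.symbols (name TEXT)")
conn.execute("INSERT INTO repo2.symbols VALUES ('parse')")

# One query spans both databases, tagging each row with its project.
rows = conn.execute("""
    SELECT 'repo1' AS project, name FROM main.symbols
    UNION ALL
    SELECT 'repo2' AS project, name FROM repo2.symbols
""").fetchall()
```

Because each repo keeps its own file, a workspace query only pays the cost of ATTACHing the databases it needs; no merged index has to be rebuilt when a repo changes.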

# Create a workspace
srclight workspace init myworkspace

# Add repos
srclight workspace add /path/to/repo1 -w myworkspace
srclight workspace add /path/to/repo2 -w myworkspace -n custom-name

# Index all repos (with optional embeddings)
srclight workspace index -w myworkspace
srclight workspace index -w myworkspace --embed qwen3-embedding

# Search across all repos
srclight workspace search "Dictionary" -w myworkspace
srclight workspace search "Dictionary" -w myworkspace --project repo1

# Status
srclight workspace status -w myworkspace
srclight workspace list

# Start MCP server in workspace mode
srclight serve --workspace myworkspace

Git submodules are not indexed automatically — git ls-files does not recurse into them. To index a submodule, clone it separately and add it as its own workspace project. See docs/usage-guide.md for details.

MCP Integration

Srclight supports two transport modes: stdio (one server per session) and SSE (persistent server, multiple sessions). SSE is recommended for workspaces.

Claude Code

Stdio (simplest — one server per session):

# Single repo
claude mcp add srclight -- srclight serve

# Workspace mode
claude mcp add srclight -- srclight serve --workspace myworkspace

# Make it available in all projects (user scope)
claude mcp add --scope user srclight -- srclight serve --workspace myworkspace

SSE (persistent server — recommended for workspaces):

Run srclight as a long-lived server, then point Claude Code at it:

# Start the server (default: http://127.0.0.1:8742/sse)
srclight serve --workspace myworkspace &

# Or install as a systemd user service (Linux/WSL)
# See docs/usage-guide.md for the service file

# Connect Claude Code to the running server
claude mcp add --transport sse srclight http://127.0.0.1:8742/sse

SSE mode supports multiple concurrent sessions and survives Claude Code restarts.

Cursor

SSE (recommended): Run srclight once, then connect Cursor to it. Best for responsiveness and no cold-start per session.

Start the server: srclight serve --workspace myworkspace (default SSE on port 8742).

  • UI: Settings → Tools & MCP → Add new MCP server → Type: streamableHttp, URL: http://127.0.0.1:8742/sse.
  • JSON (project .cursor/mcp.json or global ~/.cursor/mcp.json):
"srclight": {
  "url": "http://127.0.0.1:8742/sse"
}

Stdio (alternative): One server process per Cursor session.

  • UI: Type: command, Command: srclight, Args: serve --workspace myworkspace (or serve for single-repo).
  • JSON:
"srclight": {
  "command": "srclight",
  "args": ["serve", "--workspace", "myworkspace"]
}

For single-repo: "args": ["serve"]. Restart Cursor completely after adding the server.

Verify: In Cursor chat, ask "What projects are in the srclight workspace?" or "List srclight tools" — the agent should call list_projects() or show srclight tools.

OpenClaw

OpenClaw connects to srclight via mcporter, its built-in MCP tool server CLI.

# 1. Add srclight to mcporter's home config
mcporter config add srclight http://127.0.0.1:8742/sse \
  --transport sse --scope home \
  --description "Srclight deep code indexing"

# 2. Verify the connection
mcporter call srclight.

No findings