Srclight
Deep code indexing for AI agents. SQLite FTS5 + tree-sitter + embeddings + MCP.
Srclight builds a rich, searchable index of your codebase that AI coding agents can query instantly — replacing dozens of grep/glob calls with precise, structured lookups. It is the most comprehensive code intelligence MCP server available: 29 tools covering symbol search, relationship graphs, git change intelligence, semantic search, build system awareness, and document extraction — capabilities no other single MCP server combines. Fully local and private: your code never leaves your machine.
Why?
AI coding agents (Claude Code, Cursor, etc.) spend 40-60% of their tokens on orientation — searching for files, reading code to understand structure, hunting for callers and callees. Srclight eliminates this waste.
| Without Srclight | With Srclight |
|---|---|
| 8-12 grep rounds to find callers | get_callers("lookup") — one call |
| Read 5 files to understand module | codebase_map() — instant overview |
| "Find code that does X" → 20 greps | semantic_search("dictionary lookup") — one call |
| 15-25 tool calls per bug fix | 5-8 tool calls per bug fix |
Features
- Minimal dependencies — single SQLite file per repo, no Docker/Redis/vector DB
- Fully offline — no API calls, works air-gapped (Ollama local embeddings)
- Incremental — only re-indexes changed files (content hash detection)
- 12 languages — Python, C, C++, C#, JavaScript, TypeScript, PHP, Dart, Swift, Kotlin, Java, Go
- 10 document formats — PDF, DOCX, XLSX, HTML, CSV/TSV, email (.eml), images (PNG/JPG/SVG/etc.), plain text, RST, Markdown
- Optional OCR — PaddleOCR for scanned/image-only PDF pages; pytesseract for images
- 4 search modes — symbol names, source code (trigram), documentation (stemmed), semantic (embeddings)
- Hybrid search — RRF fusion of keyword + semantic results for best accuracy
- Multi-repo workspaces — search across all your repos simultaneously via SQLite ATTACH+UNION
- MCP server — works with Claude Code, Cursor, and any MCP client
- CLI — index, search, and inspect from the terminal
- Auto-reindex — git post-commit/post-checkout hooks keep indexes fresh
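The content-hash change detection mentioned above can be sketched in a few lines. This is a simplified illustration, not srclight's actual implementation — the `files` table and both function names are hypothetical:

```python
import hashlib
import sqlite3

def file_hash(path: str) -> str:
    """Hash file contents so unchanged files can be skipped on re-index."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def changed_files(db: sqlite3.Connection, paths: list[str]) -> list[str]:
    """Return only the paths whose content hash differs from the stored one."""
    db.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, hash TEXT)")
    stale = []
    for path in paths:
        h = file_hash(path)
        row = db.execute("SELECT hash FROM files WHERE path = ?", (path,)).fetchone()
        if row is None or row[0] != h:
            stale.append(path)
            db.execute(
                "INSERT INTO files (path, hash) VALUES (?, ?) "
                "ON CONFLICT(path) DO UPDATE SET hash = excluded.hash",
                (path, h),
            )
    return stale
```

Calling `changed_files` twice in a row on the same unmodified files returns the full list the first time and an empty list the second — which is what makes re-indexing cheap.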
Requirements
- Python 3.11+
- Git (for change intelligence and auto-reindex hooks)
- Ollama (optional, for semantic search / embeddings) — ollama.com
- NVIDIA GPU + cupy (optional, for GPU-accelerated vector search)
- Poppler (optional, for PaddleOCR scanned-PDF support) — `apt install poppler-utils` / `brew install poppler`
Quick Start
# Install from PyPI
pip install srclight
# Install from source
git clone https://github.com/srclight/srclight.git
cd srclight
pip install -e .
# Optional: document format support (PDF, DOCX, XLSX, HTML, images)
pip install 'srclight[docs,pdf]'
# Optional: OCR for scanned PDFs (also needs poppler-utils on your system)
pip install 'srclight[pdf,paddleocr]'
# Optional: OCR for images (needs tesseract on your system)
pip install 'srclight[docs,ocr]'
# Optional: GPU-accelerated vector search (requires CUDA 12.x)
pip install 'srclight[gpu]'
# Everything (docs + pdf + ocr + paddleocr + gpu)
pip install 'srclight[all]'
# Index your project
cd /path/to/your/project
srclight index
# Index with embeddings (requires Ollama running)
srclight index --embed qwen3-embedding
# Search
srclight search "lookup"
srclight search --kind function "parse"
srclight symbols src/main.py
# Start MCP server (for Claude Code / Cursor)
srclight serve
Note: `srclight index` automatically adds `.srclight/` to your `.gitignore`. Index databases and embedding files can be large and should never be committed.
Semantic Search (Embeddings)
Srclight supports embedding-based semantic search for natural language queries like "find code that handles authentication" or "where is the database connection pool".
Setup
# Install Ollama (https://ollama.com)
# Pull an embedding model
ollama pull qwen3-embedding # Best quality (8B params, needs ~6GB VRAM)
ollama pull nomic-embed-text # Lighter alternative (137M params)
# Index with embeddings
srclight index --embed qwen3-embedding
# Or index workspace with embeddings
srclight workspace index -w myworkspace --embed qwen3-embedding
How It Works
- Each symbol's name + signature + docstring + content is embedded as a float vector
- Vectors are stored as BLOBs in the `symbol_embeddings` table (SQLite)
- After indexing, a `.npy` sidecar snapshot is built and loaded into GPU VRAM (cupy) or CPU RAM (numpy) for fast search
- `semantic_search(query)` embeds the query and runs cosine similarity against the GPU-resident matrix (~3ms for 27K vectors on a modern GPU)
- `hybrid_search(query)` combines FTS5 keyword results and embedding results via Reciprocal Rank Fusion (RRF)
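The RRF fusion step can be sketched generically. This is a minimal illustration of the technique, not srclight's code; `k = 60` is the conventional RRF constant, assumed here:

```python
def rrf_fuse(keyword_ranked: list[str], semantic_ranked: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked result lists with Reciprocal Rank Fusion.

    Each result scores 1 / (k + rank) per list it appears in, so a symbol
    found by both keyword and semantic search rises to the top.
    """
    scores: dict[str, float] = {}
    for ranked in (keyword_ranked, semantic_ranked):
        for rank, sym in enumerate(ranked, start=1):
            scores[sym] = scores.get(sym, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "lookup" ranks high in both lists, so it beats "parse",
# which is first in one list but third in the other
fused = rrf_fuse(["parse", "lookup", "scan"], ["lookup", "render", "parse"])
```

The appeal of RRF is that it needs only ranks, not scores, so FTS5's BM25 output and cosine similarities never have to be calibrated against each other.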
Embedding Providers
| Provider | Model | Quality | Local? | Notes |
|----------|-------|---------|--------|-------|
| Ollama (default) | qwen3-embedding | Best local | Yes | Needs ~6GB VRAM |
| Ollama | nomic-embed-text | Good | Yes | Lighter, works on 8GB VRAM |
| Voyage AI (API) | voyage-code-3 | Best overall | No | Requires VOYAGE_API_KEY |
# Use Voyage Code 3 (API, highest quality)
VOYAGE_API_KEY=your-key srclight index --embed voyage-code-3
Storage
Embeddings are stored in the `symbol_embeddings` table in `.srclight/index.db`. After indexing, a `.npy` sidecar snapshot is built for fast GPU loading:
| File | Purpose |
|------|---------|
| index.db | Write path — per-symbol CRUD during indexing |
| embeddings.npy | Read path — contiguous float32 matrix for GPU/CPU search |
| embeddings_norms.npy | Pre-computed row norms (avoids recomputation per query) |
| embeddings_meta.json | Symbol ID mapping, model info, version for cache invalidation |
For ~27K symbols at 4096 dims (qwen3-embedding), that's ~428 MB on disk, ~450 MB in VRAM. Incremental: only re-embeds symbols whose content changed; sidecar rebuilt after each indexing run.
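The sidecar layout above lends itself to a very small search loop. A minimal numpy sketch — the file names match the table, but the code itself is illustrative, not srclight's:

```python
import numpy as np

def cosine_top_k(query_vec: np.ndarray, matrix: np.ndarray,
                 norms: np.ndarray, k: int = 10) -> np.ndarray:
    """Return row indices of the k most similar embedding vectors.

    `matrix` is the contiguous float32 embedding matrix (embeddings.npy)
    and `norms` its pre-computed row norms (embeddings_norms.npy), so each
    query costs one matrix-vector product plus a division — no per-row
    norm recomputation.
    """
    sims = (matrix @ query_vec) / (norms * np.linalg.norm(query_vec) + 1e-9)
    return np.argsort(-sims)[:k]

# In production the arrays would come from the sidecar files, e.g.:
#   matrix = np.load("embeddings.npy"); norms = np.load("embeddings_norms.npy")
# With cupy installed, the same code runs on GPU by swapping np for cp.
```

Keeping the norms on disk is the whole point of `embeddings_norms.npy`: the denominator is then a single elementwise multiply per query instead of a full pass over the matrix.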
Multi-Repo Workspaces
Search across multiple repos simultaneously. Each repo keeps its own .srclight/index.db; at query time, srclight ATTACHes them all and UNIONs across schemas.
# Create a workspace
srclight workspace init myworkspace
# Add repos
srclight workspace add /path/to/repo1 -w myworkspace
srclight workspace add /path/to/repo2 -w myworkspace -n custom-name
# Index all repos (with optional embeddings)
srclight workspace index -w myworkspace
srclight workspace index -w myworkspace --embed qwen3-embedding
# Search across all repos
srclight workspace search "Dictionary" -w myworkspace
srclight workspace search "Dictionary" -w myworkspace --project repo1
# Status
srclight workspace status -w myworkspace
srclight workspace list
# Start MCP server in workspace mode
srclight serve --workspace myworkspace
Git submodules are not indexed automatically — git ls-files does not recurse into them. To index a submodule, clone it separately and add it as its own workspace project. See docs/usage-guide.md for details.
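The ATTACH+UNION approach can be sketched with Python's `sqlite3`. This is illustrative only — the `symbols` table and its columns are assumed names, not srclight's actual schema:

```python
import sqlite3

def workspace_search(repo_dbs: dict[str, str], name_pattern: str):
    """Search a `symbols` table in each attached repo index and UNION the results.

    `repo_dbs` maps a schema alias to an index.db path; aliases are assumed
    to be trusted identifiers (they cannot be bound as SQL parameters).
    """
    con = sqlite3.connect(":memory:")
    for alias, path in repo_dbs.items():
        con.execute(f"ATTACH DATABASE ? AS {alias}", (path,))
    # One SELECT per attached schema, combined into a single result set
    union = " UNION ALL ".join(
        f"SELECT '{alias}' AS project, name, path FROM {alias}.symbols WHERE name LIKE ?"
        for alias in repo_dbs
    )
    return con.execute(union, [name_pattern] * len(repo_dbs)).fetchall()
```

Because each repo keeps its own `index.db`, repos can be indexed independently and still be queried as one logical database at search time.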
MCP Integration
Srclight supports two transport modes: stdio (one server per session) and SSE (persistent server, multiple sessions). SSE is recommended for workspaces.
Claude Code
Stdio (simplest — one server per session):
# Single repo
claude mcp add srclight -- srclight serve
# Workspace mode
claude mcp add srclight -- srclight serve --workspace myworkspace
# Make it available in all projects (user scope)
claude mcp add --scope user srclight -- srclight serve --workspace myworkspace
SSE (persistent server — recommended for workspaces):
Run srclight as a long-lived server, then point Claude Code at it:
# Start the server (default: http://127.0.0.1:8742/sse)
srclight serve --workspace myworkspace &
# Or install as a systemd user service (Linux/WSL)
# See docs/usage-guide.md for the service file
# Connect Claude Code to the running server
claude mcp add --transport sse srclight http://127.0.0.1:8742/sse
SSE mode supports multiple concurrent sessions and survives Claude Code restarts.
Cursor
SSE (recommended): Run srclight once, then connect Cursor to it. Best for responsiveness and no cold-start per session.
Start the server: srclight serve --workspace myworkspace (default SSE on port 8742).
- UI: Settings → Tools & MCP → Add new MCP server → Type: `streamableHttp`, URL: `http://127.0.0.1:8742/sse`.
- JSON (project `.cursor/mcp.json` or global `~/.cursor/mcp.json`):
"srclight": {
  "url": "http://127.0.0.1:8742/sse"
}
Stdio (alternative): One server process per Cursor session.
- UI: Type: `command`, Command: `srclight`, Args: `serve --workspace myworkspace` (or `serve` for single-repo).
- JSON:
"srclight": {
  "command": "srclight",
  "args": ["serve", "--workspace", "myworkspace"]
}
For single-repo use `"args": ["serve"]`. Restart Cursor completely after adding the server.
Verify: In Cursor chat, ask "What projects are in the srclight workspace?" or "List srclight tools" — the agent should call list_projects() or show srclight tools.
OpenClaw
OpenClaw connects to srclight via mcporter, its built-in MCP tool server CLI.
# 1. Add srclight to mcporter's home config
mcporter config add srclight http://127.0.0.1:8742/sse \
--transport sse --scope home \
--description "Srclight deep code indexing"
# 2. Verify the connection
mcporter call srclight.