Codescan
Semantic code search and targeted edits for local repositories, all done locally (no cloud).
Install / Use
/learn @pmarreck/Codescan
codescan
Semantic code search for local repositories.
- Zig CLI + HTTP API + MCP server
- Ollama embeddings (default: bge-large, override with OLLAMA_MODEL)
- sqlite-vec vector storage
- Hybrid search (vector + lexical)
- Symbol extraction: Zig, C/C++, TypeScript/JavaScript, Rust, Elixir, Bash, Lua, Nix, Nim, Lean, Idris, Haskell, Go, Ruby, Erlang, OCaml, Swift, LLVM IR, Clojure, Assembly
- LSP (references, rename): all of the above
- Markdown/text/log indexing with semantic chunking
Install
With Nix (recommended)
# Run directly without installing
nix run github:pmarreck/codescan -- search "your query"
# Install to your profile
nix profile install github:pmarreck/codescan
# For faster downloads, add the garnix binary cache to /etc/nix/nix.conf:
# extra-substituters = https://cache.garnix.io
# extra-trusted-public-keys = cache.garnix.io:CTFPyKSLcx5RMJKfLo5EEPUObbA78b0YQ2DTCJXqr9g=
Pre-built binaries (no Nix required)
Pre-built binaries for Linux (x86_64, arm64) and macOS (arm64) are available as artifacts from the latest CI build:
- Click the most recent successful run
- Scroll to the Artifacts section at the bottom
- Download the archive for your platform
- Extract and place codescan somewhere on your PATH
Note: GitHub requires you to be signed in to download workflow artifacts.
Build from source
nix develop -c zig build -Doptimize=ReleaseFast
Test
./test
CLI/HTTP tests
nix develop -c ./tests/cli/test-cli
nix develop -c ./tests/http/test-http
Integration test
# requires Ollama running with bge-large pulled (or set OLLAMA_MODEL)
nix develop -c ./tests/integration/test-integration
CI (local, Linux only)
# requires act (https://github.com/nektos/act)
./scripts/ci-local
Run (CLI)
# show or edit project config
codescan config
codescan config edit
# ReleaseFast builds are self-contained; no `nix develop` prefix needed to run.
# index
codescan index --root <path>
# update (full reindex)
codescan update --root <path>
# search
codescan search "hash functions" --root <path> --min-score 0.2
# default verb is search
codescan "hash functions" --root <path>
# show doc comments in human output
codescan search "hash functions" --root <path> --show-comments
# comment-only search (doc comments only)
codescan search "hash functions" --root <path> --comments
# include markdown/README when using default search scope
codescan search "design doc" --include-docs
# only markdown/README results
codescan search "design doc" --docs
# unified scope selector
codescan search "design doc" --scope docs
codescan search "hash functions" --scope comments
# restrict by extension/type/language
codescan search "checksum" --ext md,zig
codescan search "checksum" --type code,doc
codescan search "checksum" --lang zig
# filter by symbol kind (fn, struct, enum, const, var, test, mod, type, macro, ...)
codescan search "config" --kind struct
codescan search "init" --kind fn
codescan search "config" --kind const,var
# meta-kinds: declaration (const+var), definition (any defined symbol)
codescan search "config" --kind declaration
codescan search --kind definition --top 20
# browse mode: list symbols by kind without a text query
codescan search --kind fn --top 10
codescan search --kind struct
# filter by file path (glob) or exact file
codescan search "init" --path "src/storage*"
codescan search "hash" --file src/hash.zig
# regex search (PCRE2) with context lines
codescan search "pub fn \w+Init" --regex --context 5
codescan search "TODO|FIXME|HACK" --regex --top 20
codescan search "defer.*free" --regex --path "src/*.zig"
codescan search "fixme|todo" --regex -i # case-insensitive
codescan search "computeHash" --regex --include-body # show full symbol body containing match
# show uncommitted changes with hashlines (for safe editing from diff output)
codescan diff
codescan diff --staged
# index node_modules too
codescan index --include-node-modules
# show index and watcher status
codescan status
codescan status --json
# focused command help
codescan help search
codescan search --help
# stdin JSON request mode (auto-routed to CLI args, always emits JSON)
printf '{"action":"search","query":"checksum","mode":"lexical","db":".codescan/index.sqlite3"}\n' | codescan --json
If --root is omitted, codescan searches upward from the current directory for a .codescan/
directory and uses that as the root (otherwise it falls back to the current directory).
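The root lookup described above can be sketched as follows. This is a minimal illustration, not codescan's actual implementation: the function name find_root is hypothetical, and it assumes that "root" means the directory that contains .codescan/.

```python
from pathlib import Path


def find_root(start: Path) -> Path:
    """Walk upward from `start` looking for a .codescan/ directory.

    Hypothetical helper: assumes the project root is the directory
    that contains .codescan/, which the README implies but does not spell out.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / ".codescan").is_dir():
            return candidate
    # No marker found anywhere above: fall back to the starting directory.
    return start
```

Calling find_root(Path.cwd()) from a nested source directory would return the nearest ancestor holding .codescan/, or the current directory if there is none.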
Search defaults to the primary code language by file count unless a filter is supplied.
Multi-word queries use OR semantics in lexical/hybrid search: results matching any term surface, and BM25 ranks results that match all terms higher.
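To see why OR matching still favors documents that contain every term, here is a toy ranking using the textbook BM25 formula. It is only an illustration: the function name, the whitespace tokenizer, and the k1/b defaults are assumptions, and codescan's real lexical index may score differently.

```python
import math
from collections import Counter


def bm25_or_search(query, docs, k1=1.2, b=0.75):
    """Rank docs matching ANY query term with a textbook BM25 score (toy sketch)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    terms = query.lower().split()
    # Document frequency: in how many docs each query term appears.
    df = {term: sum(1 for tokens in tokenized if term in tokens) for term in terms}

    results = []
    for idx, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in terms:
            tf = counts[term]
            if tf == 0:
                continue  # OR semantics: a missing term adds nothing, it does not exclude
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        if score > 0:
            results.append((score, idx))

    # Highest score first; ties broken by document order.
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in results]


docs = [
    "hash functions for checksums",  # matches both terms
    "hash table lookup",             # matches "hash" only
    "pure functions in zig",         # matches "functions" only
    "parse config file",             # matches neither
]
ranked = bm25_or_search("hash functions", docs)
```

Each matching term contributes its own positive score, so the document containing both terms accumulates the most and sorts first, while the document with no matching term is dropped.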
--include-docs adds markdown/README; --docs/--only-docs restricts results to markdown/README only.
--comments/--only-comments restricts results to doc comments.
--scope <code|docs|comments|all> is a unified alias for common filter combinations.
Index/update defaults to code + docs unless --type/index_type is set.
Built-in ignores: .git/, .codescan/, .codescan-fixtures/, deps/, node_modules/ (indexed only with --include-node-modules), .zig-cache/, zig-cache/, .zig-out/, zig-out/ (see PROJECT_STATE for full list).
Human output uses ANSI colors by default; set NO_COLOR=1 to disable.
Interactive index/update shows a compact per-file progress counter on stderr (TTY only).
Set DEBUG=1 to emit verbose indexing progress to stderr.
Run (HTTP)
codescan serve --root <path> --http-host 127.0.0.1 --http-port 8123
Endpoints:
| Endpoint | Method | Description |
|----------|--------|-------------|
| /health | GET | Health check |
| /help | GET | List all endpoints |
| /search | POST | Semantic code search (/query is an alias) |
| /index | POST | Index/reindex repository |
| /symbols | POST | List or find symbols (/find-symbol is an alias) |
| /replace-symbol | POST | Replace a symbol's body |
| /insert-after | POST | Insert code after a symbol |
| /insert-before | POST | Insert code before a symbol |
| /replace-lines | POST | Replace hashline-validated line range |
| /insert-at | POST | Insert after hashline-validated line |
| /replace-content | POST | Find/replace text or regex |
| /references | POST | Find references via LSP |
| /rename | POST | Rename symbol via LSP |
| /status | GET | Index and watcher status |
# examples
curl -s localhost:8123/symbols -d '{"file":"src/main.zig"}'
curl -s localhost:8123/symbols -d '{"file":"src/main.zig","pattern":"runSearch","include_body":true}'
curl -s localhost:8123/symbols -d '{"file":["src/main.zig","src/cli.zig"],"pattern":"parse"}'
curl -s localhost:8123/symbols -d '{"pattern":"init"}'
curl -s localhost:8123/replace-content -d '{"file":"src/lib.zig","needle":"old","body":"new","all":true}'
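For scripted use, the same kind of request can be built from Python's standard library. This is a sketch under assumptions: build_symbols_request is a hypothetical helper, the JSON Content-Type header is a guess (the curl examples above do not set one), and the response format is not documented in this section, so the code only constructs the request.

```python
import json
import urllib.request


def build_symbols_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build a POST to /symbols mirroring the curl examples (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/symbols",
        data=body,
        # Assumption: the server accepts (or ignores) a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_symbols_request(
    "http://127.0.0.1:8123",
    {"file": "src/main.zig", "pattern": "runSearch", "include_body": True},
)
# To send it against a running `codescan serve`:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before it reaches the server.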
Run (MCP)
codescan includes an MCP server for direct LLM tool integration. It communicates via JSON-RPC 2.0 over stdio (newline-delimited).
codescan mcp-serve --root <path>
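As a rough picture of what travels over stdio, the sketch below frames one newline-delimited JSON-RPC 2.0 message. The tools/call method and its name/arguments params come from the MCP specification; the helper name tool_call is hypothetical, and a real client must complete the MCP initialize handshake before calling tools.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; any unique value per request works


def tool_call(name: str, arguments: dict) -> str:
    """Frame an MCP tools/call request as a single newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps escapes embedded newlines, so the frame stays on one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


line = tool_call("search", {"query": "hash functions", "top": 5})
```

A client would write this line to the stdin of the codescan mcp-serve process and read one JSON line back from its stdout.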
Claude Desktop / Claude Code configuration
Add to your MCP settings:
{
"mcpServers": {
"codescan": {
"command": "/path/to/codescan",
"args": ["mcp-serve", "--root", "/path/to/your/project"]
}
}
}
Codex CLI / Codex Desktop configuration
Use an absolute binary path so startup does not depend on PATH:
codex mcp remove codescan
codex mcp add codescan -- /path/to/codescan mcp-serve --root /path/to/your/project
codex mcp get codescan
If you prefer command = "codescan" in ~/.codex/config.toml, ensure the PATH in the app's
launch environment includes the directory that contains codescan.
MCP troubleshooting
- MCP startup failed: No such file or directory (os error 2) usually means the MCP command could not be resolved.
- Fix: configure an absolute binary path (recommended), or fix PATH for the app launch environment.
- Verify with codex mcp list / codex mcp get codescan.
Available MCP tools
| Tool | Description |
|------|-------------|
| search | Semantic code search (query is an alias). Params: query, kind, path, file, lang, top |
| index | Index/reindex repository |
| symbols | List or find symbols (optional file, pattern, include_body) |
| replace_symbol | Replace a symbol's body |
| insert_after | Insert code after a symbol |
| insert_before | Insert code before a symbol |
| replace_lines | Replace hashline-validated line range |
| insert_at | Insert after hashline-validated line |
| replace_content | Find/replace text or regex |
| references | Find references via LSP |
| rename | Rename symbol via LSP |
| config | Show configuration |
| status | Index and watcher status |
Semantic Editing
codescan provides structural editing commands for AI agents and scripts. All editing commands read replacement text from stdin.
Hashlines
Every codescan command that outputs source lines annotates them with a 3-character base-62 content-chain hash:
44:k7m|fn init(self: *Self) void {
45:r2p| self.count = 0;
46:a9x| self.buffer = undefined;
47:3bw| self.ready = false;
48:npq|}
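A script consuming this output might split each annotated line into its parts as sketched below. The pattern is inferred only from the sample above (line number, colon, three characters, pipe, content); the names are hypothetical, and the base-62 alphabet is assumed to be 0-9, A-Z, a-z.

```python
import re
from typing import NamedTuple

# Inferred shape: <line number>:<3 base-62 chars>|<source text>
HASHLINE = re.compile(r"^(\d+):([0-9A-Za-z]{3})\|(.*)$")


class Hashline(NamedTuple):
    line_no: int
    chain_hash: str
    text: str


def parse_hashline(raw: str) -> Hashline:
    """Split one annotated output line into (line number, hash, source text)."""
    match = HASHLINE.match(raw)
    if match is None:
        raise ValueError(f"not a hashline: {raw!r}")
    return Hashline(int(match.group(1)), match.group(2), match.group(3))


parsed = parse_hashline("44:k7m|fn init(self: *Self) void {")
```

Only the first pipe acts as the separator, so source lines that themselves contain a pipe character still parse correctly.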
Each hash incorporates the previous line's hash, forming a chain. If any line above
changes, every subsequent hash changes too, so a stale line:hash reference is always
detected. This lets AI agents and scripts target exact line ranges without the si
