codescan

Semantic code search for local repositories.

Zig CLI + HTTP API + MCP server
Ollama embeddings (default: bge-large, override with OLLAMA_MODEL)
sqlite-vec vector storage
Hybrid search (vector + lexical)
Symbol extraction: Zig, C/C++, TypeScript/JavaScript, Rust, Elixir, Bash, Lua, Nix, Nim, Lean, Idris, Haskell, Go, Ruby, Erlang, OCaml, Swift, LLVM IR, Clojure, Assembly
LSP (references, rename): all of the above
Markdown/text/log indexing with semantic chunking

Install

With Nix (recommended)

# Run directly without installing
nix run github:pmarreck/codescan -- search "your query"

# Install to your profile
nix profile install github:pmarreck/codescan

# For faster downloads, add the garnix binary cache to /etc/nix/nix.conf:
#   extra-substituters = https://cache.garnix.io
#   extra-trusted-public-keys = cache.garnix.io:CTFPyKSLcx5RMJKfLo5EEPUObbA78b0YQ2DTCJXqr9g=

Pre-built binaries (no Nix required)

Pre-built binaries for Linux (x86_64, arm64) and macOS (arm64) are available as artifacts from the latest CI build:

Download from GitHub Actions:

1. Click the most recent successful run.
2. Scroll to the Artifacts section at the bottom.
3. Download the archive for your platform.
4. Extract it and place codescan somewhere on your PATH.

Note: GitHub requires you to be signed in to download workflow artifacts.

Build from source

nix develop -c zig build -Doptimize=ReleaseFast

Test

./test

CLI/HTTP tests

nix develop -c ./tests/cli/test-cli
nix develop -c ./tests/http/test-http

Integration test

# requires Ollama running with bge-large pulled (or set OLLAMA_MODEL)
nix develop -c ./tests/integration/test-integration

CI (local, Linux only)

# requires act (https://github.com/nektos/act)
./scripts/ci-local

Run (CLI)

# ReleaseFast builds are self-contained; no `nix develop` prefix needed to run.

# show or edit project config
codescan config
codescan config edit

# index
codescan index --root <path>

# update (full reindex)
codescan update --root <path>

# search
codescan search "hash functions" --root <path> --min-score 0.2
# default verb is search
codescan "hash functions" --root <path>
# show doc comments in human output
codescan search "hash functions" --root <path> --show-comments
# comment-only search (doc comments only)
codescan search "hash functions" --root <path> --comments
# include markdown/README when using default search scope
codescan search "design doc" --include-docs
# only markdown/README results
codescan search "design doc" --docs
# unified scope selector
codescan search "design doc" --scope docs
codescan search "hash functions" --scope comments
# restrict by extension/type/language
codescan search "checksum" --ext md,zig
codescan search "checksum" --type code,doc
codescan search "checksum" --lang zig

# filter by symbol kind (fn, struct, enum, const, var, test, mod, type, macro, ...)
codescan search "config" --kind struct
codescan search "init" --kind fn
codescan search "config" --kind const,var
# meta-kinds: declaration (const+var), definition (any defined symbol)
codescan search "config" --kind declaration
codescan search --kind definition --top 20

# browse mode: list symbols by kind without a text query
codescan search --kind fn --top 10
codescan search --kind struct

# filter by file path (glob) or exact file
codescan search "init" --path "src/storage*"
codescan search "hash" --file src/hash.zig

# regex search (PCRE2) with context lines
codescan search "pub fn \w+Init" --regex --context 5
codescan search "TODO|FIXME|HACK" --regex --top 20
codescan search "defer.*free" --regex --path "src/*.zig"
codescan search "fixme|todo" --regex -i  # case-insensitive
codescan search "computeHash" --regex --include-body  # show full symbol body containing match

# show uncommitted changes with hashlines (for safe editing from diff output)
codescan diff
codescan diff --staged

# index node_modules too
codescan index --include-node-modules

# show index and watcher status
codescan status
codescan status --json

# focused command help
codescan help search
codescan search --help

# stdin JSON request mode (auto-routed to CLI args, always emits JSON)
printf '{"action":"search","query":"checksum","mode":"lexical","db":".codescan/index.sqlite3"}\n' | codescan --json
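The stdin JSON mode also makes codescan easy to drive from a script. The Python sketch below mirrors the printf example above: it serializes one request object as a newline-terminated line and pipes it to codescan --json. It assumes codescan is on PATH and that stdout holds a single JSON document. Only the action, query, mode, and db fields come from the example. The helper names are hypothetical.

```python
import json
import subprocess


def build_stdin_request(query: str, mode: str = "lexical",
                        db: str = ".codescan/index.sqlite3") -> str:
    """Serialize one stdin JSON request line, mirroring the printf example."""
    payload = {"action": "search", "query": query, "mode": mode, "db": db}
    # One JSON object terminated by a newline, like `printf '...\n'`.
    return json.dumps(payload) + "\n"


def run_stdin_search(query: str, binary: str = "codescan") -> object:
    """Pipe the request into `codescan --json` and decode its output."""
    result = subprocess.run(
        [binary, "--json"],
        input=build_stdin_request(query),
        capture_output=True,
        text=True,
        check=True,  # raise if codescan exits non-zero
    )
    # Assumption: stdout is a single JSON document; its schema is not
    # documented here, so it is returned as-is.
    return json.loads(result.stdout)
```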

If --root is omitted, codescan searches upward from the current directory for a .codescan/ directory and uses the directory that contains it as the root. If none is found, it falls back to the current directory.
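The root lookup described above amounts to a walk up the directory tree. Here is a minimal Python sketch. The function name is hypothetical, and resolving symlinks before walking is an assumption about codescan's behavior.

```python
from pathlib import Path


def find_codescan_root(start: Path) -> Path:
    """Walk upward from `start` looking for a `.codescan/` directory.

    Returns the first directory that contains `.codescan/`, or `start`
    itself if none is found (the documented fallback).
    """
    start = start.resolve()  # assumption: symlinks are resolved first
    for candidate in (start, *start.parents):
        if (candidate / ".codescan").is_dir():
            return candidate
    return start
```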

Search defaults to the primary code language (by file count) unless a filter is supplied. Multi-word queries use OR semantics in lexical/hybrid search: a result matching any term can surface, and BM25 ranks results that match all terms higher.

--include-docs adds markdown/README results; --docs/--only-docs restricts results to markdown/README only. --comments/--only-comments restricts results to doc comments. --scope <code|docs|comments|all> is a unified alias for these common filter combinations.

Index/update defaults to code + docs unless --type/index_type is set. Built-in ignores: .git/, .codescan/, .codescan-fixtures/, deps/, node_modules/ (opt in with --include-node-modules), .zig-cache/, zig-cache/, .zig-out/, zig-out/ (see PROJECT_STATE for the full list).
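To see why OR semantics still favor documents that match every query term, here is textbook Okapi BM25 in Python. This is a sketch, not codescan's scorer. The whitespace tokenizer and the common default parameters k1 = 1.2 and b = 0.75 are assumptions. Each matched term adds its own positive contribution. As a result, a document containing both query terms outscores a similar document containing only one, and a document with neither scores zero.

```python
import math
from collections import Counter
from typing import List


def bm25_or_scores(query: str, docs: List[str],
                   k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each doc with Okapi BM25 under OR semantics.

    A document gets a positive score if it contains ANY query term; every
    matched term adds to the score, so documents matching ALL terms rank higher.
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    terms = set(query.lower().split())
    # Document frequency of each query term.
    df = {t: sum(1 for toks in tokenized if t in toks) for t in terms}
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for t in terms:
            if tf[t] == 0:
                continue  # OR semantics: a missing term contributes nothing
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            length_norm = 1 - b + b * len(toks) / avgdl
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * length_norm)
        scores.append(score)
    return scores
```

With the documents "hash function table", "hash map entry", and "parse config file" and the query "hash function", the first document ranks highest. The second scores lower but still surfaces, and the third scores 0.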

Human output uses ANSI colors by default; set NO_COLOR=1 to disable. Interactive index/update shows a compact per-file progress counter on stderr (TTY only). Set DEBUG=1 to emit verbose indexing progress to stderr.

Run (HTTP)

codescan serve --root <path> --http-host 127.0.0.1 --http-port 8123

Endpoints:

| Endpoint | Method | Description |
|----------|--------|-------------|
| /health | GET | Health check |
| /help | GET | List all endpoints |
| /search | POST | Semantic code search (/query is an alias) |
| /index | POST | Index/reindex repository |
| /symbols | POST | List or find symbols (/find-symbol is an alias) |
| /replace-symbol | POST | Replace a symbol's body |
| /insert-after | POST | Insert code after a symbol |
| /insert-before | POST | Insert code before a symbol |
| /replace-lines | POST | Replace hashline-validated line range |
| /insert-at | POST | Insert after hashline-validated line |
| /replace-content | POST | Find/replace text or regex |
| /references | POST | Find references via LSP |
| /rename | POST | Rename symbol via LSP |
| /status | GET | Index and watcher status |

# examples
curl -s localhost:8123/symbols -d '{"file":"src/main.zig"}'
curl -s localhost:8123/symbols -d '{"file":"src/main.zig","pattern":"runSearch","include_body":true}'
curl -s localhost:8123/symbols -d '{"file":["src/main.zig","src/cli.zig"],"pattern":"parse"}'
curl -s localhost:8123/symbols -d '{"pattern":"init"}'
curl -s localhost:8123/replace-content -d '{"file":"src/lib.zig","needle":"old","body":"new","all":true}'
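The same calls can be made from code without curl. Below is a standard-library Python sketch for /symbols. The helper names are hypothetical. The body keys (file, pattern, include_body) come from the curl examples above. The sketch assumes the server answers with JSON. Sending Content-Type: application/json is a choice of this sketch. curl -d sends a form content type by default and the examples still work, so the server does not appear to depend on this header.

```python
import json
import urllib.request


def symbols_request(host: str = "127.0.0.1", port: int = 8123,
                    **body: object) -> urllib.request.Request:
    """Build a POST to /symbols carrying `body` as JSON."""
    return urllib.request.Request(
        f"http://{host}:{port}/symbols",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> object:
    """Send the request to a running `codescan serve` and decode the reply."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, send(symbols_request(file="src/main.zig", pattern="runSearch", include_body=True)) corresponds to the second curl example.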

Run (MCP)

codescan includes an MCP server for direct LLM tool integration. It communicates via JSON-RPC 2.0 over stdio (newline-delimited).

codescan mcp-serve --root <path>

Claude Desktop / Claude Code configuration

Add to your MCP settings:

{
  "mcpServers": {
    "codescan": {
      "command": "/path/to/codescan",
      "args": ["mcp-serve", "--root", "/path/to/your/project"]
    }
  }
}

Codex CLI / Codex Desktop configuration

Use an absolute binary path so startup does not depend on PATH:

codex mcp remove codescan
codex mcp add codescan -- /path/to/codescan mcp-serve --root /path/to/your/project
codex mcp get codescan

If you prefer command = "codescan" in ~/.codex/config.toml, make sure the directory containing codescan is on the PATH of the app's launch environment.

MCP troubleshooting

MCP startup failed: No such file or directory (os error 2) usually means the MCP command could not be resolved.
Fix: configure an absolute binary path (recommended), or fix PATH for the app launch environment.
Verify with codex mcp list / codex mcp get codescan.
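A quick pre-flight check can tell you whether a configured command will resolve before you restart the app. This Python sketch uses a hypothetical helper that approximates the usual lookup. A value containing a path separator must point at an executable file, and a bare name such as codescan is searched on PATH. Run it from the same environment the app launches with, since that PATH is the one that matters. The exact resolution rules of any particular MCP client are not documented here.

```python
import os
import shutil
from typing import Optional


def resolve_mcp_command(command: str) -> Optional[str]:
    """Return the path that would be executed for `command`, or None.

    None corresponds to the "No such file or directory (os error 2)" case.
    """
    if os.sep in command or (os.altsep and os.altsep in command):
        # Explicit path: it must exist and be executable.
        if os.path.isfile(command) and os.access(command, os.X_OK):
            return command
        return None
    # Bare name: look it up on PATH, as a launcher would.
    return shutil.which(command)


if __name__ == "__main__":
    import sys
    for cmd in sys.argv[1:] or ["codescan"]:
        print(f"{cmd}: {resolve_mcp_command(cmd) or 'NOT FOUND'}")
```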

Available MCP tools

| Tool | Description |
|------|-------------|
| search | Semantic code search (query is an alias). Params: query, kind, path, file, lang, top |
| index | Index/reindex repository |
| symbols | List or find symbols (optional file, pattern, include_body) |
| replace_symbol | Replace a symbol's body |
| insert_after | Insert code after a symbol |
| insert_before | Insert code before a symbol |
| replace_lines | Replace hashline-validated line range |
| insert_at | Insert after hashline-validated line |
| replace_content | Find/replace text or regex |
| references | Find references via LSP |
| rename | Rename symbol via LSP |
| config | Show configuration |
| status | Index and watcher status |
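The tools above are called over the newline-delimited JSON-RPC 2.0 transport described under Run (MCP). The Python sketch below frames one message per line and performs the standard MCP initialize handshake. It then issues a tools/call for search, accepting only the search params listed in the table. Several details are assumptions: the protocol version string, the client name, and that the line following each request is its response. A server may interleave notifications, and a real client must skip them. The helper names are hypothetical.

```python
import json
import subprocess
from itertools import count
from typing import Any, Optional

# Params listed for the `search` tool in the table above.
SEARCH_PARAMS = {"query", "kind", "path", "file", "lang", "top"}
_next_id = count(1)


def frame(method: str, params: Optional[dict] = None, notify: bool = False) -> str:
    """Serialize one JSON-RPC 2.0 message as a single newline-terminated line."""
    msg: dict = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        msg["params"] = params
    if not notify:
        # Requests carry an id; notifications must not.
        msg["id"] = next(_next_id)
    return json.dumps(msg) + "\n"


def search_call(**arguments: Any) -> str:
    """Build a tools/call request for the `search` tool."""
    unknown = set(arguments) - SEARCH_PARAMS
    if unknown:
        raise ValueError(f"undocumented search params: {sorted(unknown)}")
    return frame("tools/call", {"name": "search", "arguments": arguments})


def run_search(root: str, query: str, binary: str = "codescan", **extra: Any) -> dict:
    """Start `codescan mcp-serve`, handshake, run one search, return the response."""
    proc = subprocess.Popen(
        [binary, "mcp-serve", "--root", root],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    assert proc.stdin is not None and proc.stdout is not None
    try:
        proc.stdin.write(frame("initialize", {
            "protocolVersion": "2024-11-05",  # assumption; the server may negotiate
            "capabilities": {},
            "clientInfo": {"name": "codescan-sketch", "version": "0.1"},
        }))
        proc.stdin.flush()
        json.loads(proc.stdout.readline())  # initialize result (assumed next line)
        proc.stdin.write(frame("notifications/initialized", notify=True))
        proc.stdin.write(search_call(query=query, **extra))
        proc.stdin.flush()
        return json.loads(proc.stdout.readline())
    finally:
        proc.stdin.close()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
```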

Semantic Editing

codescan provides structural editing commands for AI agents and scripts. All editing commands read replacement text from stdin.

Hashlines

Every codescan command that outputs source lines annotates them with a 3-character base-62 content-chain hash:

44:k7m|fn init(self: *Self) void {
45:r2p|    self.count = 0;
46:a9x|    self.buffer = undefined;
47:3bw|    self.ready = false;
48:npq|}
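The chaining idea can be sketched in a few lines of Python. The real hash function, alphabet, and chain seed are not documented here. This sketch therefore assumes SHA-256 over the previous hash plus the line text, reduced to three base-62 characters. Its hashes will not match codescan's output (such as k7m above). It does show the chained property: editing one line changes that line's hash and every hash after it, while earlier lines keep theirs.

```python
import hashlib
from typing import List

# Assumed alphabet and ordering; codescan's actual encoding may differ.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base62_3(n: int) -> str:
    """Encode 0 <= n < 62**3 as exactly three base-62 characters."""
    chars = []
    for _ in range(3):
        n, r = divmod(n, 62)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


def hashlines(lines: List[str]) -> List[str]:
    """Annotate a whole file as 'N:hhh|text', chaining each hash to the previous one."""
    out, prev = [], ""  # empty seed for line 1 (assumption)
    for num, text in enumerate(lines, 1):
        digest = hashlib.sha256(f"{prev}\n{text}".encode("utf-8")).digest()
        prev = base62_3(int.from_bytes(digest[:8], "big") % 62**3)
        out.append(f"{num}:{prev}|{text}")
    return out


def is_fresh(lines: List[str], ref: str) -> bool:
    """True if a 'N:hhh' reference still matches the file's current contents."""
    num = int(ref.split(":", 1)[0])
    current = hashlines(lines)
    return 1 <= num <= len(current) and current[num - 1].split("|", 1)[0] == ref
```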

Each hash incorporates the previous line's hash, forming a chain. If any line above changes, all subsequent hashes cascade — so a stale line:hash reference is always detected. This lets AI agents and scripts target exact line ranges without the si
