
Toktoken

TokToken is a fast, single-binary C codebase indexer for AI coding agents. Powered by universal-ctags and SQLite FTS5, it provides precise symbol search, dependency tracking, and an MCP server. TokToken reduces LLM context token usage by 88-99% by retrieving exact code symbols instead of reading entire files. No runtime dependencies other than universal-ctags.


TokToken (beta release)

Right now, your AI coding agent reads entire files just to find one function. That wastes tokens, money, and context window. TokToken fixes this.

Real numbers from Redis (727 files, 45K symbols, indexed in 0.9s):

| What your agent needs | Without TokToken | With TokToken | Savings |
| --------------------- | ---------------- | ------------- | ------- |
| initServer() in server.c (8141 lines) | 84,193 tokens | 2,699 tokens | 97% |
| sdslen() in sds.h (340 lines) | 2,678 tokens | 132 tokens | 95% |
| processCommand() in server.c | 84,193 tokens | 4,412 tokens | 95% |
| redisCommandProc typedef in server.h (4503 lines) | 56,754 tokens | 50 tokens | 99% |

TokToken also scales to large codebases. The Linux kernel (65K files, 7.4M symbols) indexes in ~170 seconds:

| What your agent needs | Without TokToken | With TokToken | Savings |
| --------------------- | ---------------- | ------------- | ------- |
| __schedule() in kernel/sched/core.c (10961 lines) | 73,084 tokens | 1,335 tokens | 98% |
| do_sys_open() in fs/open.c (1583 lines) | 9,915 tokens | 89 tokens | 99% |
| kmalloc() in include/linux/slab.h (1280 lines) | 11,272 tokens | 1,372 tokens | 88% |
| ext4_fill_super() in fs/ext4/super.c (7573 lines) | 53,651 tokens | 376 tokens | 99% |
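
The Savings column follows from the two token counts in each row. A minimal sketch of that arithmetic, assuming savings are computed as 1 - with/without; the helper name savings_pct is hypothetical and not part of TokToken, and the tables appear to show whole percentages rather than the one-decimal values printed here:

```shell
# Percentage of tokens saved: (1 - slim/full) * 100, to one decimal place.
# Usage: savings_pct <tokens_without_toktoken> <tokens_with_toktoken>
savings_pct() {
  awk -v full="$1" -v slim="$2" 'BEGIN { printf "%.1f\n", (1 - slim / full) * 100 }'
}

savings_pct 84193 2699   # initServer() row from the Redis table   -> 96.8
savings_pct 11272 1372   # kmalloc() row from the Linux table      -> 87.8
```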

One command indexes your codebase. Your agent searches symbols, traces imports, and retrieves only the code it needs -- 88-99% fewer tokens on every operation.

toktoken index:create            # index your project (once)
toktoken search:symbols "auth"   # find symbols by name
toktoken inspect:symbol <id>     # retrieve just the code that matters

Works with Claude Code, Cursor, Windsurf, Copilot, Gemini, Codex, and any MCP-compatible agent. Single binary, no runtime dependencies beyond universal-ctags, nothing written inside your project.

Quick setup: tell your AI agent to read docs/LLM.md -- it will install and configure TokToken (almost) autonomously. For the rest of you, humans, keep reading.

Features

  • 49 languages via universal-ctags + 16 custom parsers (see docs/LANGUAGES.md)
  • Import graph: cross-file dependency tracking with find:importers, find:references, find:callers, and inspect:dependencies
  • FTS5 search with relevance scoring, cascading query strategies, and token-budget-aware result slicing
  • Incremental indexing using content hashing -- re-indexes only changed files, including single-file reindex
  • Dead code detection: find symbols with no callers or importers across the codebase
  • Blast radius analysis: trace all files and symbols affected by a change to a given file or symbol
  • Circular import detection: identify import cycles in the dependency graph
  • Multi-symbol bundles: retrieve context bundles for multiple symbols in a single call with markdown output option
  • Scope-filtered search: restrict symbol and text search to a subtree or file set
  • Centrality-based ranking: symbols ranked by import-graph centrality in addition to FTS relevance
  • MCP server (toktoken serve) for native IDE integration with tiered tool loading
  • GitHub repo indexing (toktoken index:github owner/repo) -- index any public repo without cloning it yourself
  • Smart filtering: excludes non-code files (CSS, HTML) and vendored directories by default, with selective --include override
  • Security: symlink escape detection, secret pattern filtering, binary exclusion
  • Token savings tracking: cumulative metrics via the stats command
  • Auto-update: --self-update with SHA-256 verification and atomic binary replacement
  • Cross-platform: Linux (x64/ARM64/ARMv7), macOS (Intel/Apple Silicon), Windows (x64)
  • Single static binary: no runtime requirements beyond universal-ctags (see below)
  • No project pollution: all data stored under ~/.cache/toktoken/, nothing written inside your project
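
The idea behind hash-based incremental indexing can be sketched in a few lines of shell: record a content hash per file, then compare a later snapshot against it and reprocess only the paths whose hash changed. This is an illustration of the general technique, not TokToken's implementation; the hash function and storage TokToken uses are not described here, the helper names snapshot and changed_files are hypothetical, and sha256sum from GNU coreutils is assumed.

```shell
# Print "<sha256>  <path>" for every file under directory $1, as sorted lines.
snapshot() {
  (cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort)
}

# Print paths that are new or whose content changed between snapshot files $1 and $2.
# comm -13 keeps lines that appear only in the second snapshot; deleted files are not reported.
changed_files() {
  LC_ALL=C comm -13 "$1" "$2" | cut -c67- | LC_ALL=C sort   # drop 64 hex chars + 2 spaces
}

work=$(mktemp -d)
mkdir "$work/src"
printf 'int a;\n' > "$work/src/a.c"
printf 'int b;\n' > "$work/src/b.c"
snapshot "$work/src" > "$work/before.txt"

printf 'int b2;\n' > "$work/src/b.c"                  # edit one file
snapshot "$work/src" > "$work/after.txt"

changed_files "$work/before.txt" "$work/after.txt"    # prints ./b.c
rm -rf "$work"
```

Only ./b.c would need re-indexing in this run; a.c is skipped because its hash is unchanged.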

Prerequisites

TokToken requires universal-ctags (NOT exuberant-ctags) for symbol extraction. Static linking of ctags is planned but not yet implemented.

| Platform | Install command |
| -------- | --------------- |
| Ubuntu/Debian | sudo apt-get install -y universal-ctags |
| Fedora/RHEL | sudo dnf install universal-ctags |
| Arch | sudo pacman -S ctags |
| macOS | brew install universal-ctags |
| Windows | choco install universal-ctags |

Verify installation: ctags --version must show Universal Ctags, not Exuberant Ctags.
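
The check can be scripted. The sketch below looks only at the first line of the version banner, because Universal Ctags also mentions Exuberant Ctags further down in its output; the helper name is_universal_ctags is hypothetical.

```shell
# Succeeds only if the first line of the version banner on stdin names Universal Ctags.
is_universal_ctags() {
  head -n 1 | grep -q 'Universal Ctags'
}

if command -v ctags >/dev/null 2>&1 && ctags --version 2>/dev/null | is_universal_ctags; then
  echo "universal-ctags found"
else
  echo "universal-ctags missing (or ctags is a different implementation)"
fi
```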

Quick Start

PATH setup: Ensure ~/.local/bin is in your PATH. If not, add export PATH="$HOME/.local/bin:$PATH" to your ~/.bashrc or ~/.zshrc.
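
A quick way to test whether the directory is already on your PATH, as a small sketch (the helper name path_contains is hypothetical):

```shell
# Succeeds if directory $1 is an entry of the colon-separated list $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

if path_contains "$HOME/.local/bin" "$PATH"; then
  echo "~/.local/bin is already in PATH"
else
  echo 'add to your shell rc: export PATH="$HOME/.local/bin:$PATH"'
fi
```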

# Install (Linux x86_64 example)
mkdir -p ~/.local/bin
curl -fsSL https://github.com/mauriziofonte/toktoken/releases/latest/download/toktoken-linux-x86_64 \
  -o ~/.local/bin/toktoken && chmod +x ~/.local/bin/toktoken

# Index a project
cd /path/to/your/project
toktoken index:create

# Search for symbols (-k filters by kind, -c enables compact output)
# Note: by default, TokToken excludes non-code files (CSS, HTML, SVG) and
# vendored subdirectories. Use -f / --full to include everything.
# Markdown files are always indexed (documentation kinds: chapter, section, subsection).
toktoken search:symbols "auth" -ck class,method,function

# Full-text search grouped by file
toktoken search:text "TODO" -g file

# Inspect a specific symbol
toktoken inspect:symbol "src/Auth.php::Auth.login#method"

# File outline (cheaper than reading the whole file)
toktoken inspect:outline src/Auth.php

# Update index after edits
toktoken index:update
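
The symbol ID passed to inspect:symbol above looks like file, qualified name, and kind joined by "::" and "#". Assuming that layout, which is inferred from this single example and not from a format specification, a script can split an ID with plain parameter expansion; split_symbol_id is a hypothetical helper name.

```shell
# Split "<file>::<qualified name>#<kind>" (layout assumed from the example ID above).
split_symbol_id() {
  id=$1
  file=${id%%::*}     # everything before the first "::"
  rest=${id#*::}      # everything after the first "::"
  name=${rest%#*}     # up to the last "#"
  kind=${rest##*#}    # after the last "#"
  printf '%s\n' "file=$file" "name=$name" "kind=$kind"
}

split_symbol_id "src/Auth.php::Auth.login#method"
# file=src/Auth.php
# name=Auth.login
# kind=method
```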

Commands

| Command | Description |
| ------- | ----------- |
| index:create [path] | Create index for a project |
| index:update [path] | Incremental re-index (hash-based) |
| index:file <file> | Reindex a single file |
| index:github <repo> | Clone and index a GitHub repository |
| search:symbols <query> | Search symbols by name (FTS5 + scoring + centrality ranking) |
| search:text <query> | Full-text search across files (ripgrep + fallback) |
| search:cooccurrence <a>,<b> | Find symbols that co-occur in the same file |
| search:similar <id> | Find symbols similar to a given one |
| inspect:outline <file> | Show file symbol hierarchy |
| inspect:symbol <id> | Retrieve symbol source code |
| inspect:file <file> | Show file content (supports --lines START-END) |
| inspect:bundle <id>[,id2,...] | Get symbol context bundle (definition + imports + outline); comma-separated IDs for multi-symbol; --format markdown for markdown output |
| inspect:tree | Show indexed file tree |
| inspect:dependencies <file> | Trace transitive import graph (recursive) |
| inspect:hierarchy <file> | Show class/function hierarchy with parent-child relationships |
| find:importers <file> | Find files that import a given file |
| find:references <id> | Find import references to an identifier. Use --check for boolean reference check |
| find:callers <id> | Find symbols that likely call a given function/method |
| find:dead | Find symbols with no callers or importers (dead code detection) |
| inspect:blast <id> | Symbol blast radius analysis (files transitively affected by a change) |
| inspect:cycles | Detect circular import chains in the dependency graph |
| help [command] | List all tools or show detailed usage for a specific tool |
| suggest | Onboarding discovery: top keywords, kind/language distribution, most-imported files, example queries |
| stats | Index statistics + token savings report |
| cache:clear | Delete current project index. With --all --force: delete all TokToken data |
| codebase:detect [path] | Detect if directory is a codebase |
| repos:list | List cloned GitHub repositories |
| repos:remove <repo> | Remove a cloned repository |
| repos:clear | Remove all cloned repositories |
| serve | Start MCP server on STDIO |
| --self-update | Update to latest release (SHA-256 verified) |

Options

All options have a long form (--option). Most also have a single-letter short form (-x). Short boolean flags can be combined: -cn is equivalent to --compact --count. Value flags accept attached (-l10) or separate (-l 10) values.
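
The combining and attached-value rules are the same conventions the shell's getopts builtin follows, so they can be tried out without TokToken installed. The sketch below is not TokToken's parser; it only demonstrates the convention. The -c and -n flags mirror the --compact and --count example above, while -l is treated as a generic value flag because its meaning is not given in this section. parse_flags is a hypothetical name.

```shell
# Parse -c and -n (booleans) and -l <value> with POSIX getopts.
parse_flags() {
  compact=0 count=0 lval=""
  OPTIND=1                      # reset so the function can be called repeatedly
  while getopts "cnl:" opt; do
    case "$opt" in
      c) compact=1 ;;
      n) count=1 ;;
      l) lval=$OPTARG ;;
      *) return 2 ;;
    esac
  done
  echo "compact=$compact count=$count l=$lval"
}

parse_flags -cn -l10      # combined booleans, attached value
parse_flags -c -n -l 10   # same flags written out separately
```

Both calls print compact=1 count=1 l=10.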

Indexing options

Used with index:create, index:update, and index:github.

| Short | Long | Argument | Description |
| ----- | ---- | -------- | ----------- |
| -m | --max-files | <n> | Maximum number of files to index. Files beyond this limit are silently skipped. Default: 200000. |
| -f | --full | | Disable the smart filter. By default, TokToken excludes non-code files (CSS, HTML, SVG, YAML, XML, TOML, GraphQL) and vendored subdirectories to reduce noise. Markdown files (.md, .markdown, .mdx) are always indexed regardless of the smart filter, producing documentation-specific kinds (chapter, section, subsection). With --full, everything is indexed. |
| -I | --include | <dir> | Force-include a normally-skipped directory (e.g. vendor). Repeatable: -I vendor -I node_modules. VCS dirs (.git, .svn, .hg) cannot be included. The override persists across index:update cycles. Unlike --full, this only un-skips the named directory -- the smart filter still applies to file extensions inside it. |
| -i | --ignore | <pattern> | Add an extra ignore pattern. Files/directories matching this glob are skipped during discovery. Repeatable: -i vendor -i dist -i .cache. |
| | --languages | <list> | Comma-separated list of languages to index. Only files detected as one of these languages are processed. Example: --languages c,python,rust. |
| -X | --diagnostic | | Enable structured diagnostic output. Emits JSONL events to stderr with per-phase timing, worker progress, memory snapshots, and pipeline metrics. See [
