<h1 align="center">CodeWiki: Evaluating AI's Ability to Generate Holistic Documentation for Large-Scale Codebases</h1> <p align="center"> <strong>AI-Powered Repository Documentation Generation</strong> • <strong>Multi-Language Support</strong> • <strong>Architecture-Aware Analysis</strong> </p> <p align="center"> Generate holistic, structured documentation for large-scale codebases • Cross-module interactions • Visual artifacts and diagrams </p> <p align="center"> <a href="https://python.org/"><img alt="Python version" src="https://img.shields.io/badge/python-3.12+-blue?style=flat-square" /></a> <a href="./LICENSE"><img alt="License: MIT" src="https://img.shields.io/badge/License-MIT-green.svg?style=flat-square" /></a> <a href="https://github.com/FSoft-AI4Code/CodeWiki/stargazers"><img alt="GitHub stars" src="https://img.shields.io/github/stars/FSoft-AI4Code/CodeWiki?style=flat-square" /></a> <a href="https://arxiv.org/abs/2510.24428"><img alt="arXiv" src="https://img.shields.io/badge/arXiv-2510.24428-b31b1b?style=flat-square" /></a> </p> <p align="center"> <a href="#quick-start"><strong>Quick Start</strong></a> • <a href="#cli-commands"><strong>CLI Commands</strong></a> • <a href="#documentation-output"><strong>Output Structure</strong></a> • <a href="https://arxiv.org/abs/2510.24428"><strong>Paper</strong></a> </p> <p align="center"> <img src="./img/framework-overview.png" alt="CodeWiki Framework" width="600" style="border: 2px solid #e1e4e8; border-radius: 12px; padding: 20px;"/> </p>

Quick Start

1. Install CodeWiki

# Install from source
pip install git+https://github.com/FSoft-AI4Code/CodeWiki.git

# Verify installation
codewiki --version

2. Configure Your Environment

CodeWiki supports multiple models via an OpenAI-compatible SDK layer.

codewiki config set \
  --api-key YOUR_API_KEY \
  --base-url https://api.anthropic.com \
  --main-model claude-sonnet-4 \
  --cluster-model claude-sonnet-4 \
  --fallback-model glm-4p5

3. Generate Documentation

# Navigate to your project
cd /path/to/your/project

# Generate documentation
codewiki generate

# Generate with HTML viewer for GitHub Pages
codewiki generate --github-pages --create-branch

That's it! CodeWiki writes the generated documentation, including its repository-level analysis, to ./docs/ (pass --output to choose another directory).


What is CodeWiki?

CodeWiki is an open-source framework for automated repository-level documentation across eight programming languages. It generates holistic, architecture-aware documentation that captures not only individual functions but also their cross-file, cross-module, and system-level interactions.

Key Innovations

| Innovation | Description | Impact |
|------------|-------------|--------|
| Hierarchical Decomposition | Dynamic programming-inspired strategy that preserves architectural context | Handles codebases of arbitrary size (86K-1.4M LOC tested) |
| Recursive Agentic System | Adaptive multi-agent processing with dynamic delegation capabilities | Maintains quality while scaling to repository-level scope |
| Multi-Modal Synthesis | Generates textual documentation, architecture diagrams, data flows, and sequence diagrams | Comprehensive understanding from multiple perspectives |

Supported Languages

🐍 Python • ☕ Java • 🟨 JavaScript • 🔷 TypeScript • ⚙️ C • 🔧 C++ • 🪟 C# • 🎯 Kotlin


CLI Commands

Configuration Management

# Set up your API configuration
codewiki config set \
  --api-key <your-api-key> \
  --base-url <provider-url> \
  --main-model <model-name> \
  --cluster-model <model-name> \
  --fallback-model <model-name>

# Configure max token settings
codewiki config set --max-tokens 32768 --max-token-per-module 36369 --max-token-per-leaf-module 16000

# Configure max depth for hierarchical decomposition
codewiki config set --max-depth 3

# Show current configuration
codewiki config show

# Validate your configuration
codewiki config validate

Documentation Generation

# Basic generation
codewiki generate

# Custom output directory
codewiki generate --output ./documentation

# Create git branch for documentation
codewiki generate --create-branch

# Generate HTML viewer for GitHub Pages
codewiki generate --github-pages

# Enable verbose logging
codewiki generate --verbose

# Full-featured generation
codewiki generate --create-branch --github-pages --verbose

Customization Options

CodeWiki supports customization for language-specific projects and documentation styles:

# C# project: only analyze .cs files, exclude test directories
codewiki generate --include "*.cs" --exclude "Tests,Specs,*.test.cs"

# Focus on specific modules with architecture-style docs
codewiki generate --focus "src/core,src/api" --doc-type architecture

# Add custom instructions for the AI agent
codewiki generate --instructions "Focus on public APIs and include usage examples"

Pattern Behavior (Important!)

  • --include: When specified, ONLY these patterns are used (replaces defaults completely)

    • Example: --include "*.cs" will analyze ONLY .cs files
    • If omitted, all supported file types are analyzed
    • Supports glob patterns: *.py, src/**/*.ts, *.{js,jsx}
  • --exclude: When specified, patterns are MERGED with default ignore patterns

    • Example: --exclude "Tests,Specs" will exclude these directories AND still exclude .git, __pycache__, node_modules, etc.
    • Default patterns include: .git, node_modules, __pycache__, *.pyc, bin/, dist/, and many more
    • Supports multiple formats:
      • Exact names: Tests, .env, config.local
      • Glob patterns: *.test.js, *_test.py, *.min.*
      • Directory patterns: build/, dist/, coverage/
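
The replace-versus-merge rules above can be sketched in Python. This is a simplified illustration, not CodeWiki's actual matcher: the `DEFAULT_INCLUDE` and `DEFAULT_EXCLUDE` lists and both function names are hypothetical, and only file-name globs are handled (path globs such as src/**/*.ts are out of scope).

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical defaults for illustration; the real lists live inside CodeWiki.
DEFAULT_INCLUDE = ["*.py", "*.java", "*.js", "*.ts", "*.c", "*.cpp", "*.cs", "*.kt"]
DEFAULT_EXCLUDE = [".git", "node_modules", "__pycache__", "*.pyc", "bin/", "dist/"]


def effective_patterns(include=None, exclude=None):
    """Return (include, exclude) lists after applying the documented rules."""
    # --include replaces the defaults completely when given.
    inc = list(include) if include else list(DEFAULT_INCLUDE)
    # --exclude is merged with the default ignore patterns.
    exc = list(DEFAULT_EXCLUDE) + list(exclude or [])
    return inc, exc


def is_selected(path, include, exclude):
    """Decide whether a relative POSIX path would be analyzed (simplified)."""
    parts = PurePosixPath(path).parts
    name = parts[-1]
    for pattern in exclude:
        bare = pattern.rstrip("/")
        # Exact names and directory patterns match any path component;
        # glob patterns are matched against the file name.
        if bare in parts or fnmatch(name, bare):
            return False
    return any(fnmatch(name, pattern) for pattern in include)


inc, exc = effective_patterns(include=["*.cs"], exclude=["Tests", "Specs", "*.test.cs"])
print(is_selected("src/App.cs", inc, exc))
print(is_selected("Tests/AppTests.cs", inc, exc))
```

With these inputs the first path is selected and the second is rejected, because `Tests` was merged into the exclusions while `*.cs` replaced every default include pattern.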

Setting Persistent Defaults

Save your preferred settings as defaults:

# Set include patterns for C# projects
codewiki config agent --include "*.cs"

# Exclude test projects by default (merged with default excludes)
codewiki config agent --exclude "Tests,Specs,*.test.cs"

# Set focus modules
codewiki config agent --focus "src/core,src/api"

# Set default documentation type
codewiki config agent --doc-type architecture

# View current agent settings
codewiki config agent

# Clear all agent settings
codewiki config agent --clear

| Option | Description | Behavior | Example |
|--------|-------------|----------|---------|
| --include | File patterns to include | Replaces defaults | *.cs, *.py, src/**/*.ts |
| --exclude | Patterns to exclude | Merges with defaults | Tests,Specs, *.test.js, build/ |
| --focus | Modules to document in detail | Standalone option | src/core,src/api |
| --doc-type | Documentation style | Standalone option | api, architecture, user-guide, developer |
| --instructions | Custom agent instructions | Standalone option | Free-form text |
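
When generation is driven from a script, the options in this table can be assembled into a command line programmatically. The helper below is a hypothetical convenience wrapper (`build_generate_command` is not part of CodeWiki); it only uses flags documented above and does not invoke the CLI itself.

```python
import shlex


def build_generate_command(include=None, exclude=None, focus=None,
                           doc_type=None, instructions=None):
    """Assemble an argv list for `codewiki generate` from optional settings."""
    argv = ["codewiki", "generate"]
    options = [
        ("--include", include),
        ("--exclude", exclude),
        ("--focus", focus),
        ("--doc-type", doc_type),
        ("--instructions", instructions),
    ]
    for flag, value in options:
        if value:  # skip options that were not provided
            argv += [flag, value]
    return argv


argv = build_generate_command(include="*.cs", exclude="Tests,Specs",
                              doc_type="architecture")
# shlex.join quotes arguments so the line can be pasted into a POSIX shell.
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` avoids shell quoting problems with glob characters such as `*`.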

Token Settings

CodeWiki allows you to configure maximum token limits for LLM calls. This is useful for:

  • Adapting to different model context windows
  • Controlling costs by limiting response sizes
  • Optimizing for faster response times

# Set max tokens for LLM responses (default: 32768)
codewiki config set --max-tokens 16384

# Set max tokens for module clustering (default: 36369)
codewiki config set --max-token-per-module 40000

# Set max tokens for leaf modules (default: 16000)
codewiki config set --max-token-per-leaf-module 20000

# Set max depth for hierarchical decomposition (default: 2)
codewiki config set --max-depth 3

# Override at runtime for a single generation
codewiki generate --max-tokens 16384 --max-token-per-module 40000 --max-depth 3

| Option | Description | Default |
|--------|-------------|---------|
| --max-tokens | Maximum output tokens for LLM response | 32768 |
| --max-token-per-module | Input tokens threshold for module clustering | 36369 |
| --max-token-per-leaf-module | Input tokens threshold for leaf modules | 16000 |
| --max-depth | Maximum depth for hierarchical decomposition | 2 |
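
The three layers described in this section (built-in defaults, values saved with config set, and flags passed to a single generate run) suggest a simple precedence order. The sketch below assumes that later layers win; the `resolve_settings` function and its key names are hypothetical and merely mirror the flag names, so CodeWiki's internal handling may differ.

```python
# Defaults taken from the table above; key names are illustrative only.
DEFAULTS = {
    "max_tokens": 32768,
    "max_token_per_module": 36369,
    "max_token_per_leaf_module": 16000,
    "max_depth": 2,
}


def resolve_settings(saved=None, runtime=None):
    """Merge defaults, saved config, and runtime flags (later layers win)."""
    resolved = dict(DEFAULTS)
    for layer in (saved or {}, runtime or {}):
        for key, value in layer.items():
            if key not in DEFAULTS:
                raise KeyError(f"unknown setting: {key}")
            if value is not None:  # None means "flag not given"
                resolved[key] = value
    return resolved


# Saved via `codewiki config set --max-depth 3`, then overridden for one run
# with `codewiki generate --max-tokens 16384`.
print(resolve_settings(saved={"max_depth": 3}, runtime={"max_tokens": 16384}))
```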

Configuration Storage

  • API keys: Securely stored in system keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service)
  • Settings & Agent Instructions: ~/.codewiki/config.json

Documentation Output

Generated documentation includes both textual descriptions and visual artifacts for comprehensive understanding.

Textual Documentation

  • Repository overview with architecture guide
  • Module-level documentation with API references
  • Usage examples and implementation patterns
  • Cross-module interaction analysis

Visual Artifacts

  • System architecture diagrams (Mermaid)
  • Data flow visualizations
  • Dependency graphs and module relationships
  • Sequence diagrams for complex interactions

Output Structure

./docs/
├── overview.md              # Repository overview (start here!)
├── module1.md               # Module documentation
├── module2.md               # Additional modules...
├── module_tree.json         # Hierarchical module structure
├── first_module_tree.json   # Initial clustering result
├── metadata.json            # Generation metadata
└── index.html               # Interactive viewer (with --github-pages)
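
A generated output directory can be inspected with a few lines of Python. The following sketch relies only on the file names listed in the tree above; `summarize_docs` is a hypothetical helper, and the JSON files are parsed generically because their schema is not described here.

```python
import json
from pathlib import Path


def summarize_docs(output_dir="docs"):
    """Collect generated Markdown pages and parse the JSON files, if present."""
    root = Path(output_dir)
    summary = {
        "has_overview": (root / "overview.md").is_file(),
        "module_pages": sorted(
            p.name for p in root.glob("*.md") if p.name != "overview.md"
        ),
        "json": {},
    }
    for name in ("module_tree.json", "first_module_tree.json", "metadata.json"):
        path = root / name
        if path.is_file():
            # Parsed as generic JSON; no particular schema is assumed.
            summary["json"][name] = json.loads(path.read_text(encoding="utf-8"))
    return summary
```

Calling `summarize_docs("./docs")` after a run gives a quick check that overview.md exists and shows which module pages were produced.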

Experimental Results

CodeWiki has been evaluated on CodeWikiBench, the first benchmark specifically designed for repository-level documentation quality assessment.

Performance by Language Category

| Language Category | CodeWiki (Sonnet-4) | DeepWiki | Improvement |
|-------------------|---------------------|----------|-------------|
| High-Level (Python, JS, TS) | 79.14% | 68.67% | +10.47% |
| Managed (C#, Java) | 68.84% | 64.80% | +4.04% |
| Systems (C, C++) | 53.24% | 56.39% | -3.15% |
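
The Improvement column is the CodeWiki score minus the DeepWiki score, that is, a difference in percentage points rather than a relative change. A short check using only the numbers from the table (the `improvement` function name is illustrative):

```python
# Scores copied from the table above: (CodeWiki, DeepWiki), in percent.
RESULTS = {
    "High-Level (Python, JS, TS)": (79.14, 68.67),
    "Managed (C#, Java)": (68.84, 64.80),
    "Systems (C, C++)": (53.24, 56.39),
}


def improvement(codewiki_score, deepwiki_score):
    """Absolute difference in percentage points, rounded to two decimals."""
    return round(codewiki_score - deepwiki_score, 2)


for category, (codewiki, deepwiki) in RESULTS.items():
    print(f"{category}: {improvement(codewiki, deepwiki):+.2f}")
```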
