CodexA
CodexA is a local semantic code intelligence CLI designed to help AI coding assistants (GitHub Copilot, Cursor, Cline, etc.) and developers understand large codebases faster. It indexes repositories locally, parses code structure, generates embeddings, and enables semantic search across functions, classes, and modules, exposing a structured tool protocol that any AI agent can call over HTTP or the CLI.
Features
| Area | What you get |
|------|-------------|
| Code Indexing | Scan repos, extract functions/classes, generate vector embeddings (sentence-transformers + FAISS), ONNX runtime option, parallel indexing, --watch live re-indexing, .codexaignore support, --add/--inspect per-file control, model-consistency guard, Ctrl+C partial-save |
| Rust Search Engine | Native codexa-core Rust crate via PyO3 — HNSW approximate nearest-neighbour search, BM25 keyword index, tree-sitter AST chunker (10 languages), memory-mapped vector persistence, parallel file scanner, optional ONNX embedding inference, optional Tantivy full-text search |
| Multi-Mode Search | Semantic, keyword (BM25), regex, hybrid (RRF), and raw filesystem grep (ripgrep backend) with full -A/-B/-C/-w/-v/-c/-l/-L/--exclude/--no-ignore flags, --hybrid/--sem shorthands, --scores, --snippet-length, --no-snippet, JSONL streaming |
| RAG Pipeline | 4-stage Retrieval-Augmented Generation — Retrieve → Deduplicate → Re-rank → Assemble with token budget, cross-encoder re-ranking, source citations |
| Code Context | Rich context windows — imports, dependencies, AST-based call graphs, surrounding code |
| Repository Analysis | Language breakdown (codexa languages), module summaries, component detection |
| AI Agent Protocol | 13 built-in tools exposed via HTTP bridge, MCP server (13 tools with pagination/cursors), MCP-over-SSE (--mcp), codexa --serve shorthand, Claude Desktop auto-config (--claude-config), or CLI for any AI agent to invoke |
| Quality & Metrics | Complexity analysis, maintainability scoring, quality gates for CI |
| Multi-Repo Workspaces | Link multiple repos under one workspace for cross-repo search & refactoring |
| Interactive TUI | Terminal REPL with mode switching for interactive exploration |
| Streaming Responses | Token-by-token streaming for chat and investigation commands |
| Plugin System | 22 hooks for extending every layer — from indexing to tool invocation |
| VS Code Extension | 4-panel sidebar (Search, Symbols, Quality, Tools), 8 commands, CodeLens, context menus, status bar |
| Editor Plugins | Zed, JetBrains (IntelliJ/PyCharm), Neovim (telescope.nvim), Vim, Sublime Text, Emacs, Helix, Eclipse, all sharing the same MCP/bridge protocol |
| Cross-Language Intelligence | FFI pattern detection, polyglot dependency graphs, language-aware search boosting, universal multi-language call graph |
| Multi-Agent Sessions | Concurrent AI agent sessions with shared discovery, semantic diff (rename/move/signature/body detection), RAG code generation |
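Hybrid search fuses the semantic and BM25 rankings with Reciprocal Rank Fusion (RRF). As an illustration of the technique itself (not CodexA's internal code), a minimal RRF sketch:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked result lists.

    Each ranking is a list of document ids, best first. A document's
    fused score is the sum of 1 / (k + rank) over every list it appears
    in; k=60 is the constant from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: a result strong in both lists wins overall.
semantic = ["auth.py::login", "jwt.py::decode", "db.py::connect"]
keyword = ["jwt.py::decode", "db.py::connect", "auth.py::login"]
print(rrf_fuse([semantic, keyword]))
```

Because RRF only uses ranks, it needs no score normalization between the vector and keyword backends, which is why it is a common choice for hybrid retrieval.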
Quick Start
1. Install
pip install codexa
For semantic indexing and vector search, install the ML extras:
pip install "codexa[ml]"
Or install from source:
git clone https://github.com/M9nx/CodexA.git
cd CodexA
pip install -e ".[dev]"
Alternative installation methods:
# Docker
docker build -t codexa .
docker run --rm -v /path/to/project:/workspace codexa search "auth"
# Homebrew (macOS)
brew install --formula Formula/codexa.rb
2. Initialize a Project
Navigate to any project you want to analyze and run:
cd /path/to/your-project
codexa init
CodexA auto-detects your available RAM and picks the best embedding model. Or choose a model profile explicitly:
codexa init --profile fast # mxbai-embed-xsmall — low RAM (<1 GB)
codexa init --profile balanced # MiniLM — good balance (~2 GB)
codexa init --profile precise # jina-code — best quality (~4 GB)
This creates a .codexa/ directory with configuration, index storage, and session data.
3. Index the Codebase
codexa index .
This parses all source files (Python, JS/TS, Java, Go, Rust, C#, Ruby, C++),
extracts symbols, generates embeddings, and stores them in a local FAISS index.
Semantic indexing requires codexa[ml].
If you need to keep secrets, generated files, or local config files out of the
index, add patterns to .codexaignore at the project root or configure
index.exclude_files in .codexa/config.json.
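For the config route, a fragment like the following would do (only the `index.exclude_files` key is taken from the description above; check your generated `.codexa/config.json` for the exact schema):

```json
{
  "index": {
    "exclude_files": [".env*", "secrets/*.json"]
  }
}
```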
Typical .codexaignore example:
.env*
secrets/*.json
config/local-*.yml
vendor/*
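The patterns above are plain globs. To sanity-check which paths a pattern set would exclude, a standalone sketch using Python's `fnmatch` (CodexA's actual matcher may support richer gitignore-style syntax such as `**`):

```python
from fnmatch import fnmatch

IGNORE_PATTERNS = [".env*", "secrets/*.json", "config/local-*.yml", "vendor/*"]

def is_ignored(path, patterns=IGNORE_PATTERNS):
    # A path is excluded if any glob pattern matches it.
    return any(fnmatch(path, pat) for pat in patterns)

for p in [".env.production", "secrets/api-keys.json", "src/auth.py"]:
    print(p, "->", "ignored" if is_ignored(p) else "indexed")
```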
The default embedding model is small, but the PyTorch backend still needs about 2 GB of available RAM. On lower-memory machines, prefer the ONNX backend.
4. Semantic Search
codexa search "jwt authentication"
codexa search "database connection pool" --json
codexa search "error handling" -k 5
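Under the hood, semantic search ranks indexed chunks by vector similarity between the query embedding and each chunk's embedding. A toy illustration of that ranking step with hand-made 3-dimensional vectors (a real sentence-transformers model produces hundreds of dimensions):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical chunk embeddings, for illustration only.
chunks = {
    "verify_jwt": [0.9, 0.1, 0.0],
    "open_db_pool": [0.1, 0.9, 0.2],
    "format_timestamp": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for the embedding of "jwt authentication"
ranked = sorted(chunks, key=lambda name: cosine(query, chunks[name]), reverse=True)
print(ranked)  # best match first
```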
5. Explore More
codexa explain MyClass # Structural explanation of a symbol
codexa context parse_config # Rich AI context window
codexa deps src/auth.py # Import / dependency map
codexa summary # Full repo summary
codexa quality src/ # Code quality analysis
codexa hotspots # High-risk code hotspots
codexa trace handle_request # Execution trace of a symbol
codexa evolve # Self-improving development loop
codexa grep "TODO|FIXME" # Raw filesystem grep (ripgrep or Python)
codexa benchmark # Performance benchmarking
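The quality commands report complexity-style metrics per symbol. As a rough standalone sketch of the idea (not CodexA's actual scorer), a cyclomatic-complexity estimate that counts branch points per function with Python's `ast` module:

```python
import ast

BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)

def cyclomatic_estimate(source):
    """Rough cyclomatic complexity: 1 + the number of branch points."""
    tree = ast.parse(source)
    return {
        fn.name: 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(fn))
        for fn in tree.body if isinstance(fn, ast.FunctionDef)
    }

sample = """
def simple(x):
    return x + 1

def branchy(x):
    if x > 0:
        for i in range(x):
            if i % 2:
                x -= 1
    return x
"""
print(cyclomatic_estimate(sample))
```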
Using CodexA with AI Agents (GitHub Copilot, etc.)
CodexA is designed to be called by AI coding assistants as an external tool. There are three integration modes: CLI tool mode, HTTP bridge server, and in-process Python API.
Option A — CLI Tool Mode (Recommended for Copilot Chat)
Any AI agent that can run shell commands can use CodexA directly:
# List available tools
codexa tool list --json
# Run a tool with arguments
codexa tool run semantic_search --arg query="authentication middleware" --json
codexa tool run explain_symbol --arg symbol_name="UserService" --json
codexa tool run get_call_graph --arg symbol_name="process_payment" --json
codexa tool run get_dependencies --arg file_path="src/auth.py" --json
# Get tool schema (so the agent knows what arguments to pass)
codexa tool schema semantic_search --json
The --json flag ensures machine-readable output. The --pipe flag suppresses
colors and spinners for clean piping.
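From an agent harness, CLI tool mode is just a subprocess call. A minimal wrapper sketch (the helper names are ours; the flags mirror the commands above):

```python
import json
import subprocess

def build_tool_argv(tool_name, **arguments):
    """Assemble the `codexa tool run` command line shown above."""
    argv = ["codexa", "tool", "run", tool_name]
    for key, value in arguments.items():
        argv += ["--arg", f"{key}={value}"]
    argv += ["--json", "--pipe"]  # machine-readable output, no colors/spinners
    return argv

def run_tool(tool_name, **arguments):
    # Requires codexa on PATH; returns the parsed JSON payload.
    out = subprocess.run(build_tool_argv(tool_name, **arguments),
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)

print(build_tool_argv("semantic_search", query="authentication middleware"))
```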
Option B — HTTP Bridge Server (For MCP / Long-Running Agents)
Start the bridge server to expose all tools over HTTP:
codexa serve --port 24842
The server runs on http://127.0.0.1:24842 and exposes:
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /capabilities | Full capability manifest — version, tools, supported requests |
| GET | /health | Health check → {"status": "ok"} |
| GET | /tools/list | List all available tools with schemas |
| POST | /tools/invoke | Execute a tool by name with arguments |
| GET | /tools/stream | SSE stream — tool discovery + heartbeat |
| POST | /request | Dispatch any AgentRequest (12 request kinds) |
Example — invoke a tool via HTTP:
curl -X POST http://127.0.0.1:24842/tools/invoke \
-H "Content-Type: application/json" \
-d '{"tool_name": "semantic_search", "arguments": {"query": "error handling"}}'
Example — list capabilities:
curl http://127.0.0.1:24842/capabilities
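The same invocation from Python using only the standard library (the endpoint and payload shape are the ones documented above; `invoke` assumes `codexa serve` is already running):

```python
import json
from urllib import request

BRIDGE = "http://127.0.0.1:24842"

def invoke_payload(tool_name, arguments):
    # Body shape expected by POST /tools/invoke (see the curl example above).
    return {"tool_name": tool_name, "arguments": arguments}

def invoke(tool_name, **arguments):
    body = json.dumps(invoke_payload(tool_name, arguments)).encode()
    req = request.Request(f"{BRIDGE}/tools/invoke", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:  # requires a running bridge server
        return json.load(resp)

print(invoke_payload("semantic_search", {"query": "error handling"}))
```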
Option C — Python API (In-Process)
from pathlib import Path
from semantic_code_intelligence.tools.executor import ToolExecutor
from semantic_code_intelligence.tools.protocol import ToolInvocation
executor = ToolExecutor(Path("/path/to/project"))
invocation = ToolInvocation(tool_name="semantic_search", arguments={"query": "auth"})
result = executor.execute(invocation)
print(result.success) # True
print(result.result_payload) # dict with search results
print(result.execution_time_ms) # timing in milliseconds
Setting Up with VS Code + GitHub Copilot
Step 1 — Install CodexA globally
# Clone the repo
git clone https://github.com/M9nx/CodexA.git
# Install it (makes `codexa` available system-wide in your venv)
cd CodexA
pip install -e ".[dev]"
# Verify
codexa --version # → codexa, version 0.5.0
Step 2 — Initialize your target project
cd /path/to/your-project
codexa init --index # Creates .codexa/ and indexes immediately
# Or separately:
codexa init # Creates .codexa/ directory
codexa index . # Index the entire codebase
codexa doctor # Verify everything is healthy
codexa search "main" # Quick sanity check
Step 3 — Add Copilot Custom Instructions (System Prompt)
Create the file `.github/copilot-instru
