Thread
Early-stage, next-gen code intelligence platform
A safe, fast, flexible code analysis and parsing engine built in Rust, with a production-ready service-library dual architecture, content-addressed caching, and incremental re-analysis.
Thread is a high-performance code analysis platform that works both as a reusable library ecosystem and as a persistent service. Built on tree-sitter parsers and extended with the ReCoco dataflow framework, Thread delivers 50x+ performance gains through content-addressed caching and supports two deployment targets: a CLI with Rayon parallelism and an Edge service on Cloudflare Workers.
Key Features
- ✅ Content-Addressed Caching: Blake3 fingerprinting (346x faster than re-parsing a file) lets repeated runs skip unchanged content, enabling a 99.7% cost reduction
- ✅ Incremental Updates: Only reanalyze changed files—unmodified code skips processing automatically
- ✅ Dual Deployment: Single codebase compiles to both CLI (Rayon + Postgres) and Edge (tokio + D1 on Cloudflare Workers)
- ✅ Multi-Language Support: 20+ languages via tree-sitter (Rust, TypeScript, Python, Go, Java, C/C++, and more)
- ✅ Pattern Matching: Powerful AST-based pattern matching with meta-variables for complex queries
- ✅ Production Performance: >1,000 files/sec throughput, >90% cache hit rate, <50ms p95 latency
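The caching and incremental features above rest on one idea: results are keyed by a fingerprint of file contents rather than by path or timestamp, so identical bytes always hit the cache. The sketch below illustrates only that idea. `ContentCache`, `fingerprint`, and `get_or_analyze` are hypothetical names, not Thread's API, and std's `DefaultHasher` stands in for Blake3 so the example needs no external crates (it is neither collision-resistant nor stable across Rust releases, so it is unsuitable for a persistent cache).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in fingerprint. Thread uses Blake3; DefaultHasher is used here only
/// so the sketch needs nothing beyond std (it is NOT collision-resistant).
fn fingerprint(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical cache keyed by content fingerprint, not by file path.
struct ContentCache<T> {
    entries: HashMap<u64, T>,
    misses: usize,
}

impl<T: Clone> ContentCache<T> {
    fn new() -> Self {
        Self { entries: HashMap::new(), misses: 0 }
    }

    /// Return the cached result for `content`, running `analyze` only on a miss.
    fn get_or_analyze(&mut self, content: &str, analyze: impl FnOnce(&str) -> T) -> T {
        let key = fingerprint(content);
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        self.misses += 1;
        let result = analyze(content);
        self.entries.insert(key, result.clone());
        result
    }
}

fn main() {
    let mut cache = ContentCache::new();
    // Toy "analysis": count lines.
    let count_lines = |s: &str| s.lines().count();

    let a = cache.get_or_analyze("fn a() {}\nfn b() {}", count_lines);
    // Same bytes (e.g. the same file on a second run): cache hit, no re-analysis.
    let b = cache.get_or_analyze("fn a() {}\nfn b() {}", count_lines);
    // Changed content gets a new fingerprint and is analyzed again.
    let c = cache.get_or_analyze("fn a() {}", count_lines);

    println!("results: {a} {b} {c}, analyses run: {}", cache.misses);
}
```

Three calls run only two analyses: the second call reuses the first result because the content is byte-identical.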
Quick Start
Installation
# Clone the repository
git clone https://github.com/knitli/thread.git
cd thread
# Install development tools (optional, requires mise)
mise run install-tools
# Build Thread with all features
cargo build --workspace --all-features --release
# Verify installation
./target/release/thread --version
Basic Usage as a Library
use thread_ast_engine::{Language, Root};
// `?` requires a function that returns a Result
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse source code
    let source = "function hello() { return 42; }";
    let root = Root::new(source, Language::JavaScript)?;
    // Find all function declarations
    let functions = root.find_all("function $NAME($$$PARAMS) { $$$BODY }");
    // Extract function names
    for func in functions {
        println!("Found function: {}", func.get_text("NAME")?);
    }
    Ok(())
}
Using Thread Flow for Analysis Pipelines
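use thread_flow::ThreadFlowBuilder;
// `.await` requires an async context; call this from an async runtime such as tokio
async fn analyze_rust() -> Result<(), Box<dyn std::error::Error>> {
    // Build a declarative analysis pipeline
    let flow = ThreadFlowBuilder::new("analyze_rust")
        .source_local("src/", &["**/*.rs"], &["target/**"])
        .parse()
        .extract_symbols()
        .target_postgres("code_symbols", &["content_hash"])
        .build()
        .await?;
    // Execute the flow
    flow.execute().await?;
    Ok(())
}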
Command Line Usage
# Analyze a codebase (first run)
thread analyze ./my-project
# → Analyzing 1,000 files: 10.5s
# Second run (with cache)
thread analyze ./my-project
# → Analyzing 1,000 files: 0.3s (100% cache hits, 35x faster!)
# Incremental update (only changed files)
# Edit 10 files, then:
thread analyze ./my-project
# → Analyzing 10 files: 0.15s (990 files cached)
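An incremental run like the one above has to decide which files to re-analyze. One way to sketch this, assuming the previous run stored a path-to-fingerprint snapshot, is to diff the two snapshots. The `Snapshot` type, `Plan` struct, and `plan_incremental` function are hypothetical illustrations, not Thread's API.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical snapshot: file path -> content fingerprint from one run.
type Snapshot = BTreeMap<String, u64>;

/// What an incremental run needs to do.
#[derive(Debug, PartialEq)]
struct Plan {
    reanalyze: BTreeSet<String>, // new or modified files
    removed: BTreeSet<String>,   // files whose cached results can be dropped
    unchanged: usize,            // files served entirely from cache
}

fn plan_incremental(previous: &Snapshot, current: &Snapshot) -> Plan {
    let mut reanalyze = BTreeSet::new();
    let mut unchanged = 0;
    for (path, fp) in current {
        match previous.get(path) {
            // Same fingerprint as last run: reuse the cached result.
            Some(old) if old == fp => unchanged += 1,
            // New file or changed content: analyze it again.
            _ => {
                reanalyze.insert(path.clone());
            }
        }
    }
    let removed = previous
        .keys()
        .filter(|p| !current.contains_key(*p))
        .cloned()
        .collect();
    Plan { reanalyze, removed, unchanged }
}

fn main() {
    let previous: Snapshot = [("a.rs", 1), ("b.rs", 2), ("c.rs", 3)]
        .into_iter()
        .map(|(p, f)| (p.to_string(), f))
        .collect();
    // b.rs edited, c.rs deleted, d.rs added.
    let current: Snapshot = [("a.rs", 1), ("b.rs", 20), ("d.rs", 4)]
        .into_iter()
        .map(|(p, f)| (p.to_string(), f))
        .collect();
    let plan = plan_incremental(&previous, &current);
    println!("{plan:?}");
}
```

Here `b.rs` (edited) and `d.rs` (new) are re-analyzed, `c.rs` (deleted) is reported as removed, and only `a.rs` is reused.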
Architecture
Thread follows a service-library dual architecture, with a core of reusable library crates and a service layer for orchestration and persistence:
Library Core (Reusable Components)
- thread-ast-engine - Core AST parsing, pattern matching, and transformation engine
- thread-language - Language definitions and tree-sitter parser integrations (20+ languages)
- thread-rule-engine - Rule-based scanning and transformation with YAML configuration
- thread-utilities - Shared utilities including SIMD optimizations and hash functions
- thread-wasm - WebAssembly bindings for browser and edge deployment
Service Layer (Orchestration & Persistence)
- thread-flow - High-level dataflow pipelines with the ThreadFlowBuilder API
- thread-services - Service interfaces, API abstractions, and ReCoco integration
- Storage Backends:
- Postgres (CLI deployment) - Persistent caching with <10ms p95 latency
- D1 (Cloudflare Edge) - Distributed caching across CDN nodes with <50ms p95 latency
- Qdrant (optional) - Vector similarity search for semantic analysis
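Compiling one codebase against several storage backends suggests analysis code written against a storage abstraction. The following is a minimal synchronous sketch under that assumption: `CacheStore`, `InMemoryStore`, and `analyze_with_cache` are hypothetical names, and real Postgres or D1 backends would likely be asynchronous and fallible.

```rust
use std::collections::HashMap;

/// Hypothetical storage abstraction (not Thread's actual service interface).
trait CacheStore {
    fn get(&self, fingerprint: &str) -> Option<String>;
    fn put(&mut self, fingerprint: &str, result: String);
}

/// In-memory backend, the kind typically used for tests.
#[derive(Default)]
struct InMemoryStore {
    map: HashMap<String, String>,
}

impl CacheStore for InMemoryStore {
    fn get(&self, fingerprint: &str) -> Option<String> {
        self.map.get(fingerprint).cloned()
    }
    fn put(&mut self, fingerprint: &str, result: String) {
        self.map.insert(fingerprint.to_string(), result);
    }
}

/// Analysis code depends only on the trait, so a different backend
/// could be swapped in per deployment target.
fn analyze_with_cache(store: &mut dyn CacheStore, fingerprint: &str, source: &str) -> String {
    if let Some(cached) = store.get(fingerprint) {
        return cached;
    }
    // Toy "analysis" result.
    let result = format!("{} bytes analyzed", source.len());
    store.put(fingerprint, result.clone());
    result
}

fn main() {
    let mut store = InMemoryStore::default();
    let first = analyze_with_cache(&mut store, "fp-123", "fn main() {}");
    // Same fingerprint: the stored result is returned and `source` is ignored.
    let second = analyze_with_cache(&mut store, "fp-123", "ignored on a hit");
    println!("{first} / {second}");
}
```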
Concurrency Models
- Rayon (CLI) - CPU-bound parallelism for local multi-core utilization (2-8x speedup)
- tokio (Edge) - Async I/O for horizontal scaling and Cloudflare Workers
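The Rayon model for the CLI is data parallelism: split the file list across cores and combine the per-file results. With Rayon this is typically `files.par_iter().map(|f| analyze(f)).collect()`; the sketch below reproduces the idea with `std::thread::scope` so it runs without dependencies. `analyze` is a hypothetical stand-in for per-file work (it counts non-empty lines), and `workers` plays the role of `RAYON_NUM_THREADS`.

```rust
use std::thread;

/// Toy per-file work: count non-empty lines.
fn analyze(source: &str) -> usize {
    source.lines().filter(|l| !l.trim().is_empty()).count()
}

/// Split `files` into roughly equal chunks, run one scoped thread per chunk,
/// and return the results in the original input order.
fn analyze_parallel(files: &[&str], workers: usize) -> Vec<usize> {
    let workers = workers.max(1);
    // `chunks(0)` panics, so keep the chunk size at least 1 (e.g. for empty input).
    let chunk_size = files.len().div_ceil(workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = files
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(|f| analyze(f)).collect::<Vec<_>>()))
            .collect();
        // Joining in spawn order preserves input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}

fn main() {
    let files = ["fn a() {}\n\nfn b() {}", "", "struct S;\n"];
    let counts = analyze_parallel(&files, 2);
    println!("{counts:?}");
}
```

Fixed chunks are the simplest split; Rayon adds work stealing, which balances uneven file sizes better.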
Deployment Options
CLI Deployment (Local/Server)
Best for: Development environments, CI/CD pipelines, large batch processing
# Build with CLI features (Postgres + Rayon parallelism)
cargo build --release --features "recoco-postgres,parallel,caching"
# Configure PostgreSQL backend
export DATABASE_URL=postgresql://user:pass@localhost/thread_cache
export RAYON_NUM_THREADS=8 # Use 8 cores
# Run analysis
./target/release/thread analyze ./large-codebase
# → Performance: 1,000-10,000 files per run
Features: Direct filesystem access, multi-core parallelism, persistent caching, unlimited CPU time
See the CLI Deployment Guide for complete setup.
Edge Deployment (Cloudflare Workers)
Best for: Global API services, low-latency analysis, serverless architecture
# Build WASM for edge
cargo run -p xtask build-wasm --release
# Deploy to Cloudflare Workers
wrangler deploy
# Access globally distributed API
curl https://thread-api.workers.dev/analyze \
  -H "Content-Type: application/json" \
  -d '{"code":"fn main(){}","language":"rust"}'
# → Response time: <50ms worldwide (p95)
Features: Global CDN distribution, auto-scaling, D1 distributed storage, no infrastructure management
See the Edge Deployment Guide for complete setup.
Language Support
Thread supports 20+ programming languages via tree-sitter parsers:
Tier 1 (Primary Focus)
- Rust, JavaScript/TypeScript, Python, Go, Java
Tier 2 (Full Support)
- C/C++, C#, PHP, Ruby, Swift, Kotlin, Scala
Tier 3 (Basic Support)
- Bash, CSS, HTML, JSON, YAML, Lua, Elixir, Haskell
Each language provides full AST parsing, symbol extraction, and pattern matching capabilities.
Pattern Matching System
Thread's core strength is AST-based pattern matching using meta-variables:
Meta-Variable Syntax
- $VAR - Captures a single AST node
- $$$ITEMS - Captures multiple consecutive nodes (ellipsis)
- $_ - Matches any node without capturing
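The three meta-variable forms can be illustrated with a toy matcher. Thread matches against tree-sitter AST nodes; this sketch works on whitespace-separated tokens purely to show the semantics. In it, `$NAME` binds exactly one token (a repeated name must bind the same text), `$$$NAME` binds zero or more tokens via backtracking, and `$_` matches one token without binding it. `match_tokens` and `Capture` are hypothetical names.

```rust
use std::collections::HashMap;

/// What a meta-variable bound to (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
enum Capture {
    Single(String),
    Multi(Vec<String>),
}

/// Toy matcher over tokens (not AST nodes). Returns the bindings on success.
fn match_tokens(
    pattern: &[&str],
    input: &[&str],
    caps: &HashMap<String, Capture>,
) -> Option<HashMap<String, Capture>> {
    let Some((&head, rest)) = pattern.split_first() else {
        // Pattern exhausted: succeed only if the input is exhausted too.
        return input.is_empty().then(|| caps.clone());
    };
    if let Some(name) = head.strip_prefix("$$$") {
        // Ellipsis: try every split point, shortest capture first (backtracking).
        for n in 0..=input.len() {
            let mut next = caps.clone();
            let taken: Vec<String> = input[..n].iter().map(|t| t.to_string()).collect();
            next.insert(name.to_string(), Capture::Multi(taken));
            if let Some(found) = match_tokens(rest, &input[n..], &next) {
                return Some(found);
            }
        }
        return None;
    }
    // Every other pattern element consumes exactly one input token.
    let (&tok, remaining) = input.split_first()?;
    if head == "$_" {
        // Wildcard: matches any single token without binding it.
        return match_tokens(rest, remaining, caps);
    }
    if let Some(name) = head.strip_prefix('$') {
        let cap = Capture::Single(tok.to_string());
        // A name used twice must bind the same text both times.
        if caps.get(name).is_some_and(|prev| *prev != cap) {
            return None;
        }
        let mut next = caps.clone();
        next.insert(name.to_string(), cap);
        return match_tokens(rest, remaining, &next);
    }
    // Literal token: must match exactly.
    if head == tok {
        match_tokens(rest, remaining, caps)
    } else {
        None
    }
}

fn main() {
    let pattern: Vec<&str> = "$FUNC ( $$$ARGS )".split_whitespace().collect();
    let input: Vec<&str> = "foo ( a , b )".split_whitespace().collect();
    match match_tokens(&pattern, &input, &HashMap::new()) {
        Some(caps) => println!("matched: {caps:?}"),
        None => println!("no match"),
    }
}
```

For the call `foo ( a , b )`, the pattern `$FUNC ( $$$ARGS )` binds `FUNC` to `foo` and `ARGS` to the tokens `a , b`.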
Examples
// Find all variable declarations
root.find_all("let $VAR = $VALUE");
// Find if-else statements
root.find_all("if ($COND) { $$$THEN } else { $$$ELSE }");
// Find function calls with any arguments
root.find_all("$FUNC($$$ARGS)");
// Find class methods
root.find_all("class $CLASS { $$$METHODS }");
YAML Rule System
id: no-var-declarations
message: "Use 'let' or 'const' instead of 'var'"
language: JavaScript
severity: warning
rule:
pattern: "var $NAME = $VALUE"
fix: "let $NAME = $VALUE"
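The `fix` template reuses the meta-variables captured by `pattern`, so `var count = 0` would be rewritten to `let count = 0`. A minimal sketch of that substitution step on plain text follows. `render_fix` is a hypothetical helper that handles only single-node `$NAME` captures (not `$$$` ellipses); Thread's rule engine presumably operates on matched AST nodes rather than strings.

```rust
use std::collections::HashMap;

/// Expand `$NAME` placeholders in a rule's `fix` template using captured text.
/// Names are runs of A-Z, 0-9, or `_` after a `$`; unknown names are left as-is.
fn render_fix(template: &str, captures: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        // Copy everything before the `$` verbatim.
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        // The meta-variable name ends at the first non-name character.
        let name_len = after
            .find(|c: char| !(c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_'))
            .unwrap_or(after.len());
        let name = &after[..name_len];
        match captures.get(name) {
            Some(text) if !name.is_empty() => out.push_str(text),
            // No capture for this name: keep the placeholder unchanged.
            _ => {
                out.push('$');
                out.push_str(name);
            }
        }
        rest = &after[name_len..];
    }
    out.push_str(rest);
    out
}

fn main() {
    // Captures as they might come from matching `var $NAME = $VALUE` on `var count = 0`.
    let captures = HashMap::from([("NAME", "count"), ("VALUE", "0")]);
    let fixed = render_fix("let $NAME = $VALUE", &captures);
    println!("{fixed}");
}
```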
Performance Characteristics
Benchmarks (Phase 5 Real-World Validation)
| Language | Files | Time | Throughput | Cache Hit | Incremental (1% update) |
|------------|---------|--------|----------------|-----------|-------------------------|
| Rust | 10,100 | 7.4s | 1,365 files/s | 100% | 0.6s (100 files) |
| TypeScript | 10,100 | 10.7s | 944 files/s | 100% | ~1.0s (100 files) |
| Python | 10,100 | 8.5s | 1,188 files/s | 100% | 0.7s (100 files) |
| Go | 10,100 | 5.4s | 1,870 files/s | 100% | 0.4s (100 files) |
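The Throughput column is Files divided by Time, and dividing Time by the incremental time gives the incremental speedup (about 12x for Rust). The arithmetic, using only values from the table:

```rust
/// Rows from the benchmark table: (language, files, full-run seconds, incremental seconds).
const ROWS: [(&str, u32, f64, f64); 4] = [
    ("Rust", 10_100, 7.4, 0.6),
    ("TypeScript", 10_100, 10.7, 1.0),
    ("Python", 10_100, 8.5, 0.7),
    ("Go", 10_100, 5.4, 0.4),
];

/// Throughput in files per second, rounded to the nearest whole file.
fn throughput(files: u32, seconds: f64) -> u32 {
    (f64::from(files) / seconds).round() as u32
}

fn main() {
    for (lang, files, full, incremental) in ROWS {
        println!(
            "{lang}: {} files/s, incremental run {:.1}x faster than a full run",
            throughput(files, full),
            full / incremental
        );
    }
}
```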
Content-Addressed Caching Performance
| Operation | Time | Speedup vs Parse | Notes |
|------------------------|---------|------------------|----------------------------|
| Blake3 fingerprint | 425ns | 346x faster | Single file |
| Batch fingerprint | 17.7µs | - | 100 files |
| AST parsing | 147µs | Baseline | Small file (<1KB) |
| Cache hit (in-memory) | <1µs | >147x faster | LRU cache lookup |
| Cache hit (repeated) | 0.9s | 35x faster | 10,000 file reanalysis |
| Incremental (1%) | 0.6s | 12x faster | 100 changed, 10K total |
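The 346x figure compares the single-file fingerprint time with the small-file parse time, and the batch row works out to a lower per-file cost. Worked out with the table's own numbers:

```rust
/// Speedup of `fast_ns` relative to `baseline_ns` (both in nanoseconds).
fn speedup(baseline_ns: f64, fast_ns: f64) -> f64 {
    baseline_ns / fast_ns
}

fn main() {
    let parse_ns = 147_000.0; // AST parsing, small file (147 µs)
    let fingerprint_ns = 425.0; // Blake3 fingerprint, single file
    let batch_ns = 17_700.0; // Blake3 fingerprint, batch of 100 files (17.7 µs)

    println!("fingerprint vs parse: {:.0}x", speedup(parse_ns, fingerprint_ns));
    println!("batch cost per file: {:.0} ns", batch_ns / 100.0);
}
```

That is 147 µs / 425 ns ≈ 346, and 17.7 µs / 100 files = 177 ns per file, below the 425 ns single-file figure.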
Storage Backend Latency
| Backend | Target | Actual (Phase 5) | Deployment |
|------------|-----------|------------------|------------|
| InMemory | N/A | <1ms | Testing |
| Postgres | <10ms p95 | <1ms (local) | CLI |
| D1 | <50ms p95 | <1ms (local) | Edge |
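The targets are stated as p95 latencies: 95% of lookups should complete within the bound. The document does not say which percentile definition its benchmarks use; the sketch below assumes the common nearest-rank method, and `percentile_nearest_rank` is a hypothetical helper. Integer arithmetic is used for the rank to avoid floating-point rounding at exact boundaries.

```rust
/// p-th percentile (p in 1..=100) using the nearest-rank method: the smallest
/// sample such that at least p% of all samples are less than or equal to it.
fn percentile_nearest_rank(samples: &[f64], p: u32) -> Option<f64> {
    if samples.is_empty() || p == 0 || p > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // 1-based rank = ceil(p * N / 100); always >= 1 because p >= 1.
    let rank = (p as usize * sorted.len()).div_ceil(100);
    Some(sorted[rank - 1])
}

fn main() {
    // Hypothetical latencies in ms: 19 fast lookups and one slow outlier.
    let mut latencies = vec![2.0; 19];
    latencies.push(40.0);
    let p95 = percentile_nearest_rank(&latencies, 95).unwrap();
    let max = percentile_nearest_rank(&latencies, 100).unwrap();
    println!("p95 = {p95} ms, max = {max} ms");
}
```

With 19 samples at 2 ms and one at 40 ms, p95 is 2 ms (rank 19 of 20) while the maximum is 40 ms, which is why a p95 target tolerates rare slow lookups.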
Development
Prerequisites
- Rust: 1.85.0 or later (edition 2024)
- Tools: cargo-nextest (optional), mise (optional)
Building
# Build everything (except WASM)
mise run build
# or: cargo build --workspace
# Build in release mode
mise run build-release
# Build WASM for edge deployment
mise run build-wasm-release
Testing
# Run all tests
mise run test
# or: cargo nextest run --all-features --no-fail-fast -j 1
# Run tests for specific crate
cargo nextest run -p thread-ast-engine --all-features
# Run benchmarks
cargo bench -p thread-rule-engine
Quality Checks
# Full linting
mise run lint
# Auto-fix formatting and linting issues
mise run fix
# Run CI pipeline locally
mise run ci
Single Test Execution
# Run specific test
cargo nextest run --manifest-path Cargo.toml test_name --all-features
# Run benchmarks
cargo bench -p thread-flow
Documentation
User Guides
- CLI Deployment Guide - Local/server deployment with Postgres
- Edge Deployment Guide - Cloudflare Workers with D1
- Architecture Overview - System design and data flow
API Documentation
- Rustdoc: Run cargo doc --open --no-deps --workspace for full API documentation
- **Exa
