TreeDex
Tree-based, vectorless document RAG framework. Connect any LLM via URL/API key.
Index any document into a navigable tree structure, then retrieve relevant sections using any LLM. No vector databases, no embeddings — just structured tree retrieval.
Available for both Python and Node.js — same API, same index format, fully cross-compatible.
How It Works
<p align="center"> <img src="assets/how-treedex-works.svg" alt="How TreeDex Works" width="800"/> </p>

- Load — Extract pages from any supported format
- Detect — Auto-extract the PDF table of contents, or detect headings via font-size analysis (`[H1]`/`[H2]`/`[H3]` markers)
- Index — If a PDF ToC is found, build the tree directly (no LLM needed). Otherwise, the LLM analyzes page groups with heading hints to extract the hierarchical structure
- Build — Flat sections become a tree with page ranges and embedded text. Orphaned subsections are auto-repaired
- Query — The LLM selects relevant tree nodes for your question
- Return — Get context text, source pages, and reasoning
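The result of the Build step can be pictured as nested sections, each carrying a page range. A minimal sketch of such a structure (illustrative only, not TreeDex's internal types):

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    """One node of the document tree: a heading plus its page span."""
    title: str
    start_page: int
    end_page: int
    children: list["Section"] = field(default_factory=list)

# A two-level tree for a small document
root = Section("Report", 1, 20, [
    Section("Introduction", 1, 4),
    Section("Methods", 5, 12, [Section("Data", 5, 8)]),
])

def titles(node: Section) -> list[str]:
    """Depth-first list of section titles: what the LLM chooses from at query time."""
    return [node.title] + [t for c in node.children for t in titles(c)]

print(titles(root))  # ['Report', 'Introduction', 'Methods', 'Data']
```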
Smart Hierarchy Detection
TreeDex uses multiple strategies to accurately extract document structure, especially for large (300+ page) documents:
- PDF ToC extraction — If the PDF has bookmarks/outline, the tree is built directly from it; zero LLM calls needed
- Font-size heading detection — Analyzes font sizes across the document and injects `[H1]`/`[H2]`/`[H3]` markers so the LLM knows exactly which level each heading belongs to
- Capped continuation context — For multi-chunk documents, the LLM sees a summary of top-level sections plus recent sections instead of the full history, preventing prompt bloat
- Orphan repair — If the LLM outputs `"2.3.1"` without a `"2.3"` parent, synthetic parents are auto-inserted to maintain a valid tree
Why TreeDex instead of Vector DB?
<p align="center"> <img src="assets/treedex-vs-vectordb.svg" alt="TreeDex vs Vector DB" width="800"/> </p>

Supported LLM Providers
<p align="center"> <img src="assets/llm-providers.svg" alt="LLM Providers" width="800"/> </p>

TreeDex works with every major AI provider out of the box. Pick what works for you:
One-liner backends (zero config)
| Backend | Provider | Default Model | Python Deps | Node.js Deps |
|---------|----------|---------------|-------------|-------------|
| GeminiLLM | Google | gemini-2.0-flash | google-generativeai | @google/generative-ai |
| OpenAILLM | OpenAI | gpt-4o | openai | openai |
| ClaudeLLM | Anthropic | claude-sonnet-4-20250514 | anthropic | @anthropic-ai/sdk |
| MistralLLM | Mistral AI | mistral-large-latest | mistralai | @mistralai/mistralai |
| CohereLLM | Cohere | command-r-plus | cohere | cohere-ai |
| GroqLLM | Groq | llama-3.3-70b-versatile | groq | groq-sdk |
| TogetherLLM | Together AI | Llama-3-70b-chat-hf | None | None (fetch) |
| FireworksLLM | Fireworks | llama-v3p1-70b-instruct | None | None (fetch) |
| OpenRouterLLM | OpenRouter | claude-sonnet-4 | None | None (fetch) |
| DeepSeekLLM | DeepSeek | deepseek-chat | None | None (fetch) |
| CerebrasLLM | Cerebras | llama-3.3-70b | None | None (fetch) |
| SambanovaLLM | SambaNova | Llama-3.1-70B-Instruct | None | None (fetch) |
| HuggingFaceLLM | HuggingFace | Mistral-7B-Instruct | None | None (fetch) |
| OllamaLLM | Ollama (local) | llama3 | None | None (fetch) |
Universal backends
| Backend | Use case | Dependencies |
|---------|----------|-------------|
| OpenAICompatibleLLM | Any OpenAI-compatible endpoint (URL + key) | None |
| LiteLLM | 100+ providers via litellm library (Python only) | litellm |
| FunctionLLM | Wrap any function | None |
| BaseLLM | Subclass to build your own | None |
Quick Start
Install
<table> <tr><th>Python</th><th>Node.js</th></tr> <tr><td>

pip install treedex
# With optional LLM SDK
pip install treedex[gemini]
pip install treedex[openai]
pip install treedex[claude]
pip install treedex[all]
</td><td>
npm install treedex
# With optional LLM SDK
npm install treedex openai
npm install treedex @google/generative-ai
npm install treedex @anthropic-ai/sdk
</td></tr>
</table>
Pick your LLM and go
<table> <tr><th>Python</th><th>Node.js / TypeScript</th></tr> <tr><td>

from treedex import TreeDex, GeminiLLM
llm = GeminiLLM(api_key="YOUR_KEY")
index = TreeDex.from_file("doc.pdf", llm=llm)
result = index.query("What is the main argument?")
print(result.context)
print(result.pages_str) # "pages 5-8, 12-15"
</td><td>
import { TreeDex, GeminiLLM } from "treedex";
const llm = new GeminiLLM("YOUR_KEY");
const index = await TreeDex.fromFile("doc.pdf", llm);
const result = await index.query("What is the main argument?");
console.log(result.context);
console.log(result.pagesStr); // "pages 5-8, 12-15"
</td></tr>
</table>
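The `pages_str` / `pagesStr` field shown above collapses the selected pages into compact ranges. A small sketch of that formatting (a hypothetical helper, assuming a sorted, non-empty page list, not TreeDex's actual implementation):

```python
def format_pages(pages: list[int]) -> str:
    """Collapse sorted page numbers into a compact range string."""
    runs: list[str] = []
    start = prev = pages[0]
    for p in pages[1:]:
        if p == prev + 1:  # extend the current consecutive run
            prev = p
            continue
        runs.append(f"{start}-{prev}" if prev > start else str(start))
        start = prev = p
    runs.append(f"{start}-{prev}" if prev > start else str(start))
    return "pages " + ", ".join(runs)

print(format_pages([5, 6, 7, 8, 12, 13, 14, 15]))  # pages 5-8, 12-15
```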
All providers work the same way
<table> <tr><th>Python</th><th>Node.js / TypeScript</th></tr> <tr><td>

from treedex import *
# Google Gemini
llm = GeminiLLM(api_key="YOUR_KEY")
# OpenAI
llm = OpenAILLM(api_key="sk-...")
# Claude
llm = ClaudeLLM(api_key="sk-ant-...")
# Groq (fast inference)
llm = GroqLLM(api_key="gsk_...")
# Together AI
llm = TogetherLLM(api_key="...")
# DeepSeek
llm = DeepSeekLLM(api_key="...")
# OpenRouter (access any model)
llm = OpenRouterLLM(api_key="...")
# Local Ollama
llm = OllamaLLM(model="llama3")
# Any OpenAI-compatible endpoint
llm = OpenAICompatibleLLM(
base_url="https://your-api.com/v1",
api_key="...",
model="model-name",
)
</td><td>
import { /* any backend */ } from "treedex";
// Google Gemini
const llm = new GeminiLLM("YOUR_KEY");
// OpenAI
const llm = new OpenAILLM("sk-...");
// Claude
const llm = new ClaudeLLM("sk-ant-...");
// Groq (fast inference)
const llm = new GroqLLM("gsk_...");
// Together AI
const llm = new TogetherLLM("...");
// DeepSeek
const llm = new DeepSeekLLM("...");
// OpenRouter (access any model)
const llm = new OpenRouterLLM("...");
// Local Ollama
const llm = new OllamaLLM("llama3");
// Any OpenAI-compatible endpoint
const llm = new OpenAICompatibleLLM({
baseUrl: "https://your-api.com/v1",
apiKey: "...",
model: "model-name",
});
</td></tr>
</table>
Wrap any function
<table> <tr><th>Python</th><th>Node.js / TypeScript</th></tr> <tr><td>

from treedex import FunctionLLM
llm = FunctionLLM(lambda p: my_api(p))
</td><td>
import { FunctionLLM } from "treedex";
const llm = new FunctionLLM((p) => myApi(p));
</td></tr>
</table>
Build your own backend
<table> <tr><th>Python</th><th>Node.js / TypeScript</th></tr> <tr><td>

from treedex import BaseLLM
class MyLLM(BaseLLM):
def generate(self, prompt: str) -> str:
return my_api_call(prompt)
</td><td>
import { BaseLLM } from "treedex";
class MyLLM extends BaseLLM {
async generate(prompt: string): Promise<string> {
return await myApiCall(prompt);
}
}
</td></tr>
</table>
Agentic RAG — get direct answers
Standard mode returns raw context. Agentic mode goes one step further — it retrieves the relevant sections, then generates a direct answer.
<table> <tr><th>Python</th><th>Node.js / TypeScript</th></tr> <tr><td>

# Standard: returns context + page ranges
result = index.query("What is X?")
print(result.context)
# Agentic: returns a direct answer
result = index.query("What is X?", agentic=True)
print(result.answer) # LLM-generated answer
print(result.pages_str) # source pages
</td><td>
// Standard: returns context + page ranges
const result = await index.query("What is X?");
console.log(result.context);
// Agentic: returns a direct answer
const agenticResult = await index.query("What is X?", { agentic: true });
console.log(agenticResult.answer); // LLM-generated answer
console.log(agenticResult.pagesStr); // source pages
</td></tr>
</table>
Swap LLM at query time
```python
# Build index with one LLM
index = TreeDex.from_file("doc.pdf", llm=gemini_llm)

# Query with a different one — same index, different brain
result = index.query("...", llm=groq_llm)
```
Save and load indexes
Indexes are saved as JSON. An index created in Python loads in Node.js and vice versa.
<table> <tr><th>Python</th><th>Node.js / TypeScript</th></tr> <tr><td>

# Save
index.save("my_index.json")
# Load
index = TreeDex.load("my_index.json", llm=llm)
</td><td>
// Save
await index.save("my_index.json");
// Load
const index = await TreeDex.load("my_index.json", llm);
</td></tr>
</table>
Supported Document Formats
| Format | Loader | Python Deps | Node.js Deps |
|--------|--------|-------------|-------------|
| PDF | PDFLoader | pymupdf | pdfjs-dist (included) |
| TXT / MD | TextLoader | None | None |
| HTML | HTMLLoader | None (stdlib) | htmlparser2 (optional, has fallback) |
| DOCX | DOCXLoader | python-docx | mammoth (optional) |
Use `auto_loader(path)` (Python) / `autoLoader(path)` (Node.js) for automatic format detection.
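Automatic detection amounts to dispatching on the file extension. A simplified sketch of the idea (the real auto_loader returns a loader instance, not a name; the mapping below just mirrors the table above):

```python
from pathlib import Path

# Extension -> loader name, mirroring the supported-formats table.
LOADERS = {
    ".pdf": "PDFLoader",
    ".txt": "TextLoader",
    ".md": "TextLoader",
    ".html": "HTMLLoader",
    ".docx": "DOCXLoader",
}

def pick_loader(path: str) -> str:
    """Choose a loader by file extension, case-insensitively."""
    ext = Path(path).suffix.lower()
    if ext not in LOADERS:
        raise ValueError(f"Unsupported format: {ext}")
    return LOADERS[ext]

print(pick_loader("report.PDF"))  # PDFLoader
```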
API Reference
TreeDex
| Method | Python | Node.js |
|--------|--------|---------|
| Build from file | `TreeDex.from_file(path, llm)` | `await TreeDex.fromFile(path, llm)` |