# lilbee
<p align="center"> <a href="https://pypi.org/project/lilbee/"><img src="https://img.shields.io/pypi/v/lilbee" alt="PyPI"></a> <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.11%2B-blue.svg" alt="Python 3.11+"></a> <a href="https://github.com/tobocop2/lilbee/actions/workflows/ci.yml"><img src="https://github.com/tobocop2/lilbee/actions/workflows/ci.yml/badge.svg" alt="CI"></a> <a href="https://tobocop2.github.io/lilbee/coverage/"><img src="https://img.shields.io/badge/coverage-100%25-brightgreen.svg" alt="Coverage"></a> <a href="https://mypy-lang.org/"><img src="https://img.shields.io/badge/typed-mypy-blue.svg" alt="Typed"></a> <a href="https://github.com/astral-sh/ruff"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json" alt="Ruff"></a> <img src="https://img.shields.io/badge/platform-macOS%20%7C%20Linux%20%7C%20Windows-lightgrey.svg" alt="Platforms"> <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License: MIT"></a> <a href="https://pypi.org/project/lilbee/"><img src="https://img.shields.io/pypi/dm/lilbee" alt="Downloads"></a> </p>

Beta — feedback and bug reports welcome. Open an issue.
Interactively or programmatically chat with a database of documents using strictly your own hardware, completely offline. Augment any AI agent via MCP or shell — take a free model or even a frontier model and make it better. Talks to an incredible amount of data formats (see supported formats). Integrate document search into your favorite GUI using the built-in REST API — no need for a separate web app when you already have a preferred GUI (see Obsidian plugin).
- Why lilbee
- Demos
- Install
- Quick start · Full usage guide
- Agent integration
- HTTP Server · API reference
- Interactive chat
- Supported formats
## Why lilbee
- Your hardware, your data — chat with your documents completely offline. No cloud, no telemetry, no API keys required
- Make any model better — augment any AI agent via MCP or shell with hybrid RAG search. Take a free model or even a frontier model and make it leagues better at your data
- Talks to everything — PDFs, Office docs, spreadsheets, images (OCR), ebooks, and 150+ code languages via tree-sitter
- Bring your own GUI — built-in REST API means you can integrate document search into whatever tool you already use. No extra app needed (see Obsidian plugin)
- Per-project databases — `lilbee init` creates a `.lilbee/` directory (like `.git/`) so each project gets its own isolated index

Add files (`lilbee add`), then search or ask questions. Once indexed, search works without Ollama — agents use their own LLM to reason over the retrieved chunks.
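A minimal sketch of that workflow (the `lilbee init` and `lilbee add` commands come from this README; the project name and directory argument are illustrative):

```bash
cd my-project
lilbee init      # creates ./.lilbee/ for this project's isolated index
lilbee add docs/ # index a directory (example path)
```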
## Demos
Click the ▶ arrows below to expand each demo.

<details> <summary><b>AI agent</b> — lilbee search vs web search (<a href="docs/benchmarks/godot-level-generator.md">detailed analysis</a>)</summary>
[opencode] + [minimax-m2.5-free][opencode], single prompt, no follow-ups. The [Godot 4.4 XML class reference][godot-docs] (917 files) is indexed in lilbee. The baseline uses [Exa AI][exa] code search instead.
⚠️ Caution: minimax-m2.5-free is a cloud model — retrieved chunks are sent to an external API. Use a local model if your documents are private.
| | API hallucinations | Lines |
|---|---|---|
| With lilbee (code · config) | 0 | 261 |
| Without lilbee (code · config) | 4 (~22% error rate) | 213 |
<details> <summary><b>With lilbee</b> — all Godot API calls match the class reference</summary>

If you spot issues with these benchmarks, please open an issue.
</details>

### Vision OCR
<details> <summary><b>Scanned PDF → searchable knowledge base</b></summary>

A scanned 1998 Star Wars: X-Wing Collector's Edition manual indexed with vision OCR ([LightOnOCR-2][lightonocr]), then queried in lilbee's interactive chat (qwen3-coder:30b, fully local). Three questions about dev team credits, energy management, and starfighter speeds — all answered from the OCR'd content.

See benchmarks, test documents, and sample output for model comparisons.
</details> <details> <summary><b>One-shot question from OCR'd content</b></summary>

The scanned Star Wars: X-Wing Collector's Edition guide, queried with a single `lilbee ask` command — no interactive chat needed.

### Standalone
<details> <summary><b>Interactive local offline chat</b></summary>

> [!NOTE]
> Entirely local on a 2021 M1 Pro with 32 GB RAM.
Model switching via tab completion, then a Q&A grounded in an indexed PDF.


Add a codebase and search with natural language. Tree-sitter provides AST-aware chunking.
</details> <details> <summary><b>JSON output</b></summary>
Structured JSON output for agents and scripts.
</details>

## Hardware requirements
When used standalone, lilbee runs entirely on your machine — chat with your documents privately, no cloud required.
| Resource | Minimum | Recommended |
|----------|---------|-------------|
| RAM | 8 GB | 16–32 GB |
| GPU / Accelerator | — | Apple Metal (M-series), NVIDIA GPU (6+ GB VRAM) |
| Disk | 2 GB (models + data) | 10+ GB if using multiple models |
| CPU | Any modern x86_64 / ARM64 | — |
Ollama handles inference and uses Metal on macOS or CUDA on Linux/Windows. Without a GPU, models fall back to CPU — usable for embedding but slow for chat.
## Install

### Prerequisites
- Python 3.11+
- [Ollama] — the embedding model (`nomic-embed-text`) is auto-pulled on first sync. If no chat model is installed, lilbee prompts you to pick and download one.
- Optional (for scanned PDF/image OCR): Tesseract (`brew install tesseract` / `apt install tesseract-ocr`) or an Ollama vision model (recommended for better quality — see vision OCR)
First-time download: If you're new to Ollama, expect the first run to take a while — models are large files that need to be downloaded once. For example, `qwen3:8b` is ~5 GB and the embedding model `nomic-embed-text` is ~274 MB. After the initial download, models are cached locally and load in seconds. You can check what you have installed with `ollama list`.
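If you prefer to pull models ahead of time instead of waiting for the automatic download on first sync, the standard Ollama CLI works; the model names below are the ones mentioned above:

```bash
ollama list                   # show models already installed
ollama pull nomic-embed-text  # embedding model auto-pulled by lilbee
ollama pull qwen3:8b          # example chat model (~5 GB)
```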
### Install

```bash
pip install lilbee   # or: uv tool install lilbee
```

### Development (run from source)

```bash
git clone https://github.com/tobocop2/lilbee && cd lilbee
uv sync
uv run lilbee
```
## Quick start
See the usage guide.
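As a rough sketch of a first session, using only commands referenced elsewhere in this README (`lilbee init`, `lilbee add`, `lilbee ask`, `lilbee chat`); exact flags and argument forms are assumptions, so treat the usage guide as authoritative:

```bash
lilbee init                          # create a per-project .lilbee/ index
lilbee add docs/ src/                # index documents and code (example paths)
lilbee ask "What does this project do?"  # one-shot question (illustrative)
lilbee chat                          # or start the interactive REPL
```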
## Agent integration
lilbee can serve as a local retrieval backend for AI coding agents via MCP or JSON CLI. See docs/agent-integration.md for setup and usage.
## HTTP Server
lilbee includes a REST API server so you can integrate document search into any GUI or tool:
```bash
lilbee serve             # start on a random port (written to <data_dir>/server.port)
lilbee serve --port 8080 # or pick a fixed port
```
Endpoints include /api/search, /api/ask, /api/chat (with streaming SSE variants), /api/sync, /api/add, and /api/models. When the server is running, interactive API docs are available at /schema/redoc. See the API reference for the full OpenAPI schema.
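For a GUI or script integration, a request against the search endpoint might look roughly like this; the JSON body shown is an assumed shape, so check the OpenAPI schema served at /schema/redoc for the actual contract:

```bash
# assumed request shape; see /schema/redoc on a running server for the real schema
curl -s -X POST http://localhost:8080/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "starfighter speeds"}'
```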
## Interactive chat

Running `lilbee` or `lilbee chat` enters an interactive REPL with conversation history, streaming responses, and slash commands:
| Command | Description |
|---------|-------------|
| /status | Show indexed documents and config |
| /add [path] | Add a file or directory (tab-completes paths) |
| /model [name] | Switch chat model — no args opens a curated picker; with a name, switches directly or prompts to download if not installed (tab-completes installed models) |
| /vision [name\|off] | Switch vision OCR model — no args opens a curated picker; with a name, prompts to download if not installed; off disables (tab-completes catalog models) |
| /settings | Show all current configuration values |
| /set <key> <value> | Change a setting (e.g. /set temperature 0.7) |
| /version | Show lilbee version |
| /reset | Delete all documents and data (asks for confirmation) |
| /help | Show available commands |
| /quit | Exit chat |
Slash commands and paths tab-complete. A spinner shows while waiting for the first token from the LLM. Background sync progress appears in the toolbar without interrupting the conversation.
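An illustrative session, using only slash commands from the table above (the file path and model name are examples):

```bash
lilbee chat
# inside the REPL:
#   /add ~/manuals/xwing.pdf   index a file (example path)
#   /model qwen3:8b            switch chat model (example name)
#   /set temperature 0.2       change a setting
#   /status                    confirm what is indexed
#   /quit                      exit
```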
## Supported formats

Text extraction powered by [Kreuzberg], code chunking by [tree-sitter]. Structured formats (XML, JSON, CSV) get embedding-friendly preprocessing. This list is not exhaustive — Kreuzberg supports additional formats.
