
lilbee

Beta — feedback and bug reports welcome. Open an issue.

<p align="center"> <a href="https://pypi.org/project/lilbee/"><img src="https://img.shields.io/pypi/v/lilbee" alt="PyPI"></a> <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.11%2B-blue.svg" alt="Python 3.11+"></a> <a href="https://github.com/tobocop2/lilbee/actions/workflows/ci.yml"><img src="https://github.com/tobocop2/lilbee/actions/workflows/ci.yml/badge.svg" alt="CI"></a> <a href="https://tobocop2.github.io/lilbee/coverage/"><img src="https://img.shields.io/badge/coverage-100%25-brightgreen.svg" alt="Coverage"></a> <a href="https://mypy-lang.org/"><img src="https://img.shields.io/badge/typed-mypy-blue.svg" alt="Typed"></a> <a href="https://github.com/astral-sh/ruff"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json" alt="Ruff"></a> <img src="https://img.shields.io/badge/platform-macOS%20%7C%20Linux%20%7C%20Windows-lightgrey.svg" alt="Platforms"> <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License: MIT"></a> <a href="https://pypi.org/project/lilbee/"><img src="https://img.shields.io/pypi/dm/lilbee" alt="Downloads"></a> </p>

Interactively or programmatically chat with a database of documents using strictly your own hardware, completely offline. Augment any AI agent via MCP or shell — take a free model or even a frontier model and make it better. Reads an enormous range of data formats (see supported formats). Integrate document search into your favorite GUI using the built-in REST API — no need for a separate web app when you already have a preferred GUI (see Obsidian plugin).



Why lilbee

  • Your hardware, your data — chat with your documents completely offline. No cloud, no telemetry, no API keys required
  • Make any model better — augment any AI agent via MCP or shell with hybrid RAG search. Take a free model or even a frontier model and make it leagues better at your data
  • Talks to everything — PDFs, Office docs, spreadsheets, images (OCR), ebooks, and 150+ code languages via tree-sitter
  • Bring your own GUI — built-in REST API means you can integrate document search into whatever tool you already use. No extra app needed (see Obsidian plugin)
  • Per-project databases — lilbee init creates a .lilbee/ directory (like .git/) so each project gets its own isolated index

Add files (lilbee add), then search or ask questions. Once indexed, search works without Ollama — agents use their own LLM to reason over the retrieved chunks.
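A minimal session might look like the sketch below. The flags and queries are illustrative only; see the usage guide for the authoritative command reference.

```shell
# create a per-project index (a .lilbee/ directory, like .git/)
lilbee init

# index a folder of documents
lilbee add ./docs

# retrieval only — works without Ollama once indexed
lilbee search "how does authentication work"

# one-shot question answered by a local model
lilbee ask "summarize the deployment process"
```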

Demos

Click the ▶ arrows below to expand each demo.

<details> <summary><b>AI agent</b> — lilbee search vs web search (<a href="docs/benchmarks/godot-level-generator.md">detailed analysis</a>)</summary>

[opencode] + [minimax-m2.5-free][opencode], single prompt, no follow-ups. The [Godot 4.4 XML class reference][godot-docs] (917 files) is indexed in lilbee. The baseline uses [Exa AI][exa] code search instead.

⚠️ Caution: minimax-m2.5-free is a cloud model — retrieved chunks are sent to an external API. Use a local model if your documents are private.

| | API hallucinations | Lines |
|---|---|---|
| With lilbee (code · config) | 0 | 261 |
| Without lilbee (code · config) | 4 (~22% error rate) | 213 |

<details> <summary><b>With lilbee</b> — all Godot API calls match the class reference</summary>

With lilbee MCP

</details> <details> <summary><b>Without lilbee</b> — 4 hallucinated APIs (<a href="docs/benchmarks/godot-level-generator.md#without-lilbee-213-lines--4-bugs">details</a>)</summary>

Without lilbee

</details>

If you spot issues with these benchmarks, please open an issue.

</details>

Vision OCR

<details> <summary><b>Scanned PDF → searchable knowledge base</b></summary>

A scanned 1998 Star Wars: X-Wing Collector's Edition manual indexed with vision OCR ([LightOnOCR-2][lightonocr]), then queried in lilbee's interactive chat (qwen3-coder:30b, fully local). Three questions about dev team credits, energy management, and starfighter speeds — all answered from the OCR'd content.

Vision OCR demo

See benchmarks, test documents, and sample output for model comparisons.

</details> <details> <summary><b>One-shot question from OCR'd content</b></summary>

The scanned Star Wars: X-Wing Collector's Edition guide, queried with a single lilbee ask command — no interactive chat needed.

Top speed question

</details>

Standalone

<details> <summary><b>Interactive local offline chat</b></summary>

> [!NOTE]
> Entirely local on a 2021 M1 Pro with 32 GB RAM.

Model switching via tab completion, then a Q&A grounded in an indexed PDF.

Interactive local offline chat

</details> <details> <summary><b>Code index and search</b></summary>

Code search

Add a codebase and search with natural language. Tree-sitter provides AST-aware chunking.

</details> <details> <summary><b>JSON output</b></summary>

JSON output

Structured JSON output for agents and scripts.

</details>

Hardware requirements

When used standalone, lilbee runs entirely on your machine — chat with your documents privately, no cloud required.

| Resource | Minimum | Recommended |
|----------|---------|-------------|
| RAM | 8 GB | 16–32 GB |
| GPU / Accelerator | — | Apple Metal (M-series), NVIDIA GPU (6+ GB VRAM) |
| Disk | 2 GB (models + data) | 10+ GB if using multiple models |
| CPU | Any modern x86_64 / ARM64 | — |

Ollama handles inference and uses Metal on macOS or CUDA on Linux/Windows. Without a GPU, models fall back to CPU — usable for embedding but slow for chat.

Install

Prerequisites

  • Python 3.11+
  • [Ollama] — the embedding model (nomic-embed-text) is auto-pulled on first sync. If no chat model is installed, lilbee prompts you to pick and download one.
  • Optional (for scanned PDF/image OCR): Tesseract (brew install tesseract / apt install tesseract-ocr) or an Ollama vision model (recommended for better quality — see vision OCR)

First-time download: If you're new to Ollama, expect the first run to take a while — models are large files that need to be downloaded once. For example, qwen3:8b is ~5 GB and the embedding model nomic-embed-text is ~274 MB. After the initial download, models are cached locally and load in seconds. You can check what you have installed with ollama list.
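If you prefer to pre-download models before the first run, the standard Ollama CLI commands work; the model names below are the ones mentioned above.

```shell
# pull the embedding model (~274 MB) and a chat model (~5 GB)
ollama pull nomic-embed-text
ollama pull qwen3:8b

# verify what is installed locally
ollama list
```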

Install

pip install lilbee        # or: uv tool install lilbee

Development (run from source)

git clone https://github.com/tobocop2/lilbee && cd lilbee
uv sync
uv run lilbee

Quick start

See the usage guide.

Agent integration

lilbee can serve as a local retrieval backend for AI coding agents via MCP or JSON CLI. See docs/agent-integration.md for setup and usage.
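For shell-based agents, a search can be piped straight into a JSON processor. The flag below is hypothetical — check `lilbee search --help` or docs/agent-integration.md for the actual JSON-output option.

```shell
# hypothetical flag name; verify with `lilbee search --help`
lilbee search "vector index rebuild" --json | jq '.[0]'
```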

HTTP Server

lilbee includes a REST API server so you can integrate document search into any GUI or tool:

lilbee serve                          # start on a random port (written to <data_dir>/server.port)
lilbee serve --port 8080              # or pick a fixed port

Endpoints include /api/search, /api/ask, /api/chat (with streaming SSE variants), /api/sync, /api/add, and /api/models. When the server is running, interactive API docs are available at /schema/redoc. See the API reference for the full OpenAPI schema.
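As a sketch, a minimal Python client for the search endpoint might look like this. The request field names (`query`, `limit`) are assumptions; consult the interactive docs at /schema/redoc for the actual schema.

```python
import json
import urllib.request


def search_payload(query: str, limit: int = 5) -> bytes:
    # Field names here are assumptions; verify against /schema/redoc.
    return json.dumps({"query": query, "limit": limit}).encode("utf-8")


def search(base_url: str, query: str, limit: int = 5) -> dict:
    """POST a search query to a running `lilbee serve` instance."""
    req = urllib.request.Request(
        f"{base_url}/api/search",
        data=search_payload(query, limit),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```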

Interactive chat

Running lilbee or lilbee chat enters an interactive REPL with conversation history, streaming responses, and slash commands:

| Command | Description |
|---------|-------------|
| /status | Show indexed documents and config |
| /add [path] | Add a file or directory (tab-completes paths) |
| /model [name] | Switch chat model — no args opens a curated picker; with a name, switches directly or prompts to download if not installed (tab-completes installed models) |
| /vision [name\|off] | Switch vision OCR model — no args opens a curated picker; with a name, prompts to download if not installed; off disables (tab-completes catalog models) |
| /settings | Show all current configuration values |
| /set <key> <value> | Change a setting (e.g. /set temperature 0.7) |
| /version | Show lilbee version |
| /reset | Delete all documents and data (asks for confirmation) |
| /help | Show available commands |
| /quit | Exit chat |

Slash commands and paths tab-complete. A spinner shows while waiting for the first token from the LLM. Background sync progress appears in the toolbar without interrupting the conversation.

Supported formats

Text extraction powered by [Kreuzberg], code chunking by [tree-sitter]. Structured formats (XML, JSON, CSV) get embedding-friendly preprocessing. This list is not exhaustive — Kreuzberg supports additional formats beyond those listed.
