Yore
Yore is a high-performance, deterministic retrieval and context-assembly engine designed for large documentation sets and agentic workflows.
yore – Deterministic Documentation Indexer and Context Assembly Engine
yore is a fast, deterministic tool for indexing, analyzing, and retrieving documentation from a filesystem and assembling that information into high‑quality context for large language models (LLMs) and automation agents.
Where traditional search tools return a list of matching files, yore is designed to answer a more specific question:
“Given this question and a fixed token budget, what exact slice of the documentation should an LLM see to reason correctly and safely?”
Yore combines BM25 search, structural analysis, link graph inspection, duplicate detection, and extractive refinement into a reproducible pipeline that can be used directly by humans or programmatically by agents.
1. Concepts and Terminology
Before diving into commands, it helps to define a few terms that appear throughout this README.
Documentation sprawl
“Documentation sprawl” refers to the way documentation accumulates over time:
- Multiple files describe the same feature with slightly different details.
- Older documents are left in the tree and never removed or clearly marked as deprecated.
- Temporary or scratch notes are committed and live alongside canonical documentation.
- Engineers searching for “authentication” might see ten files with overlapping names and no clear indication of which one is authoritative.
Yore is designed to operate in exactly this environment and make it tractable for both humans and LLMs.
Architecture Decision Record (ADR) and ADR chain
An ADR (Architecture Decision Record) is a small document that records a single architectural decision: the context, the decision itself, and the consequences. Projects often store ADRs in a dedicated directory, one file per decision, for example docs/adr/ADR-0001-some-decision.md.
An ADR chain is the sequence of ADRs that refer to one another over time, for example:
- ADR-0013 introduces retry semantics.
- ADR-0022 modifies the retry timing.
- ADR-0035 deprecates a previous approach.
LLMs frequently need this historical context to answer “why” questions correctly. Yore is able to recognize ADR references (for example, ADR-013) and pull those records into the context it assembles.
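As a rough illustration of what recognizing ADR references can involve, the sketch below pulls ADR numbers out of free text with a regular expression. It is not Yore's implementation: the pattern, the function name `extract_adr_refs`, and the choice to treat ADR-013 and ADR-0013 as the same record by ignoring zero padding are all assumptions made for the example.

```python
import re

# Assumed pattern: "ADR-" followed by digits. Zero padding is ignored, so
# ADR-013 and ADR-0013 are treated as the same record (an assumption).
ADR_REF = re.compile(r"\bADR-(\d+)\b")


def extract_adr_refs(text: str) -> list[int]:
    """Return ADR numbers in order of first appearance, without repeats."""
    seen: list[int] = []
    for match in ADR_REF.finditer(text):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    return seen


sample = "Retries follow ADR-0013, later amended by ADR-0022. See ADR-013 for context."
print(extract_adr_refs(sample))  # prints [13, 22]
```

Returning references in order of first appearance keeps the result stable across runs, which matters if the list is later used to decide which records to pull into a context.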
Canonical document
In a large repository, several documents may cover similar topics. A canonical document is the one that should be treated as the primary source of truth for a topic.
Yore computes a canonicality score per document based on path, naming conventions, recency, and other signals, and exposes those scores so tools and agents can make consistent, automated decisions.
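To make the idea of a canonicality score concrete, here is a minimal sketch that combines path, naming, and recency signals into a single number. Everything specific in it is hypothetical: the `DocInfo` type, the marker words, the 90-day threshold, and the weights are invented for illustration and are not Yore's actual scoring rules.

```python
import re
from dataclasses import dataclass


@dataclass
class DocInfo:
    path: str
    age_days: int  # days since last modification


# Hypothetical naming markers that suggest a non-canonical document.
LOW_TRUST_MARKERS = {"scratch", "draft", "old", "deprecated"}


def canonicality(doc: DocInfo) -> float:
    """Illustrative score in [0, 1]; the weights are assumptions."""
    lowered = doc.path.lower()
    tokens = set(re.split(r"[^a-z0-9]+", lowered))
    score = 0.5
    if lowered.startswith("docs/"):   # path signal
        score += 0.2
    if tokens & LOW_TRUST_MARKERS:    # naming signal
        score -= 0.3
    if doc.age_days <= 90:            # recency signal
        score += 0.1
    return round(max(0.0, min(1.0, score)), 2)


docs = [
    DocInfo("notes/scratch/auth-old.md", age_days=400),
    DocInfo("docs/auth/README.md", age_days=30),
]
# Sort by score, then path, so ties always resolve the same way.
for doc in sorted(docs, key=lambda d: (-canonicality(d), d.path)):
    print(doc.path, canonicality(doc))
```

Breaking ties on the path is one simple way to keep the ranking reproducible when two documents score the same.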
2. What Yore Does
At a high level, yore provides:
- Indexing of documentation files (Markdown, text, etc.) using BM25 and structural metadata.
- Search and analysis over that index: free‑text search, duplicate detection, canonicality scoring, link graph queries.
- Context assembly for LLMs, including cross‑reference expansion and extractive refinement controlled by an explicit token budget.
- Bounded agent retrieval, with JSON-first preview/fetch flows that keep large context out of the agent transcript until it is explicitly requested.
- Quality checks, such as link validation and an evaluation harness for retrieval correctness.
Some example questions Yore helps answer:
- “Which documents describe Kubernetes deployment, and which one is canonical?”
- “What ADRs exist for authentication and session isolation?”
- “What documents are unreferenced and safe to clean up?”
- “What is the smallest, highest‑signal context I can give an LLM for ‘How do I deploy a new service?’ within 8,000 tokens?”
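The last question above is essentially a packing problem: fit the most relevant sections into a fixed budget. The sketch below shows one simple greedy approach under stated assumptions. The `estimate_tokens` word count is a crude stand-in for a real tokenizer, and the section identifiers and scores are made up; Yore's own selection and trimming logic may differ.

```python
def estimate_tokens(text: str) -> int:
    # Crude stand-in: one token per whitespace-separated word (assumption).
    return len(text.split())


def pack_sections(sections, budget):
    """Greedily keep the highest-scoring sections that still fit the budget.

    `sections` is a list of (section_id, score, text) tuples.
    """
    chosen, used = [], 0
    # Sort by score, then by id, so equal scores always pack the same way.
    for section_id, _score, text in sorted(sections, key=lambda s: (-s[1], s[0])):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            chosen.append(section_id)
            used += cost
    return chosen, used


sections = [
    ("deploy.md#steps", 9.1, "build image push image apply manifest"),
    ("deploy.md#history", 2.3, "this page was originally written for the old cluster"),
    ("adr-0013.md#decision", 5.4, "retry with exponential backoff"),
]
print(pack_sections(sections, budget=10))
# prints (['deploy.md#steps', 'adr-0013.md#decision'], 10)
```

With a budget of 10, the low-scoring history section is dropped because it would push the total to 19.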
3. How Yore Differs from Traditional Search Tools
Yore is not a replacement for Lucene, Elasticsearch, Meilisearch, or ripgrep. Instead, it builds on similar primitives and adds layers specifically for documentation curation and LLM context assembly.
3.1 Comparison matrix
| Capability / Tool | Yore | Lucene / Tantivy | Elasticsearch / OpenSearch | Meilisearch | ripgrep |
|---|---|---|---|---|---|
| Primary use case | Doc indexing + LLM context assembly | General‑purpose search library | Scalable full‑text search cluster | Simple search API for applications | Fast text search in files |
| Retrieval model | BM25 + structural and link signals | BM25 / scoring plugins | BM25 + scoring / aggregations | BM25‑like | Regex / literal matching |
| Cross‑reference expansion | Yes (Markdown links, ADR refs) | No (caller must implement) | No (caller must implement) | No | No |
| Duplicate detection (docs/sections) | Yes (Jaccard + MinHash + SimHash) | No (custom code required) | No | No | No |
| Canonicality scoring | Yes (path, naming, recency signals) | No | No | No | No |
| Link graph analysis (backlinks, orphans) | Yes | No | No | No | No |
| LLM‑aware token budgeting | Yes (per‑query token budget) | No | No | No | No |
| Extractive refinement | Yes (sentence‑level, code‑preserving) | No | No | No | No |
| Deterministic output | Yes (no sampling, no embeddings) | Yes | Yes (given same index) | Yes | Yes |
| Designed for agent integration | Yes | Caller‑defined | Caller‑defined | Caller‑defined | Caller‑defined |
You can use Lucene‑like tools to implement the core search primitive. Yore sits at a higher level, orchestrating retrieval, link following, refinement, and evaluation in a way that is explicitly designed for LLMs and documentation maintenance agents.
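As an example of one such higher-level layer, the comparison matrix lists Jaccard similarity among the techniques used for duplicate detection. The sketch below computes Jaccard similarity over word shingles for two short passages. The shingle size and helper names are assumptions for illustration, and the sketch leaves out MinHash and SimHash, which are typically used to approximate this kind of comparison at scale.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles; short texts yield one shingle."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


first = "restart the service after changing the config"
second = "restart the service after editing the config"
print(jaccard(first, second))  # prints 0.25
```

Each passage here produces five shingles, two of which are shared, giving 2 / 8. A single changed word affects every shingle that overlaps it, which is why near-identical sentences can still score well below 1.0 at this shingle size.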
4. Architecture Overview
Yore operates in four main phases:
- Indexing. The yore build command walks a directory tree, identifies documents of interest (for example, *.md), and builds an index that includes:
  - BM25 term statistics
  - Section boundaries and fingerprints
  - Link information (Markdown links and ADR references)
  - Basic metadata (path, size, timestamps)
- Retrieval and analysis. Commands such as yore query, yore dupes, yore dupes-sections, yore canonicality, yore canonical-orphans, yore check-links, yore backlinks, and yore orphans operate against this index to answer questions about relevance, duplication, authority, and link structure.
- Context assembly for LLMs and agents. Yore supports two retrieval shapes over the same index:
  - yore assemble for a markdown digest you want to hand directly to an LLM.
  - yore mcp search-context / yore mcp fetch-context for bounded JSON previews and explicit follow-up expansion.

  In addition, yore mcp serve exposes those same bounded tools over MCP stdio transport for editor and agent clients.

  Both paths reuse the same deterministic retrieval building blocks:
  - BM25 to select the most relevant documents and sections.
  - Cross‑reference expansion to include linked ADRs and design docs where appropriate.
  - Extractive refinement to keep code blocks, lists, and high‑value sentences while removing low‑signal prose.
  - Final budget-aware trimming for either markdown digests or compact JSON previews.
- Evaluation and governance. The yore eval command uses a JSONL question file to validate whether the assembled contexts contain expected substrings, enabling regression detection and measurable improvements to retrieval quality.
All operations are deterministic: given the same index and configuration, yore will produce the same outputs.
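The evaluation phase described above can be pictured as a loop over question records that checks each assembled context for expected substrings. The sketch below is only an illustration of that idea: the JSONL field names `question` and `expect`, the `evaluate` helper, and the `fake_assemble` stand-in are hypothetical and do not reflect the actual file format or internals of yore eval.

```python
import json


def evaluate(jsonl_text: str, assemble) -> list:
    """Return (question, missing_substrings) for each JSONL record."""
    results = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        case = json.loads(line)
        context = assemble(case["question"])
        missing = [s for s in case["expect"] if s not in context]
        results.append((case["question"], missing))
    return results


def fake_assemble(question: str) -> str:
    # Stand-in for real context assembly; always returns the same text.
    return "Deploy with kubectl apply. Retries are covered in ADR-0013."


cases = "\n".join([
    json.dumps({"question": "How do I deploy?", "expect": ["kubectl apply"]}),
    json.dumps({"question": "How are sessions isolated?", "expect": ["session isolation"]}),
])

for question, missing in evaluate(cases, fake_assemble):
    print("PASS" if not missing else f"FAIL missing={missing}", "-", question)
```

Because the assembly step is deterministic, a case that passes today and fails after a change points to a real regression rather than run-to-run noise.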
5. Installation
From crates.io (recommended)
cargo install yore-cli
From source
git clone https://github.com/rahulrajaram/yore.git
cd yore
cargo install --path .
Verify installation
yore --version
6. Quick Start
6.1 Build an index
Create an index over Markdown files in docs/:
yore build docs --output docs/.index --types md
6.2 Run a search query
Use BM25‑based search over the index:
yore query kubernetes deployment --index docs/.index
yore query --query '"async migration"' --phrase --index docs/.index
yore query --query '"async migration" plan' --phrase --explain --index docs/.index
yore query --query '"async migration" plan' --phrase --explain --json --index docs/.index
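For readers unfamiliar with the BM25 ranking behind these queries, the sketch below scores a handful of in-memory documents using the common Okapi BM25 formulation. It is a generic illustration, not Yore's code: the whitespace tokenizer, the parameter defaults `k1=1.2` and `b=0.75`, and the sample documents are assumptions, and Yore's exact variant and tokenization may differ.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list, k1: float = 1.2, b: float = 0.75) -> list:
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalization penalizes long documents.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "kubernetes deployment guide",
    "session isolation notes",
    "kubernetes cluster upgrade and kubernetes deployment rollback steps",
]
scores = bm25_scores("kubernetes deployment", docs)
# Rank by score, then by index, so ties are broken deterministically.
ranking = sorted(range(len(docs)), key=lambda i: (-scores[i], i))
print(ranking)
```

The short, focused first document outranks the longer third one even though the latter repeats a query term, which shows the effect of length normalization.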
