dotMD
A markdown knowledgebase search tool that combines semantic search, keyword search, and knowledge graph traversal for high-accuracy retrieval at zero ongoing cost.
How It Works
dotMD indexes your markdown files using three complementary search strategies:
- Semantic Search — Embeds chunks with sentence-transformers and stores vectors in LanceDB for meaning-based retrieval
- BM25 Keyword Search — Classic term-frequency scoring via rank-bm25 for exact keyword matching
- Knowledge Graph Search — Builds a graph of files, sections, entities, and tags in LadybugDB (an embedded Cypher graph database forked from Kuzu), then traverses connections to find related content
Results from the three engines are merged with Reciprocal Rank Fusion (RRF) and optionally reranked by a cross-encoder for higher precision.
Everything runs locally. No API keys required. No ongoing costs.
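To make the fusion step concrete, here is a minimal Python sketch of Reciprocal Rank Fusion. The `rrf_fuse` helper and the per-engine rankings are illustrative assumptions, not dotMD's actual code; k=60 is the constant from the original RRF paper.

```python
# Illustrative RRF sketch (not dotMD's actual implementation).
# Each engine returns a ranked list of chunk IDs; every ID earns
# 1 / (k + rank) from each list it appears in, and the summed
# scores produce the fused ranking.
def rrf_fuse(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-engine rankings for one query:
semantic = ["chunk_a", "chunk_b", "chunk_c"]
bm25 = ["chunk_b", "chunk_a", "chunk_d"]
graph = ["chunk_c", "chunk_b"]
print(rrf_fuse([semantic, bm25, graph]))  # chunk_b ranks first: it appears high in all three lists
```

Because RRF uses only ranks, it needs no score normalization across engines whose raw scores live on different scales.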
Use Cases
Give any AI agent instant access to your notes
Connect dotMD as an MCP server to Claude Code, Cursor, VS Code, or any MCP-compatible agent. Your entire markdown knowledge base becomes searchable context the agent can pull from mid-conversation — no copy-pasting, no uploading files.
Search your personal knowledge base
If you keep learning notes, research summaries, or a digital garden in markdown, dotMD lets you search across all of it with semantic understanding — not just keyword matching. Ask "how does the transformer attention mechanism work" and find your notes even if they never use that exact phrase.
Zero-cost RAG without LLM dependencies
Tools like Mem0 and Cognee use LLMs for indexing and retrieval, which means API costs on every query. dotMD runs entirely locally with open-source models — no API keys, no per-query fees, no data leaving your machine. If you can convert your documents to markdown, you have a fully functional RAG pipeline at zero ongoing cost.
Feed project documentation to coding agents
Index your project's docs, ADRs, runbooks, and design documents. When a coding agent needs context about your architecture decisions or deployment process, it can retrieve the relevant sections directly instead of hallucinating or asking you to explain.
Search across Obsidian / Logseq / Foam vaults
Any markdown-based note-taking tool works out of the box. Point dotMD at your vault directory and get hybrid search (semantic + keyword + graph) across years of accumulated notes, without being locked into any single app's search.
Build a searchable knowledge base from any document format
Convert PDFs, Word docs, Confluence pages, or web articles to markdown (using tools like Pandoc, Docling, or Markitdown), then index them with dotMD. This gives you a private, searchable archive of everything you've read or collected.
Team onboarding and internal knowledge sharing
Index your team's internal documentation, incident postmortem reports, or engineering handbooks. New team members — or AI agents assisting them — can search for answers without digging through wikis or Slack history.
Research and literature review
Keep your paper summaries, reading notes, and annotations in markdown. dotMD's knowledge graph connects entities across documents, so you can discover relationships between concepts, authors, or methods that span your entire research corpus.
Installation
Requires Python 3.12+.
```bash
cd backend
pip install -e .
```
Usage
Index your markdown files

```bash
dotmd index /path/to/your/markdown/files
```

With custom entity types for NER:

```bash
dotmd index /path/to/files --entity-types "person,technology,concept,project"
```
Search
```bash
dotmd search "how to deploy to production"
```

Search modes:

```bash
dotmd search "query" --mode hybrid      # All 3 engines (default)
dotmd search "query" --mode semantic    # Vector similarity only
dotmd search "query" --mode bm25        # Keyword matching only
dotmd search "query" --mode graph       # Graph traversal only
dotmd search "query" --no-rerank        # Skip cross-encoder reranking
dotmd search "query" --no-expand       # Skip query expansion
dotmd search "query" --top 5            # Limit results
```
REST API server
```bash
dotmd serve                           # Start on localhost:8000
dotmd serve --host 0.0.0.0 -p 9000    # Custom host and port
```
MCP server
The MCP server uses stdio transport and is launched by an MCP client (Claude Code, VS Code, Cursor, OpenCode, etc.).
Important: The API server and MCP server cannot run at the same time — they share a graph database that only supports a single connection.
To get the MCP config with absolute paths for your environment, run:
```bash
dotmd mcp-config
```

This outputs JSON you can paste directly into your client's MCP config:

```json
{
  "dotmd": {
    "command": "/absolute/path/to/.venv/bin/dotmd",
    "args": ["mcp"]
  }
}
```

If your MCP client runs from the project root, you can use a relative path instead:

```json
{
  "dotmd": {
    "command": "./backend/.venv/bin/dotmd",
    "args": ["mcp"]
  }
}
```
Docker
```bash
# Build
docker compose build

# Index your files (place markdown in ./data/)
docker compose run api index /data

# Start the API server
docker compose up api

# Rebuild after code changes
docker compose up api --build
```
Index management
```bash
dotmd status    # Show index statistics
dotmd clear     # Delete the entire index
```
Architecture
```
backend/src/dotmd/
├── core/           # Domain models (Pydantic), config, exceptions
├── ingestion/      # File discovery, markdown chunking, indexing pipeline
├── extraction/     # Entity/relation extraction (structural + GLiNER NER)
├── storage/        # LanceDB (vectors), LadybugDB (graph), SQLite (metadata)
├── search/         # Semantic, BM25, graph search, RRF fusion, reranking
├── api/            # DotMDService — UI-agnostic public API
├── utils/          # Shared utilities
├── mcp_server.py   # MCP server (FastMCP, stdio transport)
└── cli.py          # Click CLI
```
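As a sketch of what the ingestion pipeline's chunking step does, a naive heading-based chunker might look like the following. This is illustrative only; dotMD's actual chunker in `ingestion/` may work differently.

```python
# Naive heading-based markdown chunker (illustrative, not dotMD's
# actual chunking logic): start a new chunk at each heading, keeping
# the heading together with the body that follows it.
def chunk_by_headings(text):
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

doc = "# Setup\ninstall deps\n## Deploy\nrun the pipeline"
print(chunk_by_headings(doc))  # two chunks, one per heading
```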
MCP Tools
The MCP server (mcp_server.py) exposes three tools via FastMCP:
| Tool | Description |
|------|-------------|
| search | Query the indexed knowledgebase (supports semantic, BM25, graph, or hybrid mode with optional cross-encoder reranking) |
| index | Index all markdown files in a directory |
| status | Get current index statistics |
The server uses a lazy singleton DotMDService — ML models load once on first request and are reused across all subsequent calls.
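The lazy-singleton behaviour can be sketched in a few lines of Python. `DummyService` is a stand-in for the real `DotMDService`; its constructor represents the expensive model loading.

```python
# Sketch of the lazy-singleton pattern: the service is constructed on
# first use and every later call reuses the same instance. DummyService
# stands in for DotMDService, whose __init__ would load the ML models.
class DummyService:
    loads = 0
    def __init__(self):
        DummyService.loads += 1  # expensive model loading would happen here

_service = None

def get_service():
    global _service
    if _service is None:
        _service = DummyService()
    return _service

first, second = get_service(), get_service()
print(first is second, DummyService.loads)  # True 1: one instance, loaded once
```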
Storage
| Layer | Engine | Details |
|-------|--------|---------|
| Vector | LanceDB | Embedded, file-based, ANN search |
| Graph | LadybugDB | Embedded, Cypher queries, zero-config |
| Metadata | SQLite | Chunk text, headings, stats |
All storage is local at ~/.dotmd/.
Graph Schema
Nodes: File, Section, Entity, Tag
Edges: HAS_SECTION, PARENT_OF, LINKS_TO, HAS_TAG, MENTIONS, CO_OCCURS
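The graph layer surfaces related content by following edges such as MENTIONS. A toy Python illustration of the idea (section names and entities are made up; dotMD actually traverses a LadybugDB graph via Cypher):

```python
# Toy illustration of graph-style retrieval over MENTIONS edges:
# sections become related when they mention the same entity.
# The data here is hypothetical.
mentions = {
    "intro.md#setup": {"Docker", "PostgreSQL"},
    "ops.md#deploy": {"Docker", "Kubernetes"},
    "notes.md#db": {"PostgreSQL"},
}

def related_sections(entity):
    return sorted(s for s, ents in mentions.items() if entity in ents)

print(related_sections("Docker"))  # both sections that mention Docker
```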
Entities are extracted in two configurable layers:
- Structural (always on) — headings, wikilinks, tags, frontmatter, markdown links
- NER (default) — GLiNER zero-shot named entity recognition with customizable entity types
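The structural layer amounts to pattern matching over the markdown itself. A rough Python sketch of wikilink and tag extraction (the regexes are illustrative, not dotMD's actual patterns):

```python
import re

# Rough sketch of the structural extraction layer: pull [[wikilinks]]
# and #tags out of markdown text with regexes (illustrative only).
def extract_structural(text):
    wikilinks = re.findall(r"\[\[([^\]]+)\]\]", text)
    tags = re.findall(r"(?<!\S)#([\w/-]+)", text)
    return {"wikilinks": wikilinks, "tags": tags}

sample = "See [[Deployment Guide]] for rollback steps. #ops #runbook"
print(extract_structural(sample))
```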
Search Pipeline
```
query → expand → [semantic, BM25, graph] → RRF fusion → cross-encoder rerank → results
```
Configuration
Environment variables (prefix DOTMD_):
| Variable | Default | Description |
|----------|---------|-------------|
| DOTMD_INDEX_DIR | ~/.dotmd | Where index data is stored |
| DOTMD_EXTRACT_DEPTH | ner | structural or ner |
| DOTMD_EMBEDDING_MODEL | BAAI/bge-small-en-v1.5 | Sentence-transformer model |
| DOTMD_NER_ENTITY_TYPES | person,organization,technology,concept,location,object,activity,date_time | GLiNER entity types |
| DOTMD_DEFAULT_TOP_K | 10 | Default number of results |
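Settings resolve as env-var-or-default lookups. A minimal sketch, where the `DEFAULTS` table mirrors the documented values but the `setting` helper is hypothetical and dotMD's actual config layer may differ:

```python
import os

# Hedged sketch of settings resolution: an environment variable with
# the DOTMD_ prefix overrides the documented default.
DEFAULTS = {
    "DOTMD_INDEX_DIR": "~/.dotmd",
    "DOTMD_EMBEDDING_MODEL": "BAAI/bge-small-en-v1.5",
    "DOTMD_DEFAULT_TOP_K": "10",
}

def setting(name):
    return os.environ.get(name, DEFAULTS[name])

os.environ["DOTMD_DEFAULT_TOP_K"] = "5"   # simulate a user override
print(setting("DOTMD_DEFAULT_TOP_K"))     # override wins
print(setting("DOTMD_EMBEDDING_MODEL"))   # falls back to the default
```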
License
MIT