* Polydoc - JVM-Native Pandoc Documentation System

[[https://pandoc.org/][Pandoc]] is a powerful tool for parsing, combining and processing text-based documents. Pandoc also includes a filter system that allows for the transformation of documents and sub-elements.

Polydoc brings Pandoc's filtering capabilities to the JVM/Clojure ecosystem, providing:

- Advanced Pandoc filters compiled with GraalVM for reduced latency
- JVM-native documentation tooling (no Python dependencies required, JavaScript via GraalJS)
- Support for multiple document formats via Pandoc
- SQLite-powered full-text search with FTS5
- Book building system with automatic indexing
- Interactive documentation viewing (coming soon)

** Rationale

Documentation is a fundamental part of software engineering. Many tools exist for managing documentation, but Polydoc offers several advantages:

- JVM-Native: If you're working on the JVM, you don't need to bring Python or Go into your stack just for documentation. JavaScript execution uses GraalJS (included with GraalVM).
- Pandoc Integration: Leverage Pandoc's powerful document transformation capabilities.
- Advanced Filters: Execute code (Clojure, SQLite, JavaScript), render diagrams (PlantUML), and compose documents (include filter).
- Full-Text Search: A built-in SQLite FTS5 search index is maintained automatically.
- Book Building: Combine multiple documents into searchable books with metadata.

** Features

*** Implemented ✅

**** Pandoc Filters

- =clojure-exec=: Execute Clojure code blocks and show results
- =sqlite-exec=: Run SQLite queries and format results as tables
- =javascript-exec=: Execute JavaScript code blocks with GraalJS (GraalVM's JavaScript engine)
- =plantuml=: Render PlantUML diagrams to images
- =include=: Compose documents from multiple source files

**** Book Building System

- Load configuration from =polydoc.yml= (Pandoc-compatible metadata format)
- Process multiple markdown/org files through filters
- Extract sections and headers automatically
- Store content in SQLite database
- Automatic FTS5 full-text search index (via database triggers)
- Generate HTML output

**** Search System

- FTS5 full-text search across all book content
- Boolean operators: AND, OR, NOT
- Phrase search: ="exact phrase"=
- Field-specific search: =title:introduction=
- Result highlighting with context snippets
- Result ranking by relevance

**** Interactive Viewer

- HTTP-based documentation browser with http-kit
- Section navigation (Previous/Next buttons)
- Collapsible table of contents
- Full-text search interface
- Pico CSS styling for clean UI
- Browser automation testing with Etaoin/Firefox

*** Coming Soon ⏳

- Python and Shell execution filters
- PDF and EPUB output formats
- GraalVM native compilation
- Viewer enhancements (themes, bookmarks, history)

** Installation

*** Prerequisites

- GraalVM 21 or later (includes GraalJS for JavaScript filter)
- Clojure CLI tools
- Pandoc 2.0 or later
- PlantUML JAR (for PlantUML filter)

*** Using Clojure CLI

Clone the repository and use directly:

#+BEGIN_SRC bash
git clone <repository-url>
cd polydoc
clojure -M:main --help
#+END_SRC

** Usage

*** Command Overview

#+BEGIN_SRC bash
polydoc filter   # Execute individual Pandoc filters
polydoc book     # Build complete books from polydoc.yml
polydoc search   # Search documentation with full-text search
polydoc view     # Interactive viewer (coming soon)
#+END_SRC

*** Filters

Run an individual Pandoc filter on the document's JSON AST:

**** Clojure Execution Filter

#+BEGIN_SRC bash
# Process markdown through Clojure execution filter
pandoc input.md -t json | clojure -M:main filter -t clojure-exec | pandoc -f json -o output.html
#+END_SRC

Example markdown:

#+BEGIN_SRC markdown
(+ 1 2 3)
#+END_SRC

Output shows code and result:

#+BEGIN_SRC markdown
(+ 1 2 3)
;; => 6
#+END_SRC
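
As a rough illustration of what a filter like this does, the sketch below walks an already-parsed Pandoc AST, evaluates matching code blocks, and appends the result as a comment. It is not Polydoc's implementation: the class name =clojure-exec= on the code block, the helper names (=exec-block?=, =annotate=, =exec-filter=), and the string-keyed AST are assumptions, and the JSON decoding and encoding around it are left out.

#+BEGIN_SRC clojure
(require '[clojure.string :as str]
         '[clojure.walk :as walk])

;; Assumption: executable blocks carry the class "clojure-exec".
(def exec-class "clojure-exec")

(defn exec-block?
  "True when node is a Pandoc CodeBlock tagged with exec-class."
  [node]
  (and (map? node)
       (= "CodeBlock" (get node "t"))
       (let [[[_id classes _attrs] _code] (get node "c")]
         (boolean (some #{exec-class} classes)))))

(defn annotate
  "Evaluates all forms in code and appends the last value as a comment."
  [code]
  (str (str/trimr code) "\n;; => " (pr-str (load-string code))))

(defn exec-filter
  "Rewrites every executable CodeBlock in a parsed Pandoc AST."
  [ast]
  (walk/postwalk
   (fn [node]
     (if (exec-block? node)
       (update node "c" (fn [[attr code]] [attr (annotate code)]))
       node))
   ast))

;; A parsed AST as it might look after JSON decoding (string keys).
(def sample-ast
  {"pandoc-api-version" [1 23]
   "meta"   {}
   "blocks" [{"t" "CodeBlock"
              "c" [["" ["clojure-exec"] []] "(+ 1 2 3)"]}]})

(get-in (exec-filter sample-ast) ["blocks" 0 "c" 1])
;; => "(+ 1 2 3)\n;; => 6"
#+END_SRC

Because the walk only touches =CodeBlock= nodes with the expected class, all other blocks pass through unchanged, which is what lets several filters be chained in one pipeline.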

**** SQLite Execution Filter

#+BEGIN_SRC bash
pandoc input.md -t json | clojure -M:main filter -t sqlite-exec | pandoc -f json -o output.html
#+END_SRC

Example markdown:

#+BEGIN_SRC markdown
SELECT name, age FROM users ORDER BY age DESC LIMIT 5;
#+END_SRC

The output shows the query followed by its results as a formatted table.
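
To make "formatted table" concrete, here is a minimal sketch that renders result rows as a plain-text pipe table. The function name =rows->table= and the sample rows are hypothetical, and the real filter may well emit a Pandoc table node instead of text.

#+BEGIN_SRC clojure
(require '[clojure.string :as str])

(defn rows->table
  "Renders rows (a seq of maps) as a pipe table using the given column order."
  [columns rows]
  (let [line    (fn [xs] (str "| " (str/join " | " xs) " |"))
        cells   (fn [row] (map #(str (get row %)) columns))
        header  (line (map name columns))
        ;; One dash per character of the column name.
        divider (line (map #(apply str (repeat (count (name %)) "-")) columns))]
    (str/join "\n" (concat [header divider] (map (comp line cells) rows)))))

;; Hypothetical rows, shaped like the result of the query above.
(println (rows->table [:name :age]
                      [{:name "Alan" :age 41} {:name "Ada" :age 36}]))
;; | name | age |
;; | ---- | --- |
;; | Alan | 41 |
;; | Ada | 36 |
#+END_SRC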

**** JavaScript Execution Filter

#+BEGIN_SRC bash
pandoc input.md -t json | clojure -M:main filter -t javascript-exec | pandoc -f json -o output.html
#+END_SRC

Example markdown:

#+BEGIN_SRC markdown
const sum = [1, 2, 3, 4, 5].reduce((a, b) => a + b, 0);
console.log("Sum:", sum);
#+END_SRC

**** PlantUML Rendering Filter

#+BEGIN_SRC bash
pandoc input.md -t json | clojure -M:main filter -t plantuml | pandoc -f json -o output.html
#+END_SRC

Example markdown:

#+BEGIN_SRC markdown
@startuml
Alice -> Bob: Hello
Bob -> Alice: Hi there!
@enduml
#+END_SRC

The output replaces the code block with the rendered diagram image.

**** Include Filter

#+BEGIN_SRC bash
pandoc input.md -t json | clojure -M:main filter -t include | pandoc -f json -o output.html
#+END_SRC

Example markdown:

#+BEGIN_SRC markdown

#+END_SRC

The filter inserts the content of the external file inline.

*** Book Building

Build complete books with automatic indexing and search:

**** Configuration: polydoc.yml

Create a =polydoc.yml= file with Pandoc-compatible metadata:

#+BEGIN_SRC yaml
# Standard Pandoc metadata
title: "My Documentation"
author: "Your Name"
date: "2025-11-26"
lang: "en-US"
description: "Comprehensive documentation"

# Table of contents
toc: true
toc-depth: 3

# Polydoc-specific: Book configuration
book:
  id: "my-docs"
  version: "1.0.0"
  database: "docs.db"

# Filters to apply during build
filters:
  - clojure-exec
  - sqlite-exec
  - plantuml
  - include

# Document sections (in order)
sections:
  - docs/introduction.md
  - docs/tutorial.md
  - docs/reference.md

  # Extended format with per-file options
  - file: docs/advanced.md
    title: "Advanced Topics"
    filters:
      - clojure-exec
      - plantuml
#+END_SRC
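
Since =sections= accepts both bare paths and the extended per-file form, a build step presumably normalizes the two into one shape. The sketch below shows one way that could look. It assumes the YAML has already been parsed into a map with string keys, that =filters= is a top-level key, and that per-file filters replace the book-level list. None of this is confirmed by Polydoc, and the function names are hypothetical.

#+BEGIN_SRC clojure
(defn normalize-section
  "Coerces one sections entry into a map with :file, :title and :filters.
   Assumption: a per-file filter list replaces the book-level defaults."
  [default-filters entry]
  (if (string? entry)
    {:file entry :title nil :filters default-filters}
    {:file    (get entry "file")
     :title   (get entry "title")
     :filters (or (get entry "filters") default-filters)}))

(defn normalize-sections
  "Returns all sections of a parsed config in a uniform shape."
  [config]
  (let [defaults (get config "filters" [])]
    (mapv #(normalize-section defaults %) (get config "sections"))))

;; Hypothetical parsed config, trimmed to the relevant keys.
(normalize-sections
 {"filters"  ["clojure-exec" "include"]
  "sections" ["docs/introduction.md"
              {"file" "docs/advanced.md"
               "title" "Advanced Topics"
               "filters" ["plantuml"]}]})
;; => [{:file "docs/introduction.md", :title nil, :filters ["clojure-exec" "include"]}
;;     {:file "docs/advanced.md", :title "Advanced Topics", :filters ["plantuml"]}]
#+END_SRC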

**** Build Command

#+BEGIN_SRC bash
clojure -M:main book -c polydoc.yml -o output/
#+END_SRC

This will:

1. Load configuration from =polydoc.yml=
2. Process each section file through configured filters
3. Extract headers and content
4. Insert into SQLite database with FTS5 search index
5. Generate HTML output in =output/= directory

**** What Gets Created

- Database: =docs.db= (or path from config)
  - =books= table: Book metadata
  - =sections= table: All extracted sections with content
  - =sections_fts= table: FTS5 full-text search index (auto-synced via triggers)
- HTML Output: =output/my-docs.html=
  - Combined document from all sections
  - Processed through all configured filters
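
For readers unfamiliar with trigger-synced FTS5 indexes, the sketch below lists the kind of statements such a schema typically needs, following SQLite's external-content table pattern. Only the table names =books=, =sections= and =sections_fts= come from the list above. Every column name and trigger name is an assumption, and Polydoc's actual schema may differ. The statements are held as plain strings so they can be run with whichever JDBC wrapper is at hand.

#+BEGIN_SRC clojure
(def schema-statements
  "DDL for a trigger-synced FTS5 index. Column names are assumptions."
  ["CREATE TABLE IF NOT EXISTS books (
      id INTEGER PRIMARY KEY, title TEXT, version TEXT)"

   "CREATE TABLE IF NOT EXISTS sections (
      id INTEGER PRIMARY KEY,
      book_id INTEGER REFERENCES books(id),
      title TEXT, content TEXT)"

   ;; External-content FTS5 table: the index stores tokens only and
   ;; reads the original text back from sections.
   "CREATE VIRTUAL TABLE IF NOT EXISTS sections_fts
      USING fts5(title, content, content='sections', content_rowid='id')"

   "CREATE TRIGGER IF NOT EXISTS sections_ai AFTER INSERT ON sections BEGIN
      INSERT INTO sections_fts(rowid, title, content)
      VALUES (new.id, new.title, new.content);
    END"

   ;; Deletes go through the special 'delete' command of FTS5.
   "CREATE TRIGGER IF NOT EXISTS sections_ad AFTER DELETE ON sections BEGIN
      INSERT INTO sections_fts(sections_fts, rowid, title, content)
      VALUES ('delete', old.id, old.title, old.content);
    END"

   ;; An update is a delete of the old row followed by an insert of the new one.
   "CREATE TRIGGER IF NOT EXISTS sections_au AFTER UPDATE ON sections BEGIN
      INSERT INTO sections_fts(sections_fts, rowid, title, content)
      VALUES ('delete', old.id, old.title, old.content);
      INSERT INTO sections_fts(rowid, title, content)
      VALUES (new.id, new.title, new.content);
    END"])
#+END_SRC

With triggers like these in place, application code only ever writes to =sections=, and the index follows along inside the same transaction.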

*** Search

Search your documentation using FTS5 full-text search:

**** Basic Search

#+BEGIN_SRC bash
clojure -M:main search -d docs.db -q "pandoc filter"
#+END_SRC

**** Search with Operators

#+BEGIN_SRC bash
# Boolean AND (default)
clojure -M:main search -d docs.db -q "clojure AND filter"

# Boolean OR
clojure -M:main search -d docs.db -q "clojure OR python"

# Exclude terms with NOT
clojure -M:main search -d docs.db -q "filter NOT javascript"

# Exact phrase search
clojure -M:main search -d docs.db -q '"book building system"'

# Field-specific search
clojure -M:main search -d docs.db -q "title:introduction"
#+END_SRC

**** Limit Results

#+BEGIN_SRC bash
# Show up to 20 results (default: 10)
clojure -M:main search -d docs.db -q "documentation" -l 20
#+END_SRC

**** Filter by Book

#+BEGIN_SRC bash
# Search within specific book only
clojure -M:main search -d docs.db -q "query" -b 1
#+END_SRC
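
The options above map naturally onto a single parameterized FTS5 query. The following sketch builds such a query as a =[sql & params]= vector, the shape most Clojure JDBC wrappers accept. It is an illustration under assumptions: the function name =search-query=, the =book_id= column and the snippet markers are hypothetical, while =snippet()=, =MATCH= and =ORDER BY rank= are standard FTS5 features.

#+BEGIN_SRC clojure
(defn search-query
  "Builds [sql & params] for an FTS5 search over sections_fts.
   :limit defaults to 10, matching the CLI default; :book-id is optional."
  [{:keys [query limit book-id] :or {limit 10}}]
  (let [sql (str "SELECT s.id, s.title, "
                 ;; Column 1 is content; at most 12 tokens per excerpt.
                 "snippet(sections_fts, 1, '<b>', '</b>', '...', 12) AS excerpt "
                 "FROM sections_fts "
                 "JOIN sections s ON s.id = sections_fts.rowid "
                 "WHERE sections_fts MATCH ? "
                 (when book-id "AND s.book_id = ? ")
                 "ORDER BY rank LIMIT ?")]
    ;; Parameters are appended in the same order as their placeholders.
    (cond-> [sql query]
      book-id (conj book-id)
      true    (conj limit))))

(rest (search-query {:query "clojure AND filter" :book-id 1 :limit 20}))
;; => ("clojure AND filter" 1 20)
#+END_SRC

Passing the user's query string as a bound parameter keeps operators such as =AND=, =NOT= and quoted phrases intact while avoiding SQL injection.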

** Development

This tool is written in JVM Clojure and designed to be compiled to native code via GraalVM.

*** Development Environment

The Clojure REPL is the primary interface for development. There are two main namespaces: =user= and =dev=.

When you connect to a REPL, you'll be in the =user= namespace. Run =(dev)= to load the =dev= namespace. The =dev= namespace provides functions for development workflow:

#+BEGIN_SRC clojure
(refresh)  ;; Refresh all namespaces (uses clj-reload)
(lint)     ;; Lint the project with clj-kondo
(run-all)  ;; Run all tests
#+END_SRC

*** Starting the REPL

#+BEGIN_SRC bash
# Start nREPL server on port 7889
clojure -M:jvm-base:dev:nrepl

# In another terminal, connect with your editor
# Or use clj-nrepl-eval for command-line evaluation:
clj-nrepl-eval -p 7889 "(+ 1 2 3)"
#+END_SRC

*** Running Tests

#+BEGIN_SRC bash
# Run all tests via Kaocha
clojure -M:dev:test

# Run via REPL
(require 'clojure.test)
(clojure.test/run-all-tests #"polydoc.*")
#+END_SRC

Current status: 106 tests passing, 403 assertions, 0 failures ✅

**** Browser Testing

Polydoc uses [[https://github.com/clj-commons/etaoin][Etaoin]] for browser automation testing with Firefox/GeckoDriver. Browser tests verify the interactive viewer functionality.

Prerequisites for browser tests:

#+BEGIN_SRC bash
# macOS (via Homebrew)
brew install --cask firefox
brew install geckodriver

# Ubuntu/Debian
sudo apt-get install firefox

# GeckoDriver - download from GitHub releases
wget https://github.com/mozilla/geckodriver/releases/latest/download/geckodriver-linux64.tar.gz
tar -xzf geckodriver-linux64.tar.gz
sudo mv geckodriver /usr/local/bin/
#+END_SRC

Running browser tests:

#+BEGIN_SRC bash
# Browser tests run as part of the full test suite
clojure -M:dev:test

# Run only viewer tests
clojure -M:dev:test --focus polydoc.viewer.server-test

# In REPL
(require '[clojure.test :as test])
(test/run-tests 'polydoc.viewer.server-test)
#+END_SRC

Browser tests include:

- Page loading and navigation
- Section browsing (Previous/Next)
- Table of contents interaction
- Full-text search functionality
- Responsive layout verification

*** Linting

#+BEGIN_SRC bash
# Via command line
clojure -M:lint -m clj-kondo.main --lint src test

# Or in REPL (from dev namespace)
(lint)
#+END_SRC
