Cognisync

Cognisync is a filesystem-first framework for building LLM-maintained knowledge bases.

It turns the workflow described by Andrej Karpathy into a reusable open-source system:

Collect raw source material into a workspace.
Index and normalize that material into a deterministic manifest.
Generate structured work packets for LLM agents to compile a wiki.
Lint the resulting knowledge base for integrity problems.
Answer questions by searching the corpus and rendering outputs back into Markdown, slides, and other artifacts.

The goal is not to replace your favorite model or agent runner. The goal is to provide the workspace model, orchestration contracts, indexing primitives, and output formats that let people build serious tooling around this pattern.

Core Ideas

Filesystem-native: raw/, wiki/, and outputs/ stay readable in tools like Obsidian.
LLM-compatible: the framework produces prompt packets and execution plans for external LLM CLIs.
Incremental: every scan, lint pass, query, and report can be filed back into the workspace.
Deterministic where possible: indexing, search, linting, and report scaffolding work without network access.
Extensible: users can write adapters, renderers, and orchestration layers on top of the core contracts.
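
To make the "deterministic where possible" idea concrete, here is a minimal sketch of a manifest builder that produces identical output for identical files. The function name build_manifest and the manifest fields are assumptions made for illustration; they are not Cognisync's actual schema or API.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(root: Path) -> str:
    """Return a JSON manifest of the files under root that is stable across runs.

    Hypothetical helper: the field names are illustrative only.
    """
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort by relative POSIX path so output never depends on directory iteration order.
    files.sort(key=lambda p: p.relative_to(root).as_posix())
    entries = [
        {
            "path": p.relative_to(root).as_posix(),
            "sha256": hashlib.sha256(p.read_bytes()).hexdigest(),
        }
        for p in files
    ]
    # sort_keys fixes the key order inside each entry as well.
    return json.dumps({"files": entries}, indent=2, sort_keys=True)
```

Because the manifest depends only on relative paths and content hashes, two scans of an unchanged workspace can be compared byte for byte.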

Workspace Layout

workspace/
├── AGENTS.md
├── log.md
├── raw/
│   └── ... source documents, repos, datasets, images
├── wiki/
│   ├── index.md
│   ├── sources.md
│   ├── concepts.md
│   ├── queries.md
│   ├── sources/
│   ├── concepts/
│   └── queries/
├── outputs/
│   ├── reports/
│   │   ├── change-summaries/
│   │   ├── exports/
│   │   ├── research-jobs/
│   │   ├── review-exports/
│   │   └── review-ui/
│   └── slides/
├── prompts/
└── .cognisync/
    ├── access.json
    ├── audit.json
    ├── collaboration.json
    ├── config.json
    ├── control-plane.json
    ├── graph.json
    ├── index.json
    ├── notifications.json
    ├── review-actions.json
    ├── review-queue.json
    ├── runs/
    ├── shared-workspace.json
    ├── sync/
    ├── sources.json
    ├── usage.json
    └── plans/
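
Because the layout is plain directories, a script can check it without any framework code. The sketch below is a hypothetical helper (missing_dirs is not part of Cognisync) that reports which of the directories shown above are absent; the list covers only a subset of the tree.

```python
from pathlib import Path

# A subset of the directories from the layout above.
WORKSPACE_DIRS = [
    "raw",
    "wiki/sources",
    "wiki/concepts",
    "wiki/queries",
    "outputs/reports",
    "outputs/slides",
    "prompts",
    ".cognisync/runs",
    ".cognisync/plans",
]


def missing_dirs(root: Path) -> list[str]:
    """Return the expected workspace directories that do not exist under root."""
    return [d for d in WORKSPACE_DIRS if not (root / d).is_dir()]
```

An empty result means every listed directory is present; cognisync init remains the supported way to create the workspace.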

What Ships In This Reference Implementation

Workspace scaffolding
A root AGENTS.md workspace schema that explains the file-native contract to agents
A root log.md activity ledger that records init, ingest, lint, compile, research, and maintenance work
Deterministic corpus scanner and manifest builder
Stable source and graph manifests under .cognisync/
Stable review queue manifests for graph follow-up work under .cognisync/
Durable review-action state so accepted concepts, merge decisions, and dismissals survive rescans
Durable collaboration threads under .cognisync/collaboration.json so artifact review requests, comments, approvals, and change requests travel with the workspace
Durable shared-workspace state under .cognisync/shared-workspace.json so peer bindings, accepted remote principals, and handoff bundles stay file-native too
Durable control-plane state under .cognisync/control-plane.json so invites, bearer tokens, and scheduler ticks stay file-native too
Regenerated wiki navigation catalogs at wiki/index.md, wiki/sources.md, wiki/concepts.md, and wiki/queries.md
Deterministic corpus change summaries after scan, ingest, maintenance, and research runs
Export bridges for JSONL research datasets, training bundles, and presentation bundles
Evaluation reports over persisted research runs
Research job notes and validation reports under outputs/reports/research-jobs/
Markdown-aware search over raw/ and wiki/
Compile planner for missing summaries, concept pages, and repair work
Knowledge-base linter for broken links, missing summaries, graph conflicts, and duplicate concepts
Markdown and Marp report renderers
Research and compile run manifests with persisted validation state
Command adapter contracts for wiring in external LLM CLIs
A tested Python API and CLI
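
As an illustration of the kind of check a knowledge-base linter performs, the following sketch finds relative Markdown links whose target file does not exist. It assumes standard [text](target) links rather than Obsidian-style wikilinks, and broken_links is a hypothetical name, not the shipped linter.

```python
import re
from pathlib import Path

# Matches [text](target) and captures the target; images match too.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_links(wiki_root: Path) -> list[tuple[str, str]]:
    """Return (page, target) pairs for relative links that point at nothing."""
    problems = []
    for page in sorted(wiki_root.rglob("*.md")):
        text = page.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            # Skip external URLs, mail links, and same-page anchors.
            if "://" in target or target.startswith(("#", "mailto:")):
                continue
            path_part = target.split("#", 1)[0]
            if not (page.parent / path_part).exists():
                problems.append((page.relative_to(wiki_root).as_posix(), target))
    return problems
```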

Quickstart

python3 -m pip install -e .
cognisync init .
cognisync doctor --strict
cognisync ingest batch sources.json
cognisync adapter list
cognisync adapter install codex --profile codex
cognisync compile --profile codex --strict
cognisync research "what are the main themes in this workspace?" --profile codex --mode memo --slides

Try The Demo

If you want a concrete workspace immediately, Cognisync can scaffold a polished demo garden:

cognisync demo

By default this writes a browsable example into examples/research-garden/. The demo includes:

seeded raw source material
compiled source summaries and concept pages
a filed query page
generated reports, slides, and prompt packets

You can inspect the checked-in example in examples/research-garden or follow the walkthrough in Demo Walkthrough.

Operator Workflow

Cognisync is strongest when you use it as a loop, not a bag of separate commands:

cognisync doctor --strict
cognisync ingest batch sources.json
cognisync review
cognisync collab request-review outputs/reports/report.md --assign reviewer-1 --actor-id editor-1
cognisync maintain
cognisync compile --profile codex --strict
cognisync research "what changed in this corpus?" --profile codex --slides

The operator-facing workflow is documented in Operator Workflows.

Each scan, ingest, maintenance, and research pass also writes a small change artifact into outputs/reports/change-summaries/ that records what moved:

artifact and source count deltas
orphan-page delta
graph node and edge deltas
new concept pages
newly resolved merges
newly dismissed review items
newly surfaced conflicts
suggested follow-up questions based on new conflicts, assertion growth, and coverage gaps
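
The count-style deltas in this list can be pictured as a difference between two snapshots. The sketch below assumes each snapshot is a flat mapping of counter names to integers; both the function count_deltas and the counter names in the usage note are hypothetical.

```python
def count_deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return after - before for every counter seen in either snapshot."""
    keys = sorted(set(before) | set(after))
    # A counter missing from one snapshot is treated as zero.
    return {key: after.get(key, 0) - before.get(key, 0) for key in keys}
```

For example, going from {"sources": 3} to {"sources": 5, "orphan_pages": 1} yields a delta of 2 sources and 1 orphan page.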

The workspace root now carries two operator-facing files inspired by the idea-file workflow:

AGENTS.md is the durable workspace schema that tells an LLM how to treat raw/, wiki/, outputs/, and .cognisync/
log.md is an append-only human-readable timeline of important workspace actions

The wiki root also regenerates four navigation surfaces on refresh:

wiki/index.md is the top-level agent entry point
wiki/sources.md, wiki/concepts.md, and wiki/queries.md catalog the durable pages in each section
source and concept catalogs count as stable navigation backlinks
query catalogs only become backlink-bearing when an explicit review action promotes them; until a query page is intentionally filed, it can still surface as an orphan review candidate
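
The backlink rule above can be sketched as a small graph check. This is an illustrative model only: orphan_pages, its parameters, and the link-map shape are assumptions, not the framework's review logic.

```python
def orphan_pages(links, non_backlink_sources=frozenset(), entry_points=frozenset()):
    """Return pages that no backlink-bearing page links to.

    links maps each page to the set of pages it links to.
    Links from non_backlink_sources are ignored when counting backlinks.
    """
    linked = set()
    for source, targets in links.items():
        if source in non_backlink_sources:
            continue  # e.g. a query catalog that has not been promoted
        linked.update(targets)
    return sorted(
        page for page in links if page not in linked and page not in entry_points
    )
```

With wiki/queries.md passed as a non-backlink source, a query page linked only from that catalog is still reported as an orphan, which matches the behavior described for unpromoted query catalogs.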

The richer ingest layer now makes the loop more useful before an LLM even runs:

ingest pdf preserves the source PDF and writes a sidecar Markdown file with extracted text and metadata
ingest url captures page metadata such as description, canonical URL, headings, discovered links, content stats, and local image captures
ingest repo captures repository stats, language signals, recent commits, and a nested tree snapshot in the repo manifest, whether the source is local or cloned from a remote Git URL
ingest urls reads a plain-text or JSON URL list into raw/urls/
ingest sitemap expands a sitemap into individual URL captures
ingest batch processes a JSON manifest so larger source sets can land in one deterministic pass, including URL lists and sitemaps

Batch ingest accepts a JSON list or an object with an items list:

{
  "items": [
    {"kind": "url", "source": "https://example.com/article"},
    {"kind": "urls", "source": "/path/to/urls.txt"},
    {"kind": "sitemap", "source": "/path/to/sitemap.xml"},
    {"kind": "pdf", "source": "/path/to/paper.pdf"},
    {"kind": "repo", "source": "https://github.com/example/repo.git"}
  ]
}
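
A manifest in either accepted shape can be normalized before it is handed to cognisync ingest batch. The sketch below is a hypothetical pre-flight check: load_batch_items is not part of the Cognisync API, and the set of kinds is an assumption drawn from the example above.

```python
import json

# Kinds that appear in the example manifest; the real CLI may accept others.
KNOWN_KINDS = {"url", "urls", "sitemap", "pdf", "repo"}


def load_batch_items(text: str) -> list[dict]:
    """Parse a batch manifest given as a JSON list or an object with "items"."""
    data = json.loads(text)
    items = data.get("items") if isinstance(data, dict) else data
    if not isinstance(items, list):
        raise ValueError("expected a JSON list or an object with an 'items' list")
    for item in items:
        if not isinstance(item, dict):
            raise ValueError("each item must be a JSON object")
        if item.get("kind") not in KNOWN_KINDS:
            raise ValueError(f"unknown kind: {item.get('kind')!r}")
        if not item.get("source"):
            raise ValueError("each item needs a non-empty 'source'")
    return items
```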

The query and research outputs are now more citation-friendly by default:

reports render an evidence summary with inline source ids like [S1]
reports now render Fact Blocks that separate source-backed claims from the looser narrative sections
source blocks include path, source kind, score, retrieval reason, snippet, and embedded-image hints
compile packets include input-context excerpts so external agents see richer raw context up front
research runs validate inline citations and persist their status into .cognisync/runs/
scans now materialize stable source, graph, and review manifests at .cognisync/sources.json, .cognisync/graph.json, and .cognisync/review-queue.json

Research Command

cognisync research is the opinionated operator surface for question-driven work:

cognisync research "how do agent loops use memory?" --profile claude --mode memo --slides

It scans the workspace, searches the corpus, renders a cited report, builds a prompt packet, optionally runs the packet through an adapter profile, validates inline citations, and files the resulting answer back into the workspace.

Every research run now also writes:

a research plan in .cognisync/plans/
a run manifest in .cognisync/runs/
a research job workspace in outputs/reports/research-jobs/
a research change summary in outputs/reports/change-summaries/
enough state to resume execution later without rebuilding the packet

Research now supports orchestration profiles too:

synthesis-report for working-set and outline-driven synthesis
literature-review for paper matrices and gap tracking
repo-analysis for code-surface and interface mapping
contradiction-finding for claim ledgers and disagreement handling
market-scan for competitor-grid and positioning work

Research verification is now stricter too:

unknown citations fail the run
uncited narrative claims fail the run
malformed answers, such as missing top-level headings, fail the run
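
The three failure conditions above can be approximated with a short validator. This is a sketch under stated assumptions: citations look like [S1], a top-level heading is a Markdown line starting with "# ", and "uncited narrative claim" is simplified to "paragraph with no citation". The function validate_answer is hypothetical and is not Cognisync's verifier.

```python
import re

CITATION_RE = re.compile(r"\[(S\d+)\]")


def validate_answer(answer: str, known_sources: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the answer passes."""
    errors = []
    # Malformed answer: no top-level heading anywhere.
    if not any(line.startswith("# ") for line in answer.splitlines()):
        errors.append("missing top-level heading")
    # Unknown citations: ids used in the text but absent from the source list.
    cited = set(CITATION_RE.findall(answer))
    for source_id in sorted(cited - known_sources):
        errors.append(f"unknown citation: {source_id}")
    # Uncited claims: any non-heading paragraph without a citation.
    for paragraph in re.split(r"\n\s*\n", answer):
        text = paragraph.strip()
        if not text or text.startswith("#"):
            continue
        if not CITATION_RE.search(text):
            errors.append(f"uncited paragraph: {text[:40]}")
    return errors
```

A stricter verifier would check claims sentence by sentence; paragraph granularity keeps the sketch short.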
