Cognisync
Filesystem-first framework for LLM-maintained knowledge bases
Cognisync is a filesystem-first framework for building LLM-maintained knowledge bases.
It turns the workflow described by Andrej Karpathy into a reusable open source system:
- Collect raw source material into a workspace.
- Index and normalize that material into a deterministic manifest.
- Generate structured work packets for LLM agents to compile a wiki.
- Lint the resulting knowledge base for integrity problems.
- Answer questions by searching the corpus and rendering outputs back into Markdown, slides, and other artifacts.
The goal is not to replace your favorite model or agent runner. The goal is to provide the workspace model, orchestration contracts, indexing primitives, and output formats that let people build serious tooling around this pattern.
Core Ideas
- Filesystem-native: raw/, wiki/, and outputs/ stay readable in tools like Obsidian.
- LLM-compatible: the framework produces prompt packets and execution plans for external LLM CLIs.
- Incremental: every scan, lint pass, query, and report can be filed back into the workspace.
- Deterministic where possible: indexing, search, linting, and report scaffolding work without network access.
- Extensible: users can write adapters, renderers, and orchestration layers on top of the core contracts.
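The "deterministic where possible" idea can be illustrated with a small sketch. This is not Cognisync's actual indexer; `build_manifest` and the manifest shape are hypothetical, chosen only to show how sorted traversal and content hashing make an index reproducible without network access.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(root: Path) -> str:
    """Return a JSON manifest that depends only on file paths and contents.

    Hypothetical helper for illustration; not part of the Cognisync API.
    """
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort by POSIX-style relative path so the result does not depend on
    # directory traversal order or the host platform's path separator.
    files.sort(key=lambda p: p.relative_to(root).as_posix())
    entries = []
    for path in files:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        entries.append({"path": path.relative_to(root).as_posix(), "sha256": digest})
    # sort_keys plus a fixed indent keeps the serialized form byte-stable.
    return json.dumps({"files": entries}, indent=2, sort_keys=True)
```

Running the function twice over an unchanged directory yields identical strings, which is the property that makes a manifest safe to diff or commit.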
Workspace Layout
workspace/
├── AGENTS.md
├── log.md
├── raw/
│   └── ... source documents, repos, datasets, images
├── wiki/
│   ├── index.md
│   ├── sources.md
│   ├── concepts.md
│   ├── queries.md
│   ├── sources/
│   ├── concepts/
│   └── queries/
├── outputs/
│   ├── reports/
│   │   ├── change-summaries/
│   │   ├── exports/
│   │   ├── research-jobs/
│   │   ├── review-exports/
│   │   └── review-ui/
│   └── slides/
├── prompts/
└── .cognisync/
    ├── access.json
    ├── audit.json
    ├── collaboration.json
    ├── config.json
    ├── control-plane.json
    ├── graph.json
    ├── index.json
    ├── notifications.json
    ├── review-actions.json
    ├── review-queue.json
    ├── runs/
    ├── shared-workspace.json
    ├── sync/
    ├── sources.json
    ├── usage.json
    └── plans/
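For tooling built around this layout, a quick structural check can be useful. The sketch below is an assumption-laden illustration, not a substitute for cognisync init or cognisync doctor: the `missing_dirs` helper is hypothetical, and the directory list is only a subset copied from the tree above.

```python
from pathlib import Path

# Subset of directories taken from the layout above (illustrative, not exhaustive).
EXPECTED_DIRS = (
    "raw",
    "wiki/sources",
    "wiki/concepts",
    "wiki/queries",
    "outputs/reports",
    "outputs/slides",
    "prompts",
    ".cognisync",
)


def missing_dirs(root: Path) -> list[str]:
    """Return the expected directories that do not exist under root.

    Hypothetical helper; not part of the Cognisync API.
    """
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]
```

An empty return value means every directory in the list is present under the given root.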
What Ships In This Reference Implementation
- Workspace scaffolding
- A root AGENTS.md workspace schema that explains the file-native contract to agents
- A root log.md activity ledger that records init, ingest, lint, compile, research, and maintenance work
- Deterministic corpus scanner and manifest builder
- Stable source and graph manifests under .cognisync/
- Stable review queue manifests for graph follow-up work under .cognisync/
- Durable review-action state so accepted concepts, merge decisions, and dismissals survive rescans
- Durable collaboration threads under .cognisync/collaboration.json so artifact review requests, comments, approvals, and change requests travel with the workspace
- Durable shared-workspace state under .cognisync/shared-workspace.json so peer bindings, accepted remote principals, and handoff bundles stay file-native too
- Durable control-plane state under .cognisync/control-plane.json so invites, bearer tokens, and scheduler ticks stay file-native too
- Regenerated wiki navigation catalogs at wiki/index.md, wiki/sources.md, wiki/concepts.md, and wiki/queries.md
- Deterministic corpus change summaries after scan, ingest, maintenance, and research runs
- Export bridges for JSONL research datasets, training bundles, and presentation bundles
- Evaluation reports over persisted research runs
- Research job notes and validation reports under outputs/reports/research-jobs/
- Markdown-aware search over raw/ and wiki/
- Compile planner for missing summaries, concept pages, and repair work
- Knowledge-base linter for broken links, missing summaries, graph conflicts, and duplicate concepts
- Markdown and Marp report renderers
- Research and compile run manifests with persisted validation state
- Command adapter contracts for wiring in external LLM CLIs
- A tested Python API and CLI
Quickstart
python3 -m pip install -e .
cognisync init .
cognisync doctor --strict
cognisync ingest batch sources.json
cognisync adapter list
cognisync adapter install codex --profile codex
cognisync compile --profile codex --strict
cognisync research "what are the main themes in this workspace?" --profile codex --mode memo --slides
Try The Demo
If you want a concrete workspace immediately, Cognisync can scaffold a polished demo garden:
cognisync demo
By default this writes a browsable example into examples/research-garden/. The demo includes:
- seeded raw source material
- compiled source summaries and concept pages
- a filed query page
- generated reports, slides, and prompt packets
You can inspect the checked-in example in examples/research-garden or follow the walkthrough in Demo Walkthrough.
Operator Workflow
Cognisync is strongest when you use it as a loop, not a bag of separate commands:
cognisync doctor --strict
cognisync ingest batch sources.json
cognisync review
cognisync collab request-review outputs/reports/report.md --assign reviewer-1 --actor-id editor-1
cognisync maintain
cognisync compile --profile codex --strict
cognisync research "what changed in this corpus?" --profile codex --slides
The operator-facing workflow is documented in Operator Workflows.
Each scan, ingest, maintenance, and research pass now also writes a small change artifact into outputs/reports/change-summaries/ so the workspace records what moved:
- artifact and source count deltas
- orphan-page delta
- graph node and edge deltas
- new concept pages
- newly resolved merges
- newly dismissed review items
- newly surfaced conflicts
- suggested follow-up questions based on new conflicts, assertion growth, and coverage gaps
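The count deltas and "new item" lists above amount to comparing two snapshots of the workspace. A minimal sketch of that comparison follows; the function names and the snapshot keys are hypothetical and do not reflect the actual change-summary format.

```python
def count_deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return after-minus-before for every counter seen in either snapshot.

    Hypothetical helper; a counter missing from a snapshot is treated as 0.
    """
    keys = sorted(set(before) | set(after))
    return {key: after.get(key, 0) - before.get(key, 0) for key in keys}


def newly_added(before: set[str], after: set[str]) -> list[str]:
    """Return items present only in the later snapshot, in sorted order."""
    return sorted(after - before)


if __name__ == "__main__":
    # Illustrative snapshot values, not real Cognisync output.
    print(count_deltas({"sources": 3, "graph_nodes": 10}, {"sources": 5, "graph_nodes": 10}))
    print(newly_added({"wiki/concepts/memory.md"},
                      {"wiki/concepts/memory.md", "wiki/concepts/planning.md"}))
```

Sorting the keys and the added items keeps the summary stable across runs, in line with the deterministic design described earlier.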
The workspace root now carries two operator-facing files inspired by the idea-file workflow:
- AGENTS.md is the durable workspace schema that tells an LLM how to treat raw/, wiki/, outputs/, and .cognisync/
- log.md is an append-only human-readable timeline of important workspace actions
The wiki root also regenerates four navigation surfaces on refresh:
- wiki/index.md is the top-level agent entry point
- wiki/sources.md, wiki/concepts.md, and wiki/queries.md catalog the durable pages in each section
- source and concept catalogs count as stable navigation backlinks
- query catalogs only become backlink-bearing when an explicit review action promotes them, so query pages can still surface as orphan review candidates until they are intentionally filed
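The backlink rule above can be sketched as a small orphan check. Everything here is an assumption made for illustration: the `find_orphans` helper is hypothetical, links are assumed to be Markdown links with workspace-relative targets, and the real linter may resolve links differently.

```python
import re
from typing import Iterable

# Assumption: links look like [text](workspace/relative/path.md).
LINK_TARGET = re.compile(r"\]\(([^)#\s]+)")


def find_orphans(
    pages: dict[str, str],
    ignored_link_sources: Iterable[str] = (),
    roots: Iterable[str] = (),
) -> list[str]:
    """Return pages that no backlink-bearing page links to.

    pages maps a page path to its Markdown text.
    ignored_link_sources lists pages whose outgoing links do not count
    (for example, a query catalog that has not been promoted).
    roots lists entry points that are never reported as orphans.
    """
    ignored = set(ignored_link_sources)
    linked: set[str] = set()
    for path, text in pages.items():
        if path in ignored:
            continue
        for target in LINK_TARGET.findall(text):
            if target != path:  # a self-link is not a backlink
                linked.add(target)
    skip = set(roots)
    return sorted(p for p in pages if p not in linked and p not in skip)
```

With the query catalog listed in `ignored_link_sources`, a query page linked only from that catalog is still reported, which mirrors the "orphan review candidate" behavior described above.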
The richer ingest layer now makes the loop more useful before an LLM even runs:
- ingest pdf preserves the source PDF and writes a sidecar Markdown file with extracted text and metadata
- ingest url captures page metadata such as description, canonical URL, headings, discovered links, content stats, and local image captures
- ingest repo captures repository stats, language signals, recent commits, and a nested tree snapshot in the repo manifest, whether the source is local or cloned from a remote Git URL
- ingest urls reads a plain-text or JSON URL list into raw/urls/
- ingest sitemap expands a sitemap into individual URL captures
- ingest batch processes a JSON manifest so larger source sets can land in one deterministic pass, including URL lists and sitemaps
Batch ingest accepts a JSON list or an object with an items list:
{
  "items": [
    {"kind": "url", "source": "https://example.com/article"},
    {"kind": "urls", "source": "/path/to/urls.txt"},
    {"kind": "sitemap", "source": "/path/to/sitemap.xml"},
    {"kind": "pdf", "source": "/path/to/paper.pdf"},
    {"kind": "repo", "source": "https://github.com/example/repo.git"}
  ]
}
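Both manifest shapes can be normalized to one list before processing. The sketch below is a hypothetical pre-flight check, not the loader Cognisync uses; `load_batch_items` is an invented name, and the set of kinds is limited to those shown in the example above, so the real CLI may accept more.

```python
import json

# Kinds that appear in the example manifest above (assumed, possibly incomplete).
KNOWN_KINDS = {"url", "urls", "sitemap", "pdf", "repo"}


def load_batch_items(text: str) -> list[dict]:
    """Parse a batch manifest given as a JSON list or an object with "items".

    Hypothetical helper; raises ValueError on shapes it does not recognize.
    """
    data = json.loads(text)
    items = data.get("items") if isinstance(data, dict) else data
    if not isinstance(items, list):
        raise ValueError("manifest must be a list or an object with an 'items' list")
    for position, item in enumerate(items):
        if not isinstance(item, dict) or "kind" not in item or "source" not in item:
            raise ValueError(f"item {position} needs 'kind' and 'source'")
        if item["kind"] not in KNOWN_KINDS:
            raise ValueError(f"item {position} has unknown kind {item['kind']!r}")
    return items
```

Validating the whole manifest before any capture starts keeps a typo in one entry from leaving the workspace half-ingested.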
The query and research outputs are now more citation-friendly by default:
- reports render an evidence summary with inline source ids like [S1]
- reports now render Fact Blocks that separate source-backed claims from the looser narrative sections
- source blocks include path, source kind, score, retrieval reason, snippet, and embedded-image hints
- compile packets include input-context excerpts so external agents see richer raw context up front
- research runs validate inline citations and persist their status into .cognisync/runs/
- scans now materialize stable source, graph, and review manifests at .cognisync/sources.json, .cognisync/graph.json, and .cognisync/review-queue.json
Research Command
cognisync research is the opinionated operator surface for question-driven work:
cognisync research "how do agent loops use memory?" --profile claude --mode memo --slides
It scans the workspace, searches the corpus, renders a cited report, builds a prompt packet, optionally runs the packet through an adapter profile, validates inline citations, and files the resulting answer back into the workspace.
Every research run now also writes:
- a research plan in .cognisync/plans/
- a run manifest in .cognisync/runs/
- a research job workspace in outputs/reports/research-jobs/
- a research change summary in outputs/reports/change-summaries/
- enough state to resume execution later without rebuilding the packet
Research now supports orchestration profiles too:
- synthesis-report for working-set and outline-driven synthesis
- literature-review for paper matrices and gap tracking
- repo-analysis for code-surface and interface mapping
- contradiction-finding for claim ledgers and disagreement handling
- market-scan for competitor-grid and positioning work
Research verification is now stricter too:
- unknown citations fail the run
- uncited narrative claims fail the run
- malformed answers, such as missing top-level headings, fail the run
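The three failure modes above can be sketched as a simple validator. This is an illustration under stated assumptions, not Cognisync's verifier: `validate_answer` is a hypothetical name, citations are assumed to look like [S1], a top-level heading is assumed to be a line starting with "# ", and every non-heading line is treated as a claim that needs a citation.

```python
import re

# Assumption: inline citations use the [S<number>] form shown in the reports.
CITATION = re.compile(r"\[(S\d+)\]")


def validate_answer(answer: str, known_ids: set[str]) -> list[str]:
    """Return a list of validation errors; an empty list means the answer passes.

    Hypothetical helper; the real checks may be stricter or shaped differently.
    """
    errors: list[str] = []
    lines = [line.strip() for line in answer.splitlines() if line.strip()]
    if not any(line.startswith("# ") for line in lines):
        errors.append("missing top-level heading")
    for line in lines:
        if line.startswith("#"):
            continue  # headings carry no claims
        cited = CITATION.findall(line)
        if not cited:
            errors.append(f"uncited claim: {line}")
        for source_id in cited:
            if source_id not in known_ids:
                errors.append(f"unknown citation: {source_id}")
    return errors
```

A caller would fail the run whenever the returned list is non-empty and persist the errors alongside the run manifest.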
