PaperCortex

AI-Powered Document Intelligence for Paperless-ngx — Semantic search, auto-classification, receipt extraction, DATEV export. 100% local with Ollama. MCP Server for Claude Code.

<p align="center"> <img src="docs/assets/papercortex-logo.svg" alt="PaperCortex Logo" width="120" /> <h1 align="center">PaperCortex</h1> <p align="center"> <strong>AI-Powered Document Intelligence for Paperless-ngx</strong><br/> <em>Semantic search, auto-classification, receipt extraction, and accounting export — 100% local, 100% private.</em> </p> <p align="center"> <a href="#-quick-start"><img src="https://img.shields.io/badge/Docker-one--command-2496ED?logo=docker&logoColor=white" alt="Docker"></a> <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-22c55e.svg" alt="MIT License"></a> <img src="https://img.shields.io/badge/TypeScript-5.x-3178C6?logo=typescript&logoColor=white" alt="TypeScript"> <img src="https://img.shields.io/badge/Ollama-Local_AI-7C3AED?logo=ollama&logoColor=white" alt="Ollama"> <img src="https://img.shields.io/badge/MCP-Server-F97316" alt="MCP Server"> <img src="https://img.shields.io/badge/Paperless--ngx-Compatible-EF4444?logo=data:image/svg+xml;base64,..." alt="Paperless-ngx"> <img src="https://img.shields.io/badge/DATEV-Export-EAB308" alt="DATEV Export"> <img src="https://img.shields.io/badge/Privacy-First-10B981" alt="Privacy First"> </p> <p align="center"> <a href="#-quick-start">Quick Start</a> · <a href="#-features">Features</a> · <a href="#-mcp-server-tools">MCP Tools</a> · <a href="#-receipt-intelligence">Receipts</a> · <a href="#-documentation">Docs</a> </p> </p> <p align="center"> <img src="docs/assets/demo.svg" alt="PaperCortex Demo" width="700" /> </p>

What is PaperCortex?

PaperCortex turns your Paperless-ngx document archive into an intelligent, queryable knowledge base — powered entirely by local AI running on your own hardware.

If you use Paperless-ngx to store invoices, receipts, contracts, tax documents, letters, or any other scanned paperwork, PaperCortex adds the intelligence layer that Paperless-ngx is missing:

  • Ask questions in plain English — "Show me all invoices from Amazon over 100 EUR in 2025"
  • Find documents by meaning, not just keywords — searching for "office rent" finds "Bueromiete" and "monthly lease payment"
  • Auto-tag and classify every new document the moment it arrives
  • Extract structured data from receipts — vendor, date, amount, tax rate, line items
  • Match receipts to bank transactions automatically
  • Export to DATEV for your German tax advisor — or plain CSV for any accounting software

Everything runs locally through Ollama. No document content ever leaves your network. No cloud APIs. No subscriptions. No data harvesting.

PaperCortex exposes all capabilities as an MCP (Model Context Protocol) Server, making it a first-class tool for Claude Code, AI coding agents, and automated workflows.


The Problem

Paperless-ngx is an outstanding document management system with 37,000+ GitHub stars. It handles scanning, OCR, storage, and basic tagging beautifully. But once your documents are in Paperless-ngx, finding and working with them has real limitations:

| What you want to do | Paperless-ngx alone | With PaperCortex |
|---|---|---|
| Find a document by what it's about | Keyword search only — misses synonyms, translations, related concepts | Semantic search understands meaning across languages |
| Classify incoming documents | Manual rules or basic auto-matching | LLM-powered classification understands document content |
| Extract data from a receipt | Read it yourself and type it in | Automatic extraction of vendor, amount, date, tax, line items |
| Answer "How much did I spend on X?" | Export everything, open spreadsheet, filter manually | Natural language query returns the answer instantly |
| Send receipt data to accounting | Manual data entry or copy-paste | One-click DATEV/CSV export ready for your tax advisor |
| Use documents in AI workflows | No API integration for AI agents | Full MCP Server for Claude Code and any MCP-compatible agent |
| Keep data private | Self-hosted (good!) | Self-hosted AI too — zero cloud dependency |


Features

Semantic Document Search

Traditional keyword search fails when you don't remember the exact words. PaperCortex generates vector embeddings for every document using local Ollama models and stores them in a lightweight SQLite vector database.

Search by meaning, not by memory:

  • Search for "electricity bill" → finds documents containing "Stromrechnung", "utility payment", "power invoice"
  • Search for "office supplies" → finds "Bueroausstattung", "paper and toner", "desk accessories order"
  • Search for "tax deductible travel" → finds flight bookings, hotel receipts, train tickets, taxi invoices

Supported embedding models:

  • nomic-embed-text (recommended — fast, accurate, 768 dimensions)
  • mxbai-embed-large (higher accuracy, slower)
  • Any Ollama-compatible embedding model
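
The ranking step behind a search like this can be sketched as a cosine-similarity comparison between a query embedding and the stored document embeddings. This is a minimal illustration under stated assumptions, not PaperCortex's actual implementation: `StoredEmbedding`, `cosineSimilarity`, and `rankBySimilarity` are hypothetical names, the toy 3-dimensional vectors stand in for real model output, and the SQLite storage layer is left out.

```typescript
// Hypothetical shape for a stored document embedding (not PaperCortex's actual schema).
interface StoredEmbedding {
  documentId: number;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored embedding against the query and return the topK best matches.
function rankBySimilarity(
  query: number[],
  stored: StoredEmbedding[],
  topK: number,
): { documentId: number; score: number }[] {
  return stored
    .map((d) => ({ documentId: d.documentId, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy vectors standing in for real embeddings.
const storedEmbeddings: StoredEmbedding[] = [
  { documentId: 1, vector: [0.9, 0.1, 0.0] },
  { documentId: 2, vector: [0.0, 0.2, 0.9] },
  { documentId: 3, vector: [0.8, 0.3, 0.1] },
];
const searchResults = rankBySimilarity([1, 0, 0], storedEmbeddings, 2);
console.log(searchResults.map((r) => r.documentId)); // [ 1, 3 ]
```

A linear scan like this is fine for a personal archive of a few thousand documents; larger collections would normally use an index inside the vector store instead.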

Automatic Document Classification

Every new document arriving in Paperless-ngx gets analyzed by a local LLM that reads the OCR content and assigns:

  • Document type — Invoice, Receipt, Contract, Letter, Statement, Tax Document, Certificate
  • Tags — Contextual tags based on content (e.g., "office", "travel", "insurance", "subscription")
  • Correspondent — Identifies the sender/vendor from document content
  • Date extraction — Finds the document date (not just the scan date)
  • Language detection — Identifies the document language

Classification runs asynchronously in the background. New documents are processed within minutes of arriving in Paperless-ngx.
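
LLM output is not guaranteed to be well-formed, so a classifier of this kind usually validates the model's answer before writing anything back. The sketch below assumes the model is asked to reply with a JSON object; the `Classification` shape, the field names, and `parseClassification` are hypothetical and only mirror the attributes listed above.

```typescript
// Document types taken from the list above; the JSON field names are assumptions.
const DOCUMENT_TYPES = [
  "Invoice", "Receipt", "Contract", "Letter", "Statement", "Tax Document", "Certificate",
] as const;
type DocumentType = (typeof DOCUMENT_TYPES)[number];

interface Classification {
  documentType: DocumentType;
  tags: string[];
  correspondent: string | null;
  documentDate: string | null; // ISO date, e.g. "2025-03-01"
  language: string | null;
}

// Return a trimmed string, or null for anything that is not a non-empty string.
function nonEmptyString(value: unknown): string | null {
  return typeof value === "string" && value.trim().length > 0 ? value.trim() : null;
}

// Parse and validate a raw model reply; returns null when it cannot be trusted.
function parseClassification(raw: string): Classification | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;
  const obj = data as Record<string, unknown>;

  // Reject document types outside the known list instead of guessing.
  const typeIndex = typeof obj.documentType === "string"
    ? (DOCUMENT_TYPES as readonly string[]).indexOf(obj.documentType)
    : -1;
  if (typeIndex === -1) return null;

  // Keep only string tags; drop anything else the model may have produced.
  const tags = Array.isArray(obj.tags)
    ? obj.tags.filter((t): t is string => typeof t === "string")
    : [];
  const date = nonEmptyString(obj.documentDate);

  return {
    documentType: DOCUMENT_TYPES[typeIndex],
    tags,
    correspondent: nonEmptyString(obj.correspondent),
    documentDate: date !== null && /^\d{4}-\d{2}-\d{2}$/.test(date) ? date : null,
    language: nonEmptyString(obj.language),
  };
}

const sampleReply =
  '{"documentType":"Invoice","tags":["office",42,"subscription"],' +
  '"correspondent":"Deutsche Telekom","documentDate":"2025-03-01","language":"de"}';
console.log(parseClassification(sampleReply));
```

Returning `null` rather than a partial result lets a background worker retry or skip the document instead of applying a wrong type.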

Receipt Intelligence

PaperCortex includes a dedicated receipt processing pipeline optimized for expense management:

Data extraction from receipts and invoices:

  • Vendor / merchant name and address
  • Date of purchase
  • Total amount (gross and net)
  • Tax rate and tax amount (supports multiple VAT rates)
  • Currency
  • Individual line items with quantities and prices
  • Payment method
  • Invoice/receipt number
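
As an illustration, the extracted fields could be carried in a typed record and sanity-checked before they reach an export. Everything named here (`ExtractedReceipt`, `TaxLine`, `validateReceipt`, the 0.02 tolerance) is a hypothetical sketch, not PaperCortex's actual schema.

```typescript
// Hypothetical types mirroring the extracted fields listed above.
interface TaxLine {
  ratePercent: number; // e.g. 19 or 7 for German VAT
  netAmount: number;
  taxAmount: number;
}

interface LineItem {
  description: string;
  quantity: number;
  unitPrice: number;
}

interface ExtractedReceipt {
  vendorName: string;
  vendorAddress?: string;
  purchaseDate: string; // ISO date, e.g. "2025-03-14"
  grossTotal: number;
  netTotal: number;
  taxLines: TaxLine[]; // one entry per VAT rate on the receipt
  currency: string; // ISO 4217 code
  lineItems: LineItem[];
  paymentMethod?: string;
  receiptNumber?: string;
}

// Round to cents to keep floating-point noise out of the comparisons.
function roundCents(value: number): number {
  return Math.round(value * 100) / 100;
}

// Plausibility check: returns a list of problems; an empty list means the totals agree.
function validateReceipt(r: ExtractedReceipt, tolerance = 0.02): string[] {
  const problems: string[] = [];
  const netSum = roundCents(r.taxLines.reduce((sum, t) => sum + t.netAmount, 0));
  const taxSum = roundCents(r.taxLines.reduce((sum, t) => sum + t.taxAmount, 0));

  if (Math.abs(netSum - r.netTotal) > tolerance) {
    problems.push(`Net total ${r.netTotal} does not match sum of tax lines ${netSum}`);
  }
  if (Math.abs(roundCents(r.netTotal + taxSum) - r.grossTotal) > tolerance) {
    problems.push(`Gross total ${r.grossTotal} does not equal net ${r.netTotal} + tax ${taxSum}`);
  }
  for (const t of r.taxLines) {
    const expectedTax = roundCents((t.netAmount * t.ratePercent) / 100);
    if (Math.abs(expectedTax - t.taxAmount) > tolerance) {
      problems.push(`Tax at ${t.ratePercent}% should be about ${expectedTax}, got ${t.taxAmount}`);
    }
  }
  return problems;
}

// A receipt with two VAT rates: 10.00 net at 19% and 20.00 net at 7%.
const sampleReceipt: ExtractedReceipt = {
  vendorName: "Example Markt GmbH",
  purchaseDate: "2025-03-14",
  grossTotal: 33.3,
  netTotal: 30.0,
  taxLines: [
    { ratePercent: 19, netAmount: 10.0, taxAmount: 1.9 },
    { ratePercent: 7, netAmount: 20.0, taxAmount: 1.4 },
  ],
  currency: "EUR",
  lineItems: [],
};
console.log(validateReceipt(sampleReceipt)); // []
```

A check like this catches common OCR slips, such as a misread digit in the gross total, before the numbers are passed on to accounting.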

Works with:

  • Scanned paper receipts (via Paperless-ngx OCR)
  • Digital PDF invoices
  • Photographed receipts (mobile upload to Paperless-ngx)
  • Multi-page invoices
  • Receipts in German, English, French, Spanish, and other languages

Bank Statement Matching

Import your bank statement as CSV and let PaperCortex automatically match transactions to receipts:

  • Fuzzy matching on amount, date, and vendor name
  • Confidence scoring — high/medium/low match indicators
  • Unmatched detection — highlights receipts without matching transactions and vice versa
  • Multi-currency support — handles EUR, USD, GBP, CHF, and 20+ other currencies
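
One way to combine amount, date, and vendor name into a confidence level is a weighted score. The sketch below is illustrative only: the weights (0.5 / 0.3 / 0.2), the thresholds, the 5-day window, and all type and function names are assumptions, not PaperCortex's actual matching logic, and currency conversion is ignored.

```typescript
// Hypothetical record shapes; field names are illustrative.
interface ReceiptRecord {
  id: number;
  vendor: string;
  date: string; // ISO date, e.g. "2025-03-01"
  amount: number;
}

interface BankTransaction {
  id: string;
  counterparty: string;
  bookingDate: string; // ISO date
  amount: number; // positive value of the debit
}

type Confidence = "high" | "medium" | "low" | "none";

// Lowercase ASCII word tokens; a real matcher would also handle umlauts and legal suffixes.
function tokens(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Fraction of the receipt's vendor tokens that also appear in the counterparty text.
function vendorOverlap(vendor: string, counterparty: string): number {
  const wanted = tokens(vendor);
  const present = tokens(counterparty);
  if (wanted.length === 0) return 0;
  const hits = wanted.filter((t) => present.indexOf(t) !== -1).length;
  return hits / wanted.length;
}

// Weighted score in [0, 1]: amount 0.5, vendor 0.3, date proximity 0.2 (assumed weights).
function scoreMatch(r: ReceiptRecord, tx: BankTransaction, maxDays = 5): number {
  const amountScore = Math.abs(r.amount - tx.amount) < 0.005 ? 1 : 0;
  const days = Math.abs(Date.parse(r.date) - Date.parse(tx.bookingDate)) / 86400000;
  const dateScore = Math.max(0, 1 - days / maxDays);
  return 0.5 * amountScore + 0.3 * vendorOverlap(r.vendor, tx.counterparty) + 0.2 * dateScore;
}

function toConfidence(score: number): Confidence {
  if (score >= 0.8) return "high";
  if (score >= 0.6) return "medium";
  if (score >= 0.4) return "low";
  return "none";
}

// Pick the best-scoring transaction for a receipt; null means the receipt is unmatched.
function bestMatch(
  r: ReceiptRecord,
  txs: BankTransaction[],
): { transactionId: string; confidence: Confidence } | null {
  let best: BankTransaction | null = null;
  let bestScore = 0;
  for (const tx of txs) {
    const score = scoreMatch(r, tx);
    if (score > bestScore) {
      best = tx;
      bestScore = score;
    }
  }
  const confidence = toConfidence(bestScore);
  if (best === null || confidence === "none") return null;
  return { transactionId: best.id, confidence };
}

const telekomReceipt: ReceiptRecord = { id: 7, vendor: "Deutsche Telekom", date: "2025-03-01", amount: 39.95 };
const statement: BankTransaction[] = [
  { id: "tx-1", counterparty: "REWE MARKT GMBH", bookingDate: "2025-03-02", amount: 23.1 },
  { id: "tx-2", counterparty: "DEUTSCHE TELEKOM AG", bookingDate: "2025-03-03", amount: 39.95 },
];
console.log(bestMatch(telekomReceipt, statement)); // { transactionId: 'tx-2', confidence: 'high' }
```

Receipts for which `bestMatch` returns `null` are the "unmatched" cases; running the same comparison from the transaction side finds bank entries without a receipt.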

DATEV Export

For German businesses and freelancers, PaperCortex generates DATEV-compatible export files that your Steuerberater can import directly:

  • DATEV CSV format (Buchungsstapel) — the standard German accounting import format
  • SKR03 / SKR04 account mapping
  • Automatic account assignment based on document classification
  • Beleglink — links each DATEV entry back to the original document in Paperless-ngx
  • Period exports — monthly, quarterly, or annual

Also supports plain CSV export for use with any accounting software worldwide.
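
For the plain CSV path, a minimal sketch might look like the following. It deliberately does not attempt the DATEV Buchungsstapel layout; the column names, the `ExportRow` shape, and the example URL are all hypothetical, and quoting follows the common RFC 4180 convention.

```typescript
// Hypothetical export row; column choice is illustrative, not PaperCortex's actual format.
interface ExportRow {
  date: string; // ISO date
  vendor: string;
  grossAmount: number;
  currency: string;
  documentUrl: string; // link back to the original document
}

// Quote a field if it contains a comma, quote, or line break; double any embedded quotes.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Serialize rows with a header line and CRLF line endings.
function toCsv(rows: ExportRow[]): string {
  const header = ["date", "vendor", "gross_amount", "currency", "document_url"].join(",");
  const lines = rows.map((r) =>
    [r.date, r.vendor, r.grossAmount.toFixed(2), r.currency, r.documentUrl]
      .map(csvField)
      .join(","),
  );
  return [header].concat(lines).join("\r\n");
}

const exportRows: ExportRow[] = [
  {
    date: "2025-01-31",
    vendor: 'ACME, "Best" GmbH',
    grossAmount: 119,
    currency: "EUR",
    documentUrl: "https://paperless.example/documents/42", // placeholder host
  },
];
console.log(toCsv(exportRows));
```

Note that many German spreadsheet tools expect semicolons and decimal commas, so a real exporter would likely make the delimiter and number format configurable.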

Natural Language Queries

Ask questions about your document archive in plain language:

"How much did I spend on hotels in Q1 2025?"
"Show me all contracts expiring this year"
"What was my highest single expense last month?"
"Find all invoices from Deutsche Telekom"
"Which receipts don't have a matching bank transaction?"
"Summarize my office supply spending trend over the last 12 months"

PaperCortex translates natural language into document queries, retrieves relevant documents via semantic search, and uses the local LLM to synthesize answers with source references.
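
The final synthesis step can be pictured as assembling the retrieved documents into a prompt that asks the model to cite its sources. This is a sketch under assumptions: `RetrievedDocument`, `buildAnswerPrompt`, and the prompt wording are hypothetical, and the call to the local LLM itself is omitted.

```typescript
// Hypothetical shape of a document returned by the semantic search step.
interface RetrievedDocument {
  id: number;
  title: string;
  excerpt: string;
}

// Build a prompt that numbers each source so the model can reference it in its answer.
function buildAnswerPrompt(
  question: string,
  retrieved: RetrievedDocument[],
  maxExcerptChars = 500,
): string {
  const sources = retrieved
    .map((d, i) => `[${i + 1}] (document #${d.id}) ${d.title}\n${d.excerpt.slice(0, maxExcerptChars)}`)
    .join("\n\n");
  return [
    "Answer the question using only the sources below.",
    "Cite sources by their bracketed number. If the sources do not contain the answer, say so.",
    "",
    "Sources:",
    sources,
    "",
    `Question: ${question}`,
  ].join("\n");
}

const answerPrompt = buildAnswerPrompt("How much did I spend on hotels in Q1 2025?", [
  { id: 12, title: "Hotel invoice, Berlin", excerpt: "Total 240.00 EUR, 2 nights" },
  { id: 15, title: "Hotel receipt, Hamburg", excerpt: "Total 130.00 EUR, 1 night" },
]);
console.log(answerPrompt);
```

Truncating each excerpt keeps the prompt within the context window of smaller local models, at the cost of possibly cutting off a relevant line.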

MCP Server Integration

PaperCortex implements the Model Context Protocol (MCP) — the open standard for connecting AI agents to external tools. This means any MCP-compatible AI agent can use your document archive as a knowledge source.
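
At the wire level, MCP clients invoke a server tool with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request; the tool name `search_documents` and its arguments are hypothetical placeholders, since the actual tool names are whatever the server reports via `tools/list`.

```typescript
// JSON-RPC 2.0 request shape used by MCP for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Build a tools/call request with a fresh numeric id.
function buildToolCall(toolName: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// "search_documents" and its arguments are assumptions for illustration only.
const toolCall = buildToolCall("search_documents", { query: "office rent 2025", limit: 5 });
console.log(JSON.stringify(toolCall, null, 2));
```

In practice an MCP client library handles this framing and the transport (stdio or HTTP), so agents rarely construct these messages by hand.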

Compatible with:


Feature Comparison

| Feature | PaperCortex | paperless-ai | Veryfi | Taggun | Rossum |
|---|:---:|:---:|:---:|:---:|:---:|
| Fully self-hosted | :white_check_mark: | :white_check_mark: | :x: | :x: | :x: |
| Local AI (no cloud API) | :white_check_mark: | :x: OpenAI | :x: | :x: | :x: |
| Semantic search | :white_check_mark: | :x: | :x: | :x: | :x: |
| Auto-classification | :white_check_mark: | :white_check_mark: | :x: | :x: | :white_check_mark: |
| Receipt data extraction | :white_check_mark: | :x: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Bank statement matching | :white_check_mark: | :x: | :x: | :x: | :x: |
| DATEV export | :white_check_mark: | :x: | :x: | :x: | :x: |
| CSV accounting export | :white_check_mark: | :x: | :white_check_mark: | :x: | :white_check_mark: |
| MCP Server | :white_check_mark: | :x: | :x: | :x: | :x: |
| Natural language queries | :white_check_mark: | :x: | :x: | :x: | :x: |
| Multi-language documents | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Free and open source | :white_check_mark: | :white_check_mark: | :x: $$$ | :x: $$$ | :x: $$$$ |
| Privacy — data stays local | :white_check_mark: | :warning: API calls | :x: | :x: | :x: |
| Works with Paperless-ngx | :white_check_mark: | :white_check_mark: | :x: | :x: | :x: |


Architecture
