Ocrbase
📄 PDF -> .MD/.JSON API & SDK for PaddleOCR-VL with structured data extraction. Self-hostable.
Turn PDFs into structured data at scale. Powered by frontier open-weight OCR models.
Quickstart
Try the Playground
Use ocrbase.dev to parse and extract data from documents of up to 1K pages for free.
Cloud API
- Generate an API key at ocrbase.dev.
- Add it to your .env file:
  OCRBASE_API_KEY=sk_xxx
- Install the SDK:
  npm install ocrbase-sdk
- Parse a document:
  import { parse } from "ocrbase-sdk";
  // top-level await requires an ES module
  const { text } = await parse("./invoice.pdf");
  console.log(text);
Or use the API directly with curl:
curl -X POST https://api.ocrbase.dev/v1/parse \
-H "Authorization: Bearer sk_xxx" \
-F "file=@document.pdf"
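For TypeScript code that doesn't use the SDK, the curl call above maps onto a standard Fetch API request with a multipart body. This is a minimal sketch, assuming Node 18+ or a browser (global `Request`, `FormData`, `Blob`); `buildParseRequest` is a hypothetical helper, not part of ocrbase-sdk, and the response format is left out because it is documented in the OpenAPI reference rather than here.

```typescript
// Sketch: the `curl -F "file=@document.pdf"` call expressed with the Fetch API.
// Assumes Node 18+ (or a browser), where Request, FormData and Blob are globals.
// `buildParseRequest` is a hypothetical helper, not part of ocrbase-sdk.

const API_BASE = "https://api.ocrbase.dev";

function buildParseRequest(apiKey: string, file: Blob, filename: string): Request {
  // Equivalent of curl's -F "file=@document.pdf": a multipart/form-data field named "file".
  const form = new FormData();
  form.append("file", file, filename);

  // Content-Type is not set by hand: the Request constructor adds it,
  // including the multipart boundary, from the FormData body.
  return new Request(`${API_BASE}/v1/parse`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
}

// Usage (commented out so the sketch runs without network access):
// import { readFile } from "node:fs/promises";
// const bytes = await readFile("document.pdf");
// const req = buildParseRequest(process.env.OCRBASE_API_KEY!, new Blob([bytes]), "document.pdf");
// const res = await fetch(req);
// console.log(res.status, await res.json());
```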
Self-host
Prerequisites: Bun, Docker Desktop
git clone https://github.com/majcheradam/ocrbase
cd ocrbase
bun install
cp .env.example .env # then edit .env — set PADDLE_OCR_URL to your PaddleOCR instance
docker compose up -d # starts postgres, redis, minio
bun run db:push # set up the database
bun run dev # start the API server + worker
The API will be available at http://localhost:3000. See the Self-Hosting Guide for PaddleOCR setup, GPU configuration, and all environment variables.
How It Works
ocrbase has two core operations. Both are asynchronous — you submit a request, get a job ID, and retrieve the result when it's ready.
Parse (POST /v1/parse)
Converts a PDF into Markdown. Upload a file; ocrbase runs OCR on every page and returns clean Markdown text.
curl -X POST https://api.ocrbase.dev/v1/parse \
-H "Authorization: Bearer sk_xxx" \
-F "file=@document.pdf"
Extract (POST /v1/extract)
Converts a PDF into structured JSON. You provide a file and a schema ID; ocrbase runs OCR on the document, then uses an LLM to extract data that matches your schema.
curl -X POST https://api.ocrbase.dev/v1/extract \
-H "Authorization: Bearer sk_xxx" \
-F "file=@invoice.pdf" \
-F "schemaId=inv_schema_123"
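Parse and extract share the same submit-then-retrieve pattern and differ only in the endpoint and the extra `schemaId` field, so one helper can cover both. This is a minimal sketch: `submitJob` is a hypothetical name, and it assumes the submit response is JSON carrying the job ID in a field called `id` (check the OpenAPI reference for the real shape). `fetch` is injectable so the helper can be exercised offline.

```typescript
// Sketch: one submit helper for both async operations. Names are hypothetical.
// ASSUMPTION: the submit response body is JSON with the job ID in an `id` field.

type FetchLike = (url: string, init: RequestInit) => Promise<Response>;

interface SubmitOptions {
  apiKey: string;
  file: Blob;
  filename: string;
  schemaId?: string; // required for extract, ignored for parse
  baseUrl?: string;
  fetchImpl?: FetchLike;
}

async function submitJob(kind: "parse" | "extract", opts: SubmitOptions): Promise<string> {
  if (kind === "extract" && !opts.schemaId) {
    throw new Error("extract requires a schemaId");
  }

  // Same multipart fields as the curl examples: "file", plus "schemaId" for extract.
  const form = new FormData();
  form.append("file", opts.file, opts.filename);
  if (kind === "extract") form.append("schemaId", opts.schemaId!);

  const doFetch: FetchLike = opts.fetchImpl ?? fetch;
  const res = await doFetch(`${opts.baseUrl ?? "https://api.ocrbase.dev"}/v1/${kind}`, {
    method: "POST",
    headers: { Authorization: `Bearer ${opts.apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`${kind} failed with HTTP ${res.status}`);

  const body = (await res.json()) as { id?: unknown }; // ASSUMED response shape
  if (typeof body.id !== "string") throw new Error("no job ID in response");
  return body.id;
}
```

The returned ID is what the job status endpoint and the realtime WebSocket take.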
Checking Results
Polling — fetch the job status until it completes:
curl https://api.ocrbase.dev/v1/jobs/job_xxx \
-H "Authorization: Bearer sk_xxx"
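A polling loop can be sketched as below. This is a minimal sketch, not the SDK's implementation: it assumes the job endpoint returns JSON with a `status` field whose terminal values are `completed` and `failed` (hypothetical names; check the OpenAPI reference), and it backs off exponentially between requests up to a cap. The fetcher and `sleep` are injected so the loop can be tested without the network.

```typescript
// Sketch of polling GET /v1/jobs/{id} until the job reaches a terminal state.
// ASSUMPTION: the status values below are hypothetical placeholders.

type JobStatus = "pending" | "processing" | "completed" | "failed";

interface Job {
  id: string;
  status: JobStatus;
  [key: string]: unknown; // result payload; its shape is not specified here
}

interface PollOptions {
  initialDelayMs?: number;
  maxDelayMs?: number;
  timeoutMs?: number;
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollJob(getJob: () => Promise<Job>, opts: PollOptions = {}): Promise<Job> {
  const {
    initialDelayMs = 500,
    maxDelayMs = 10_000,
    timeoutMs = 10 * 60_000,
    sleep = defaultSleep,
  } = opts;
  let delay = initialDelayMs;
  let waited = 0; // approximate: counts sleep time only, not request time

  for (;;) {
    const job = await getJob();
    if (job.status === "completed") return job;
    if (job.status === "failed") throw new Error(`job ${job.id} failed`);
    if (waited >= timeoutMs) throw new Error(`job ${job.id} timed out`);
    await sleep(delay);
    waited += delay;
    delay = Math.min(delay * 2, maxDelayMs); // exponential backoff, capped
  }
}

// A fetcher for the real endpoint, mirroring the curl call above:
// const getJob = async () => {
//   const res = await fetch("https://api.ocrbase.dev/v1/jobs/job_xxx", {
//     headers: { Authorization: `Bearer ${process.env.OCRBASE_API_KEY}` },
//   });
//   return (await res.json()) as Job;
// };
```

Backing off keeps a long-running batch from hammering the API; for immediate notification, the WebSocket option avoids polling altogether.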
WebSocket — subscribe to real-time status updates instead of polling:
wscat -c "wss://api.ocrbase.dev/v1/realtime?job_id=job_xxx" \
-H "Authorization: Bearer sk_xxx"
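In code, the same subscription comes down to building the realtime URL and waiting for a terminal status message. A minimal sketch, assuming each message is JSON with a `status` field and that `completed` and `failed` are the terminal values (the payload format is not documented in this README). The socket is typed as a minimal interface so any WebSocket client with `addEventListener` fits; note that the browser `WebSocket` constructor cannot set an `Authorization` header, so the commented usage assumes the Node `ws` package.

```typescript
// Sketch: resolve once a realtime status message reports a terminal state.
// ASSUMPTION: messages are JSON like {"status": "..."}; the real format may differ.

interface SocketLike {
  addEventListener(type: string, fn: (ev: { data?: unknown }) => void): void;
  close(): void;
}

function realtimeUrl(jobId: string, base = "wss://api.ocrbase.dev"): string {
  const url = new URL("/v1/realtime", base);
  url.searchParams.set("job_id", jobId);
  return url.toString();
}

function waitForTerminalStatus(socket: SocketLike): Promise<string> {
  return new Promise((resolve, reject) => {
    let done = false;

    socket.addEventListener("message", (ev) => {
      if (done || typeof ev.data !== "string") return;
      let status: unknown;
      try {
        status = (JSON.parse(ev.data) as { status?: unknown }).status;
      } catch {
        return; // ignore frames that are not JSON
      }
      if (status === "completed" || status === "failed") {
        done = true;
        socket.close();
        resolve(status);
      }
    });

    // An error or close before a terminal status means the result is unknown.
    const fail = () => {
      if (!done) {
        done = true;
        reject(new Error("socket closed before the job finished"));
      }
    };
    socket.addEventListener("error", fail);
    socket.addEventListener("close", fail);
  });
}

// Usage with the `ws` package, which accepts custom headers like wscat's -H:
// import WebSocket from "ws";
// const socket = new WebSocket(realtimeUrl("job_xxx"), {
//   headers: { Authorization: `Bearer ${process.env.OCRBASE_API_KEY}` },
// });
// console.log(await waitForTerminalStatus(socket));
```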
Features
- Best-in-class OCR — uses PaddleOCR-VL-1.5 0.9B for accurate text extraction from PDFs
- Structured extraction — define a JSON schema and get structured data back from any document
- Built for scale — queue-based job processing with BullMQ so you can process thousands of documents
- Real-time updates — WebSocket notifications for job progress instead of polling
- Self-hostable — run the entire stack on your own infrastructure with Docker
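On the client side, thousands of documents still means thousands of uploads, so it helps to cap how many submissions are in flight at once. A minimal sketch of a bounded-concurrency map; `mapWithConcurrency` and the limit in the usage comment are illustrative, not ocrbase settings, and the task can be the SDK's `parse` or any `fetch`-based submit.

```typescript
// Sketch: run `task` over `items` with at most `limit` promises in flight.
// Results keep input order. If a task rejects, the returned promise rejects
// with that error; other workers keep running their remaining items.
// `mapWithConcurrency` is a hypothetical helper, not part of ocrbase-sdk.

async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index until the list is exhausted.
  // `next++` needs no lock because this code runs on a single JS thread.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Usage (hypothetical file list; `parse` is the SDK call from the Quickstart):
// const files = ["a.pdf", "b.pdf", "c.pdf"];
// const docs = await mapWithConcurrency(files, 4, (f) => parse(f));
```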
SDK
Install the TypeScript SDK from npm:
npm install ocrbase-sdk
ocrbase-sdk on npm | Source on GitHub
The SDK provides type-safe methods for parsing, extraction, schema management, and real-time WebSocket subscriptions.
API Reference
- Interactive OpenAPI UI: https://api.ocrbase.dev/openapi
- OpenAPI JSON: https://api.ocrbase.dev/openapi/json
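The OpenAPI JSON can also be read programmatically, for example to check which operations a cloud or self-hosted instance exposes. A minimal sketch, assuming the document follows the standard OpenAPI 3.x layout; it reads only the standard `paths` object and no ocrbase-specific fields. `listOperations` is a hypothetical helper name.

```typescript
// Sketch: list "METHOD /path" pairs from an OpenAPI 3.x document.
// Only the standard `paths` / Path Item structure is used.

const HTTP_METHODS = ["get", "put", "post", "delete", "options", "head", "patch", "trace"] as const;

interface OpenApiDoc {
  paths?: Record<string, Record<string, unknown>>;
}

function listOperations(doc: OpenApiDoc): string[] {
  const ops: string[] = [];
  for (const [path, item] of Object.entries(doc.paths ?? {})) {
    for (const method of HTTP_METHODS) {
      // Path items can also hold non-operation keys such as "parameters"; those are skipped.
      if (item[method] !== undefined) ops.push(`${method.toUpperCase()} ${path}`);
    }
  }
  return ops.sort();
}

// Usage against a running instance (network access required):
// const res = await fetch("https://api.ocrbase.dev/openapi/json");
// console.log(listOperations(await res.json()));
```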
LLM Integration
Parse documents with ocrbase before sending them to LLMs. Sending raw PDF binaries wastes tokens and produces poor results; sending the clean Markdown from ocrbase instead gives much better LLM output at a fraction of the token cost.
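When a parsed document is too long for a single LLM request, the Markdown can be split along its headings before sending. A minimal sketch: `chunkMarkdown` is a hypothetical helper, not part of ocrbase, and the character budget is only a rough stand-in for a token count, since tokenizers differ between models.

```typescript
// Sketch: split Markdown into chunks of at most `maxChars` characters,
// preferring to break right before ATX headings ("#" to "######").
// Caveat: headings inside fenced code blocks are not detected and may cause splits.

function chunkMarkdown(markdown: string, maxChars: number): string[] {
  if (!Number.isInteger(maxChars) || maxChars < 1) {
    throw new RangeError("maxChars must be a positive integer");
  }
  // Each section starts at a heading line; the lookahead keeps the heading text.
  const sections = markdown.split(/\n(?=#{1,6} )/);
  const chunks: string[] = [];
  let current = "";

  const flush = () => {
    if (current.trim()) chunks.push(current.trim());
    current = "";
  };

  for (const section of sections) {
    // Start a new chunk if appending this section (plus a newline) would overflow.
    if (current && current.length + 1 + section.length > maxChars) flush();
    if (section.length <= maxChars) {
      current = current ? `${current}\n${section}` : section;
      continue;
    }
    // A single oversized section: fall back to fixed-size slices.
    flush();
    for (let i = 0; i < section.length; i += maxChars) {
      chunks.push(section.slice(i, i + maxChars));
    }
  }
  flush();
  return chunks;
}
```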
Architecture
Tech Stack
| Layer         | Technology               |
| ------------- | ------------------------ |
| Runtime       | Bun                      |
| API Framework | Elysia                   |
| SDK           | Eden Treaty              |
| Database      | PostgreSQL + Drizzle ORM |
| Queue         | Redis + BullMQ           |
| Storage       | S3/MinIO                 |
| OCR           | PaddleOCR-VL 1.5         |
| Auth          | Better-Auth              |
| Build         | Turborepo                |
Self-Hosting
See the Self-Hosting Guide for the full deployment walkthrough including PaddleOCR setup, all environment variables, and API endpoint reference.
Requirements: Bun, Docker Desktop
Health Checks
- GET /v1/health/live: liveness check
- GET /v1/health/ready: readiness check (confirms all dependencies are connected)
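For a self-hosted deployment, a startup script or CI job can wait on these endpoints before sending work. A minimal sketch, assuming the probes signal health through a 2xx status code (their response bodies aren't described here) and a local server at http://localhost:3000 as in the Self-host steps; `waitUntilReady` is a hypothetical helper.

```typescript
// Sketch: poll /v1/health/live, then /v1/health/ready, until each returns 2xx.
// ASSUMPTION: health is signalled by the HTTP status code alone.

type Fetcher = (url: string) => Promise<{ ok: boolean }>;

async function waitUntilReady(
  baseUrl = "http://localhost:3000",
  { attempts = 30, intervalMs = 1000, fetcher = fetch as Fetcher } = {},
): Promise<void> {
  for (const probe of ["live", "ready"] as const) {
    let healthy = false;
    for (let i = 0; i < attempts && !healthy; i++) {
      try {
        healthy = (await fetcher(`${baseUrl}/v1/health/${probe}`)).ok;
      } catch {
        healthy = false; // e.g. connection refused while the server is still starting
      }
      if (!healthy && i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
      }
    }
    if (!healthy) throw new Error(`${probe} check did not pass after ${attempts} attempts`);
  }
}

// Usage after `bun run dev`:
// await waitUntilReady();
// console.log("ocrbase is ready");
```

Checking liveness first separates "the process is not up yet" from "the process is up but a dependency such as Postgres, Redis, or MinIO is not reachable".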
License
MIT — See LICENSE for details.
Contact
For API access, on-premise deployment, or questions: adammajcher20@gmail.com