DocBrain

Self-hosted RAG framework with RBAC, multi-LLM support, and natural language SQL over tabular data

Generate Convert Improve

Install / Use

/learn @shivama205/DocBrain

About this skill

Quality Score

0/100

README

DocBrain-Banner-3

DocBrain

Self-hosted RAG for teams that can't send data to third-party APIs. Built-in RBAC, multi-LLM support, and natural language queries over your tabular data.

<div align="center"> <a href="https://github.com/shivama205/DocBrain/raw/main/docs/assets/docbrain-demo.mp4"> <img src="docs/assets/docbrain-demo.gif" alt="DocBrain Demo" width="800" /> </a> <p><em>Click to watch the full demo video</em></p> </div>

Quickstart

git clone https://github.com/shivama205/DocBrain.git && cd DocBrain
cp .env.example .env   # add your LLM API key (Gemini, OpenAI, or Anthropic)
docker compose up -d
# Open http://localhost:3000

That's it. The stack includes the API server, Celery worker, MySQL, Redis, ChromaDB (local vector store), and the React frontend. The only thing you need to provide is an LLM API key.

Why DocBrain?

Actually self-hosted -- vectors stay local (ChromaDB), no data leaves your infra
Table-Augmented Generation -- ask natural language questions over CSVs, get SQL-generated answers
No framework lock-in -- zero LangChain/LlamaIndex dependencies
Built-in RBAC -- three role levels + knowledge base sharing with granular permissions
Multi-LLM -- swap between Gemini, OpenAI, Anthropic via config

Key Features

| Feature | Description | |---------|-------------| | Document RAG | Upload PDFs, DOCX, TXT, Markdown, HTML -- ask questions, get cited answers | | Table-Augmented Generation | Upload CSVs/Excel -- ask questions in plain English, get SQL-generated results | | Local Vector Store | ChromaDB runs locally by default. Pinecone available for production scale | | Multi-LLM Support | Google Gemini, OpenAI, Anthropic -- switch with one env var | | Query Router | Automatically routes queries to RAG or TAG based on content type | | RBAC | Admin, Owner, User roles with granular permissions | | Knowledge Base Sharing | Share knowledge bases with specific users | | FAQ System | Add curated Q&A pairs that take priority over document search |

Manual Setup (without Docker)

If you prefer running without Docker:

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env: add LLM API key, set database credentials

# Terminal 1: Start API server
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

# Terminal 2: Start Celery worker
celery -A app.worker.celery worker --loglevel=info

You'll also need MySQL and Redis running. See .env.example for all configuration options.

The frontend is in a separate repo: DocBrain-UI.

Vector Store Options

| Store | Use Case | Config | |-------|----------|--------| | ChromaDB (default) | Development, self-hosted production. Zero config, runs locally | VECTOR_STORE_TYPE=chroma | | Pinecone | Large-scale production with managed infrastructure | VECTOR_STORE_TYPE=pinecone + API key |

Advanced Features

Query Router

The Query Router intelligently analyzes incoming queries and routes them to the most appropriate service for processing:

Advanced Question Processing: Supports complex, multi-part questions with automatic decomposition and contextual awareness
Intelligent Classification: Categorizes queries as factual, analytical, or hypothetical for optimized processing
Query Type Detection: Automatically identifies queries about statistical information, database content, or tabular data
Confidence Scoring: Provides confidence scores with each routing decision for transparent retrieval logic
Specialized Use Cases: Optimized handling for FAQs, feature requirements, technical documentation, and other enterprise needs
Follow-up Handling: Maintains context for conversational queries and follow-up questions
Clarification Mechanisms: Identifies ambiguous queries and requests clarification when needed
Fallback Strategy: Defaults to the most appropriate retrieval method for unclear queries, ensuring consistent responses

Table Augmented Generation (TAG)

TAG enhances DocBrain's ability to reason over tabular data:

SQL Generation: Automatically converts natural language queries into SQL
Table Schema Analysis: Maintains and analyzes the structure of ingested tables
CSV Support: Specialized ingestors for tabular data formats
Query Execution: Runs generated SQL against stored tables and formats results
Explanation Generation: Provides natural language explanations of results alongside the data

Role-Based Access Control (RBAC)

DocBrain implements a comprehensive permission system to secure your data:

User Roles: Three predefined roles with escalating privileges:
- User: Basic access to shared knowledge bases and conversation capabilities
- Owner: Can create, manage, and share their own knowledge bases
- Admin: Full system access with user management capabilities
Permission System: Granular permissions control access to specific features:
- Knowledge base management (view, create, update, delete)
- Document operations (view, upload, delete)
- User management (view, create, update, delete)
- System administration
Knowledge Base Sharing: Secure sharing mechanism allows:
- Owners to share knowledge bases with specific users
- Users to access only knowledge bases explicitly shared with them
- Dedicated API endpoint for users to view knowledge bases shared with them

Release Notes

For detailed information about the latest features and improvements, please see our RELEASE_NOTES.md file.

Roadmap

DocBrain has an ambitious development roadmap focused on enhancing capabilities while maintaining our commitment to privacy and security.

We've recently completed implementation of:

✅ Table Augmented Generation (TAG) - SQL-based tabular data querying
✅ Enhanced Query Router - Intelligent routing with advanced question handling for FAQs and specialized use cases
✅ Multiple LLM Provider Support - Factory pattern for integrating various LLM providers
✅ Role-Based Access Control - Comprehensive permission system
✅ Knowledge Base Sharing - Secure sharing between users

Some planned future features include:

✅ Multiple Vector Database Support - ChromaDB (local) + Pinecone (cloud)
RAG Evaluation Framework - Integration with RAGAS for measuring performance
Document-Level Permissions - Fine-grained access control at the document level
API Rate Limiting - Prevent abuse and ensure fair resource allocation
Advanced Analytics - Usage statistics and performance metrics

For the full development roadmap with implementation details and planned features, see our ROADMAP.md file.

Contributing

We Need Your Help! DocBrain is built by developers who understand enterprise security needs.

If you:

Work with sensitive internal documents
Need a RAG solution free from third-party dependencies
Have experience building secure enterprise applications
Want to shape the future of private, self-hosted AI

...then we'd love your contributions!

Here's how to get involved:

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Areas where contributions are especially welcome:

Document processors for specialized formats
Vector database connectors
Security and compliance enhancements
Performance optimizations
Documentation and examples

License

This project is licensed under the MIT License - see the LICENSE file for details.

Related Skills

claude-opus-4-5-migration

83.6k

Migrate prompts and code from Claude Sonnet 4.0, Sonnet 4.5, or Opus 4.1 to Opus 4.5

model-usage

338.7k

Use CodexBar CLI local cost usage to summarize per-model usage for Codex or Claude, including the current (most recent) model or a full model breakdown. Trigger when asked for model-level usage/cost data from codexbar, or when you need a scriptable per-model summary from codexbar cost JSON.

feishu-drive

338.7k

things-mac

338.7k

Manage Things 3 via the `things` CLI on macOS (add/update projects+todos via URL scheme; read/search/list from the local Things database)