SmartRAG
SmartRAG is a privacy-first multimodal RAG system that lets you chat with your documents, images, and audio. Upload PDFs, Word files, or recordings and get accurate, context-aware answers. Everything is processed locally on your device, with no external APIs.
Install / Use
/learn @itanishqshelar/SmartRAG
SmartRAG - Intelligent Multimodal RAG System
A production-ready RAG system enabling intelligent conversations with documents, images, and audio files. Built with local-first AI models for complete privacy and offline operation.
Quick Start
# Standard deployment
docker-compose up -d
# Access at http://localhost:8501
Core Features
Multimodal Processing
- Documents: PDF, DOCX, TXT, MD with intelligent chunking
- Images: OCR + visual understanding via BLIP
- Audio: Automatic transcription with Whisper
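As a rough illustration of how incoming files might be routed to the three processing paths listed above, the sketch below maps a file extension to a modality. The function name `detect_modality` and the image and audio extension sets are assumptions made for this example, not SmartRAG's actual code; only the document extensions come from the list above.

```python
from pathlib import Path

# Document extensions come from the feature list above. The image and audio
# sets are illustrative assumptions, not SmartRAG's confirmed formats.
DOCUMENT_EXTS = {".pdf", ".docx", ".txt", ".md"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}
AUDIO_EXTS = {".mp3", ".wav"}


def detect_modality(path: str) -> str:
    """Return 'document', 'image', or 'audio' based on the file extension."""
    ext = Path(path).suffix.lower()
    if ext in DOCUMENT_EXTS:
        return "document"
    if ext in IMAGE_EXTS:
        return "image"
    if ext in AUDIO_EXTS:
        return "audio"
    raise ValueError(f"Unsupported file type: {ext or path}")
```

A dispatcher like this would let a single entry point hand PDFs to text extraction, images to OCR and captioning, and recordings to transcription.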
Local AI Stack
- Ollama (Llama 3.1 8B) for generation
- Nomic Embed Text (768-dim) for embeddings
- ChromaDB for vector storage
- Complete offline operation
Production Ready
- Docker deployment with multi-stage builds
- Non-root user execution
- Health checks and auto-healing
- Resource management and monitoring
- Security hardening included
Technology Stack
| Component | Technology |
| -------------- | --------------------------- |
| LLM | Llama 3.1 8B via Ollama |
| Embeddings | Nomic Embed Text (768-dim) |
| Vector DB | ChromaDB / FAISS |
| Vision | BLIP + CLIP + Tesseract OCR |
| Audio | OpenAI Whisper (base) |
| UI | Streamlit |
| Storage | SQLite3 |
Architecture
<img width="1600" height="676" alt="image" src="https://github.com/user-attachments/assets/b4e96e9f-d797-409e-be6b-ceb295e91615" />
Installation
Docker (Recommended)
git clone https://github.com/itanishqshelar/SmartRAG.git
cd SmartRAG/docker
# Development
docker-compose up -d
# Production with full stack (PostgreSQL, Redis, Nginx)
docker-compose -f docker-compose.prod.yml up -d
Local Setup
# Install dependencies
pip install -r requirements.txt
# Pull the models (assumes Ollama is already installed)
ollama pull llama3.1:8b
ollama pull nomic-embed-text
# Install system dependencies
# macOS: brew install tesseract ffmpeg
# Ubuntu: apt-get install tesseract-ocr ffmpeg
# Windows: Download from GitHub releases
# Run application
streamlit run chatbot_app.py
Configuration
SmartRAG uses a single config.yaml with Pydantic validation:
models:
  llm_model: "llama3.1:8b"
  embedding_model: "nomic-embed-text"
  vision_model: "Salesforce/blip-image-captioning-base"
  whisper_model: "base"
vector_store:
  type: "chromadb"
  embedding_dimension: 768
processing:
  chunk_size: 1000
  chunk_overlap: 200
  ocr_enabled: true
generation:
  temperature: 0.7
  max_tokens: 2000
  context_window: 4096
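To make the `chunk_size` and `chunk_overlap` settings concrete, here is a minimal sketch of fixed-size chunking with overlap. It assumes both values are measured in characters, which the README does not confirm, and the helper `chunk_text` is hypothetical. SmartRAG's own "intelligent chunking" may well respect sentence or section boundaries instead.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Defaults mirror config.yaml; treating them as character counts is an assumption.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

With the defaults, each window advances 800 characters, so the last 200 characters of one chunk reappear at the start of the next. This keeps sentences that straddle a boundary retrievable from at least one chunk.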
Override via environment variables:
export SMARTRAG_LLM_MODEL=llama2:7b
export SMARTRAG_TEMPERATURE=0.5
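One way such overrides could be layered on top of the values loaded from config.yaml is sketched below. Only the two variable names shown above come from the README. The mapping onto config sections, the `ENV_OVERRIDES` table, and the `apply_env_overrides` helper are assumptions for illustration, and the real project does this through Pydantic rather than a hand-written table.

```python
import os
from typing import Any, Callable, Mapping, Optional

# Hypothetical mapping: env var -> (config section, key, type caster).
# The variable names appear in the README; the target paths are assumed.
ENV_OVERRIDES: dict[str, tuple[str, str, Callable[[str], Any]]] = {
    "SMARTRAG_LLM_MODEL": ("models", "llm_model", str),
    "SMARTRAG_TEMPERATURE": ("generation", "temperature", float),
}


def apply_env_overrides(config: dict, env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of config with any set SMARTRAG_* variables applied."""
    env = os.environ if env is None else env
    # Copy each section so the caller's config is left untouched.
    merged = {section: dict(values) for section, values in config.items()}
    for name, (section, key, cast) in ENV_OVERRIDES.items():
        if name in env:
            merged.setdefault(section, {})[key] = cast(env[name])
    return merged
```

Casting matters here because environment variables are always strings, while a setting such as the temperature needs to be a float.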
Usage
Web Interface
- Upload files via drag-and-drop
- Ask questions about your content
- View source documents inline
- Manage chat history and files
Python API
from multimodal_rag.system import MultimodalRAGSystem
system = MultimodalRAGSystem()
# Ingest content
system.ingest_file("document.pdf")
system.ingest_file("screenshot.png")
system.ingest_file("recording.mp3")
# Query with context
response = system.query("Summarize the key points")
print(response.answer)
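Behind a call like `query`, a RAG system typically embeds the question and retrieves the most similar stored chunks before generation. The sketch below shows that retrieval step with cosine similarity over toy 3-dimensional vectors. In SmartRAG this work is delegated to ChromaDB with 768-dimensional embeddings, and the distance metric it is configured with is not stated, so treat `cosine_similarity` and `top_k` as illustrative helpers only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

The retrieved texts would then be placed in the prompt sent to the LLM, which is what makes the answer context-aware.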
Batch Processing
# Process directories
results = system.ingest_directory("./docs/", recursive=True)
print(f"Processed {len(results)} files")
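For a sense of what directory ingestion involves, the following sketch collects candidate files with `pathlib`. The helper `find_ingestable_files` is hypothetical, and the extension set covers only the document types named in the feature list. How `ingest_directory` actually discovers and filters files is not documented here.

```python
from pathlib import Path

# Document types named in the feature list; a real run would presumably
# also include image and audio extensions.
SUPPORTED_EXTS = {".pdf", ".docx", ".txt", ".md"}


def find_ingestable_files(root: str, recursive: bool = True) -> list[Path]:
    """List supported files under root, descending into subfolders if recursive."""
    base = Path(root)
    candidates = base.rglob("*") if recursive else base.glob("*")
    return sorted(
        p for p in candidates
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTS
    )
```

Sorting the result gives a stable processing order, which makes batch runs easier to compare and resume.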
Project Structure
smartrag/
├── chatbot_app.py              # Streamlit application
├── config.yaml                 # Configuration
├── requirements.txt            # Dependencies
├── multimodal_rag/
│   ├── system.py               # RAG orchestrator
│   ├── processors/             # File type handlers
│   │   ├── document_processor.py
│   │   ├── image_processor.py
│   │   └── audio_processor.py
│   └── vector_stores/          # DB implementations
│       ├── chroma_store.py
│       └── faiss_store.py
├── docker/                     # Production deployment
│   ├── Dockerfile
│   ├── docker-compose.yml
│   └── docker-compose.prod.yml
└── tests/                      # Test suite
Deployment Options
Standard - All-in-one container with Ollama
docker-compose up -d
Lightweight - External Ollama on host
docker-compose -f docker-compose.lite.yml up -d
Production - Full stack with PostgreSQL, Redis, Nginx
docker-compose -f docker-compose.prod.yml up -d
Development
# Run tests
pytest tests/
# Code formatting
black multimodal_rag/ tests/
# Linting
flake8 multimodal_rag/ tests/
Performance
- Docker image size: 4.2GB
- Memory: 4-8GB recommended
- CPU: 2-4 cores recommended
- Startup time: ~90s (includes model downloads)
- Query latency: <3s typical
Security
- Local inference - no external API calls
- Non-root container execution
- File size limits enforced (50MB default)
- No privilege escalation
- Security headers in production setup
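The file size limit in the list above could be enforced with a check along these lines before a file reaches any processor. This is a minimal sketch: the names `MAX_FILE_BYTES` and `check_file_size` are hypothetical, and reading "50MB" as 50 × 1024 × 1024 bytes is an assumption.

```python
import os

# 50MB default from the security list; interpreting MB as MiB is an assumption.
MAX_FILE_BYTES = 50 * 1024 * 1024


def check_file_size(path: str, max_bytes: int = MAX_FILE_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes, over the {max_bytes}-byte limit")
    return size
```

Rejecting oversized uploads early avoids spending memory on OCR or transcription for files that would be refused anyway.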
License
MIT License - see LICENSE file for details.
Acknowledgments
Built with ChromaDB, Ollama, Hugging Face Transformers, OpenAI Whisper, and Tesseract OCR.
SmartRAG - Local-first multimodal AI for document intelligence.
