# NexusRAG

**Hybrid Knowledge Base with Agentic Chat, Citations & Knowledge Graph**

Upload documents. Ask questions. Get cited answers.

NexusRAG combines vector search, a knowledge graph (LightRAG), and cross-encoder reranking into one seamless RAG pipeline — with Docling document parsing, visual intelligence (image/table captioning), agentic streaming chat, and inline citations. Powered by Gemini, local Ollama, or fully offline sentence-transformers.

Features · Quick Start · Model Recommendations · Tech Stack
## Architecture

## Showcase
## Beyond Traditional RAG
Most RAG systems follow a simple pipeline: split text → embed → retrieve → generate. NexusRAG goes further at every stage:
| Aspect | Traditional RAG | NexusRAG |
|---|---|---|
| Document Parsing | Plain text extraction, structure lost | Docling or Marker: preserves headings, page boundaries, formulas, layout — switchable via config |
| Images & Tables | Ignored entirely | Extracted, captioned by a vision LLM, embedded as searchable vectors |
| Chunking | Fixed-size splits, breaks mid-sentence | Hybrid semantic + structural chunking (respects headings, tables) |
| Embeddings | Single model for everything | Dual-model: BAAI/bge-m3 (1024d, search) + KG embedding (Gemini 3072d / Ollama / sentence-transformers) |
| Retrieval | Vector similarity only | 3-way parallel: vector over-fetch + KG entity lookup + cross-encoder rerank |
| Knowledge | No entity awareness | LightRAG graph: entity extraction, relationship mapping, multi-hop traversal |
| Context | Raw chunks dumped to the LLM | Structured assembly: KG insights → cited chunks → related images/tables |
| Citations | None or manual | Auto-generated 4-char IDs with page number and heading path |
| Page awareness | Lost after chunking | Preserved end-to-end: chunk → citation → document viewer navigation |
## Features
<details>
<summary><b>Deep Document Parsing (Docling / Marker)</b></summary>

NexusRAG supports two document parsers, switchable via the `NEXUSRAG_DOCUMENT_PARSER` env variable:
| Feature | Docling (default) | Marker |
|---|---|---|
| Math/Formula | Basic (known LaTeX issues) | Superior LaTeX via Surya |
| GPU footprint | ~18-20GB VRAM (formula enrichment) | ~2-4GB VRAM |
| Formats | PDF, DOCX, PPTX, HTML | PDF, DOCX, PPTX, XLSX, HTML, EPUB |
| Chunking | HybridChunker (semantic + structural) | Heading-aware + page-based |
| Image extraction | Via Docling pipeline | Via Marker pipeline |
| Table extraction | Structured export | Markdown tables |
Both parsers share the same output contract (`ParsedDocument`) — the downstream pipeline (dedup, embedding, KG, retrieval) works identically regardless of parser choice.
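A minimal sketch of what such a shared contract could look like. The field and class names below (other than `ParsedDocument` itself) are illustrative assumptions, not the project's actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ParsedImage:
    page: int
    caption: Optional[str] = None  # filled in later by the vision LLM

@dataclass
class ParsedChunk:
    text: str
    page: int
    heading_path: list[str] = field(default_factory=list)  # e.g. ["Results", "Q1"]

@dataclass
class ParsedDocument:
    filename: str
    chunks: list[ParsedChunk]
    images: list[ParsedImage] = field(default_factory=list)

# Downstream stages consume a ParsedDocument without caring
# whether Docling or Marker produced it.
doc = ParsedDocument("report.pdf", [ParsedChunk("Q1 revenue grew 12%.", page=5)])
```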
Common features across both parsers:
- Structural preservation — Heading hierarchy, page boundaries, paragraph grouping
- Multi-format — PDF, DOCX, PPTX, TXT with consistent output
- Page-aware metadata — Every chunk carries its page number, heading path, and references to images/tables on the same page
- LLM captioning — Images and tables captioned by vision/text LLM for semantic search
```env
# Switch parser in .env
NEXUSRAG_DOCUMENT_PARSER=marker  # or "docling" (default)
```
</details>
<details open>
<summary><b>Hybrid Retrieval Pipeline</b></summary>
| Stage | Technology | Details |
|---|---|---|
| Vector Embedding | BAAI/bge-m3 | 1024-dim multilingual bi-encoder (100+ languages) |
| KG Embedding | Gemini / Ollama / sentence-transformers | Configurable: Gemini (3072d), Ollama, or local sentence-transformers (e.g. bge-m3, 1024d) |
| Vector Search | ChromaDB | Cosine similarity, over-fetch top-20 candidates |
| Knowledge Graph | LightRAG | Entity/relationship extraction, keyword-to-entity matching |
| Reranking | BAAI/bge-reranker-v2-m3 | Cross-encoder joint scoring — encodes (query, chunk) pairs together |
| Generation | Gemini / Ollama | Agentic streaming chat with function calling |
**Why two embedding models?** Vector search needs speed (local bge-m3, 1024-dim). Knowledge graph extraction needs semantic richness for entity recognition — choose Gemini Embedding (3072-dim, cloud), Ollama, or sentence-transformers (fully local, no API needed). Each model is optimized for its role.
Retrieval flow:
- Parallel retrieval — Vector over-fetch (top-20) + KG entity lookup run simultaneously
- Cross-encoder reranking — All 20 candidates scored jointly with the query through a transformer (far more precise than cosine similarity alone)
- Filtering — Keep the top-8 above the relevance threshold (0.15), falling back to the top-3 if every candidate scores below it
- Media discovery — Find images and tables on the same pages as retrieved chunks
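The rerank-and-filter step above can be sketched as follows. This is a simplified illustration with assumed function and constant names; the real pipeline scores candidates with bge-reranker-v2-m3 rather than taking precomputed scores:

```python
# Thresholds taken from the retrieval flow described above.
RELEVANCE_THRESHOLD = 0.15
TOP_K = 8
FALLBACK_K = 3

def rerank_and_filter(scored_candidates: list[tuple[str, float]]) -> list[str]:
    """scored_candidates: (chunk, cross-encoder score) pairs for the ~20
    over-fetched candidates. Returns the chunks to pass to the LLM."""
    ranked = sorted(scored_candidates, key=lambda c: c[1], reverse=True)
    kept = [chunk for chunk, score in ranked if score >= RELEVANCE_THRESHOLD][:TOP_K]
    if not kept:
        # Every candidate fell below the threshold: keep the best 3 anyway
        # so the LLM always has some context to work with.
        kept = [chunk for chunk, _ in ranked[:FALLBACK_K]]
    return kept

print(rerank_and_filter([("a", 0.9), ("b", 0.05), ("c", 0.4)]))  # ['a', 'c']
```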
Images and tables are embedded into chunk vectors — not stored separately. When the parser extracts an image on page 5, its LLM-generated caption is appended to the text chunks on that page before embedding. This means searching for "revenue chart" finds chunks that contain the chart description, without needing a separate image search index.
**Image Pipeline**
- Parser (Docling or Marker) extracts images from PDF/DOCX/PPTX (up to 50 per document)
- Vision LLM (Gemini Vision or Ollama multimodal) generates captions: specific numbers, labels, trends
- Captions appended to page chunks: `[Image on page 5]: Graph showing 12% revenue growth YoY`
- Chunk is embedded → image becomes vector-searchable through its description
- During retrieval, images on matched pages are surfaced as `[IMG-p4f2]` references
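The caption-append step might look like the sketch below. The function name and caption format here are assumptions modeled on the `[Image on page 5]: …` example above:

```python
def attach_media_captions(chunk_text: str, page: int,
                          captions: dict[int, list[str]]) -> str:
    """Append LLM-generated image captions for this page to the chunk text,
    so the image becomes retrievable through ordinary vector search."""
    extra = [f"[Image on page {page}]: {c}" for c in captions.get(page, [])]
    return "\n".join([chunk_text, *extra]) if extra else chunk_text

augmented = attach_media_captions(
    "Revenue discussion for FY2024.",
    page=5,
    captions={5: ["Graph showing 12% revenue growth YoY"]},
)
# The augmented text, not the raw chunk, is what gets embedded.
```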
**Table Pipeline**
- Parser exports tables as structured Markdown (preserving rows, columns, dimensions)
- Text LLM summarizes each table: purpose, key columns, notable values (max 500 chars)
- Summaries appended to page chunks: `[Table on page 5 (3x4)]: Annual sales by region`
- Table summaries injected back into document Markdown as blockquotes for the document viewer
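The blockquote-injection step can be sketched as a simple string rewrite. The function name is hypothetical; the real viewer integration may work differently:

```python
def inject_table_summary(markdown: str, table_md: str, summary: str) -> str:
    """Insert the LLM summary as a blockquote immediately before the table,
    so the document viewer shows the summary alongside the raw table."""
    return markdown.replace(table_md, f"> {summary}\n\n{table_md}")

table = "| Region | Sales |\n|---|---|\n| EU | 40 |"
doc_md = "Intro paragraph.\n\n" + table
out = inject_table_summary(doc_md, table, "Annual sales by region")
```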
Every answer is grounded in source documents with 4-character citation IDs (e.g., [a3z1]):
- Inline citations — Clickable badges embedded directly in the answer text
- Source cards — Each citation shows filename, page number, heading path, and relevance score
- Cross-navigation — Click a citation to jump to the exact section in the document viewer
- Image references — Visual content cited separately as `[IMG-p4f2]` with page tracking
- Strict grounding — The LLM is instructed to cite only sources that directly support claims, max 3 per sentence
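One way the 4-char citation IDs could be derived is a short hash over chunk identity, as sketched below. This is an assumption: hashing yields hex characters only, while real IDs like `[a3z1]` evidently draw from a wider alphabet, so treat this purely as an illustration of stable ID assignment:

```python
import hashlib

def citation_id(filename: str, page: int, heading_path: list[str]) -> str:
    """Derive a stable 4-char citation ID from a chunk's source identity
    (filename + page + heading path), so repeated runs cite consistently."""
    key = f"{filename}|{page}|{'>'.join(heading_path)}"
    return hashlib.sha1(key.encode()).hexdigest()[:4]

cid = citation_id("report.pdf", 5, ["Results", "Q1"])
badge = f"[{cid}]"  # rendered inline in the answer, like [a3z1]
```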
Interactive force-directed graph built from extracted entities and relationships:
- Entity types — Person, Organization, Product, Location, Event, Technology, Financial Metric, Date, Regulation (configurable)
- Force simulation — Repulsion + spring forces + center gravity with real-time physics
- Pan & zoom — Mouse drag, scroll wheel (0.3x-3x), keyboard reset
- Node interaction — Click to select, hover to highlight connected edges, drag to reposition
- Entity scaling — Node radius proportional to connectivity (degree)
- Query modes — Naive, Local (multi-hop), Global (summary), Hybrid (default)
- No extra services — LightRAG uses file-based storage (NetworkX + NanoVectorDB), zero Docker overhead
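The force simulation combines the three forces listed above: pairwise repulsion, springs along edges, and a weak pull toward the center. The actual UI presumably runs in the browser; this Python sketch (with assumed constants) just shows one integration tick of that model:

```python
import math

def force_step(pos, edges, dt=0.02, repulsion=500.0, spring=0.05, gravity=0.01):
    """One tick: repulsion between all node pairs, spring forces along
    edges, center gravity toward (0, 0), then Euler integration."""
    forces = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)
    for i, a in enumerate(nodes):                 # pairwise repulsion
        for b in nodes[i + 1:]:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d2 = dx * dx + dy * dy or 1e-6        # avoid divide-by-zero
            f, d = repulsion / d2, math.sqrt(d2)
            fx, fy = f * dx / d, f * dy / d
            forces[a][0] += fx; forces[a][1] += fy
            forces[b][0] -= fx; forces[b][1] -= fy
    for a, b in edges:                            # springs pull neighbors together
        dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
        forces[a][0] += spring * dx; forces[a][1] += spring * dy
        forces[b][0] -= spring * dx; forces[b][1] -= spring * dy
    for n in pos:                                 # center gravity + integrate
        forces[n][0] -= gravity * pos[n][0]
        forces[n][1] -= gravity * pos[n][1]
        pos[n] = (pos[n][0] + dt * forces[n][0], pos[n][1] + dt * forces[n][1])
    return pos
```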
Switch between cloud and local models with a single environment variable:
**Gemini (Cloud)**
| Model | Best For | Thinking |
|---|---|---|
| gemini-2.5-flash | General chat, fast responses | Budget-based (auto) |
| gemini-3.1-flash-lite | High throughput, cost-effective (**recommended default**) | Level-based: minimal / low / medium / high |
Extended thinking is automatically configured — Gemini 2.5 uses `thinking_budget_tokens`, Gemini 3.x uses `thinking_level`.
**Ollama (Local / Self-hosted)**
| Model | Parameters | Recommendation |
|---|---|---|
| qwen3.5:9b | 9B | Good multilingual support, solid tool calling (**recommended default**) |
| qwen3.5:4b | 4B | Lightweight, works on 8GB RAM; may miss some tool calls |
| gemma3:12b | 12B | Best balance of quality and speed |
Tip: For Knowledge Graph extraction, larger models (12B+) produce significantly better entity/relationship quality. Smaller models (4B) may extract zero entities on complex documents.
Provider switching — comment/uncomment blocks in `.env`:

```env
# Cloud (Gemini)
LLM_PROVIDER=gemini
GOOGLE_AI_API_KEY=your-key

# Local (Ollama) — uncomment to switch
# LLM_PROVIDER=ollama
# OLLAMA_MODEL=gemma3:12b
```
**KG Embedding Providers**
The Knowledge Graph embedding model is configured separately from the chat LLM:
| Provider | Config | API Required | Dimension |
|---|---|---|---|
| **Gem