# NexusRAG

**Hybrid Knowledge Base with Agentic Chat, Citations & Knowledge Graph**

Upload documents. Ask questions. Get cited answers.

NexusRAG combines vector search, a knowledge graph (LightRAG), and cross-encoder reranking into one seamless RAG pipeline — with Docling document parsing, visual intelligence (image/table captioning), agentic streaming chat, and inline citations. Powered by Gemini, local Ollama, or fully offline sentence-transformers.

Features · Quick Start · Model Recommendations · Tech Stack
## Architecture

## Showcase
## Beyond Traditional RAG
Most RAG systems follow a simple pipeline: split text → embed → retrieve → generate. NexusRAG goes further at every stage:
| Aspect | Traditional RAG | NexusRAG |
|---|---|---|
| Document Parsing | Plain text extraction, structure lost | Docling or Marker: preserves headings, page boundaries, formulas, layout — switchable via config |
| Images & Tables | Ignored entirely | Extracted, captioned by a vision LLM, embedded as searchable vectors |
| Chunking | Fixed-size splits, breaks mid-sentence | Hybrid semantic + structural chunking (respects headings, tables) |
| Embeddings | Single model for everything | Dual-model: BAAI/bge-m3 (1024d, search) + KG embedding (Gemini 3072d / Ollama / sentence-transformers) |
| Retrieval | Vector similarity only | 3-way parallel: vector over-fetch + KG entity lookup + cross-encoder rerank |
| Knowledge | No entity awareness | LightRAG graph: entity extraction, relationship mapping, multi-hop traversal |
| Context | Raw chunks dumped to the LLM | Structured assembly: KG insights → cited chunks → related images/tables |
| Citations | None or manual | Auto-generated 4-char IDs with page number and heading path |
| Page awareness | Lost after chunking | Preserved end-to-end: chunk → citation → document viewer navigation |
## Features
<details>
<summary><b>Deep Document Parsing (Docling / Marker)</b></summary>

NexusRAG supports two document parsers, switchable via the `NEXUSRAG_DOCUMENT_PARSER` env variable:
| Feature | Docling (default) | Marker |
|---|---|---|
| Math/Formula | Basic (known LaTeX issues) | Superior LaTeX via Surya |
| GPU footprint | ~18-20GB VRAM (formula enrichment) | ~2-4GB VRAM |
| Formats | PDF, DOCX, PPTX, HTML | PDF, DOCX, PPTX, XLSX, HTML, EPUB |
| Chunking | HybridChunker (semantic + structural) | Heading-aware + page-based |
| Image extraction | Via Docling pipeline | Via Marker pipeline |
| Table extraction | Structured export | Markdown tables |
Both parsers share the same output contract (`ParsedDocument`) — the downstream pipeline (dedup, embedding, KG, retrieval) works identically regardless of parser choice.
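A minimal sketch of what such a shared contract could look like. The field and class names below (other than `ParsedDocument` itself) are illustrative assumptions, not the project's actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ParsedImage:
    page: int
    caption: Optional[str] = None  # filled in later by the vision LLM

@dataclass
class ParsedChunk:
    text: str
    page: int
    heading_path: list[str] = field(default_factory=list)  # e.g. ["Results", "Q1"]

@dataclass
class ParsedDocument:
    filename: str
    chunks: list[ParsedChunk]
    images: list[ParsedImage] = field(default_factory=list)

# Downstream stages consume a ParsedDocument without caring
# whether Docling or Marker produced it.
doc = ParsedDocument("report.pdf", [ParsedChunk("Q1 revenue grew 12%.", page=5)])
```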
Common features across both parsers:
- Structural preservation — Heading hierarchy, page boundaries, paragraph grouping
- Multi-format — PDF, DOCX, PPTX, TXT with consistent output
- Page-aware metadata — Every chunk carries its page number, heading path, and references to images/tables on the same page
- LLM captioning — Images and tables captioned by vision/text LLM for semantic search
```env
# Switch parser in .env
NEXUSRAG_DOCUMENT_PARSER=marker  # or "docling" (default)
```
</details>
<details open>
<summary><b>Hybrid Retrieval Pipeline</b></summary>
| Stage | Technology | Details |
|---|---|---|
| Vector Embedding | BAAI/bge-m3 | 1024-dim multilingual bi-encoder (100+ languages) |
| KG Embedding | Gemini / Ollama / sentence-transformers | Configurable: Gemini (3072d), Ollama, or local sentence-transformers (e.g. bge-m3, 1024d) |
| Vector Search | ChromaDB | Cosine similarity, over-fetch top-20 candidates |
| Knowledge Graph | LightRAG | Entity/relationship extraction, keyword-to-entity matching |
| Reranking | BAAI/bge-reranker-v2-m3 | Cross-encoder joint scoring — encodes (query, chunk) pairs together |
| Generation | Gemini / Ollama | Agentic streaming chat with function calling |
**Why two embedding models?** Vector search needs speed (local bge-m3, 1024-dim). Knowledge graph extraction needs semantic richness for entity recognition — choose Gemini Embedding (3072-dim, cloud), Ollama, or sentence-transformers (fully local, no API needed). Each model is optimized for its role.
Retrieval flow:
- Parallel retrieval — Vector over-fetch (top-20) + KG entity lookup run simultaneously
- Cross-encoder reranking — All 20 candidates scored jointly with the query through a transformer (far more precise than cosine similarity alone)
- Filtering — Keep the top-8 above the relevance threshold (0.15), falling back to the top-3 if every candidate scores below it
- Media discovery — Find images and tables on the same pages as retrieved chunks
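The rerank-and-filter step above can be sketched as follows. This is a simplified illustration with assumed function and constant names; the real pipeline scores candidates with bge-reranker-v2-m3 rather than taking precomputed scores:

```python
# Thresholds taken from the retrieval flow described above.
RELEVANCE_THRESHOLD = 0.15
TOP_K = 8
FALLBACK_K = 3

def rerank_and_filter(scored_candidates: list[tuple[str, float]]) -> list[str]:
    """scored_candidates: (chunk, cross-encoder score) pairs for the ~20
    over-fetched candidates. Returns the chunks to pass to the LLM."""
    ranked = sorted(scored_candidates, key=lambda c: c[1], reverse=True)
    kept = [chunk for chunk, score in ranked if score >= RELEVANCE_THRESHOLD][:TOP_K]
    if not kept:
        # Every candidate fell below the threshold: keep the best 3 anyway
        # so the LLM always has some context to work with.
        kept = [chunk for chunk, _ in ranked[:FALLBACK_K]]
    return kept

print(rerank_and_filter([("a", 0.9), ("b", 0.05), ("c", 0.4)]))  # ['a', 'c']
```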
Images and tables are embedded into chunk vectors — not stored separately. When the parser extracts an image on page 5, its LLM-generated caption is appended to the text chunks on that page before embedding. This means searching for "revenue chart" finds chunks that contain the chart description, without needing a separate image search index.
**Image Pipeline**
- Parser (Docling or Marker) extracts images from PDF/DOCX/PPTX (up to 50 per document)
- Vision LLM (Gemini Vision or Ollama multimodal) generates captions: specific numbers, labels, trends
- Captions appended to page chunks: `[Image on page 5]: Graph showing 12% revenue growth YoY`
- Chunk is embedded → image becomes vector-searchable through its description
- During retrieval, images on matched pages are surfaced as `[IMG-p4f2]` references
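The caption-append step might look like the sketch below. The function name and caption format here are assumptions modeled on the `[Image on page 5]: …` example above:

```python
def attach_media_captions(chunk_text: str, page: int,
                          captions: dict[int, list[str]]) -> str:
    """Append LLM-generated image captions for this page to the chunk text,
    so the image becomes retrievable through ordinary vector search."""
    extra = [f"[Image on page {page}]: {c}" for c in captions.get(page, [])]
    return "\n".join([chunk_text, *extra]) if extra else chunk_text

augmented = attach_media_captions(
    "Revenue discussion for FY2024.",
    page=5,
    captions={5: ["Graph showing 12% revenue growth YoY"]},
)
# The augmented text, not the raw chunk, is what gets embedded.
```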
**Table Pipeline**
- Parser exports tables as structured Markdown (preserving rows, columns, dimensions)
- Text LLM summarizes each table: purpose, key columns, notable values (max 500 chars)
- Summaries appended to page chunks: `[Table on page 5 (3x4)]: Annual sales by region`
- Table summaries injected back into document Markdown as blockquotes for the document viewer
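The blockquote-injection step can be sketched as a simple string rewrite. The function name is hypothetical; the real viewer integration may work differently:

```python
def inject_table_summary(markdown: str, table_md: str, summary: str) -> str:
    """Insert the LLM summary as a blockquote immediately before the table,
    so the document viewer shows the summary alongside the raw table."""
    return markdown.replace(table_md, f"> {summary}\n\n{table_md}")

table = "| Region | Sales |\n|---|---|\n| EU | 40 |"
doc_md = "Intro paragraph.\n\n" + table
out = inject_table_summary(doc_md, table, "Annual sales by region")
```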
Every answer is grounded in source documents with 4-character citation IDs (e.g., [a3z1]):
- Inline citations — Clickable badges embedded directly in the answer text
- Source cards — Each citation shows filename, page number, heading path, and relevance score
- Cross-navigation — Click a citation to jump to the exact section in the document viewer
- Image references — Visual content cited separately as `[IMG-p4f2]` with page tracking
- Strict grounding — The LLM is instructed to cite only sources that directly support claims, max 3 per sentence
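One way the 4-char citation IDs could be derived is a short hash over chunk identity, as sketched below. This is an assumption: hashing yields hex characters only, while real IDs like `[a3z1]` evidently draw from a wider alphabet, so treat this purely as an illustration of stable ID assignment:

```python
import hashlib

def citation_id(filename: str, page: int, heading_path: list[str]) -> str:
    """Derive a stable 4-char citation ID from a chunk's source identity
    (filename + page + heading path), so repeated runs cite consistently."""
    key = f"{filename}|{page}|{'>'.join(heading_path)}"
    return hashlib.sha1(key.encode()).hexdigest()[:4]

cid = citation_id("report.pdf", 5, ["Results", "Q1"])
badge = f"[{cid}]"  # rendered inline in the answer, like [a3z1]
```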
Interactive force-directed graph built from extracted entities and relationships:
- Entity types — Person, Organization, Product, Location, Event, Technology, Financial Metric, Date, Regulation (configurable)
- Force simulation — Repulsion + spring forces + center gravity with real-time physics
- Pan & zoom — Mouse drag, scroll wheel (0.3x-3x), keyboard reset
- Node interaction — Click to select, hover to highlight connected edges, drag to reposition
- Entity scaling — Node radius proportional to connectivity (degree)
- Query modes — Naive, Local (multi-hop), Global (summary), Hybrid (default)
- No extra services — LightRAG uses file-based storage (NetworkX + NanoVectorDB), zero Docker overhead
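The force simulation combines the three forces listed above: pairwise repulsion, springs along edges, and a weak pull toward the center. The actual UI presumably runs in the browser; this Python sketch (with assumed constants) just shows one integration tick of that model:

```python
import math

def force_step(pos, edges, dt=0.02, repulsion=500.0, spring=0.05, gravity=0.01):
    """One tick: repulsion between all node pairs, spring forces along
    edges, center gravity toward (0, 0), then Euler integration."""
    forces = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)
    for i, a in enumerate(nodes):                 # pairwise repulsion
        for b in nodes[i + 1:]:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d2 = dx * dx + dy * dy or 1e-6        # avoid divide-by-zero
            f, d = repulsion / d2, math.sqrt(d2)
            fx, fy = f * dx / d, f * dy / d
            forces[a][0] += fx; forces[a][1] += fy
            forces[b][0] -= fx; forces[b][1] -= fy
    for a, b in edges:                            # springs pull neighbors together
        dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
        forces[a][0] += spring * dx; forces[a][1] += spring * dy
        forces[b][0] -= spring * dx; forces[b][1] -= spring * dy
    for n in pos:                                 # center gravity + integrate
        forces[n][0] -= gravity * pos[n][0]
        forces[n][1] -= gravity * pos[n][1]
        pos[n] = (pos[n][0] + dt * forces[n][0], pos[n][1] + dt * forces[n][1])
    return pos
```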
Switch between cloud and local models with a single environment variable:
**Gemini (Cloud)**
| Model | Best For | Thinking |
|---|---|---|
| gemini-2.5-flash | General chat, fast responses | Budget-based (auto) |
| gemini-3.1-flash-lite | High throughput, cost-effective (**recommended default**) | Level-based: minimal / low / medium / high |
Extended thinking is automatically configured — Gemini 2.5 uses `thinking_budget_tokens`, Gemini 3.x uses `thinking_level`.
**Ollama (Local / Self-hosted)**
| Model | Parameters | Recommendation |
|---|---|---|
| qwen3.5:9b | 9B | Good multilingual support, solid tool calling (**recommended default**) |
| qwen3.5:4b | 4B | Lightweight, works on 8GB RAM; may miss some tool calls |
| gemma3:12b | 12B | Best balance of quality and speed |
Tip: For Knowledge Graph extraction, larger models (12B+) produce significantly better entity/relationship quality. Smaller models (4B) may extract zero entities on complex documents.
Provider switching — comment/uncomment blocks in `.env`:

```env
# Cloud (Gemini)
LLM_PROVIDER=gemini
GOOGLE_AI_API_KEY=your-key

# Local (Ollama) — uncomment to switch
# LLM_PROVIDER=ollama
# OLLAMA_MODEL=gemma3:12b
```
**KG Embedding Providers**
The Knowledge Graph embedding model is configured separately from the chat LLM:
| Provider | Config | API Required | Dimension |
|---|---|---|---|
| **Gem