
NexusRAG

Hybrid RAG system combining vector search, knowledge graph (LightRAG), and cross-encoder reranking — with Docling document parsing, visual intelligence (image/table captioning), agentic streaming chat, and inline citations. Powered by Gemini or local Ollama models.

Install / Use

/learn @LeDat98/NexusRAG

README

<div align="center">

NexusRAG

Hybrid Knowledge Base with Agentic Chat, Citations & Knowledge Graph

Python React FastAPI Docker License: MIT LinkedIn

Upload documents. Ask questions. Get cited answers.

NexusRAG combines vector search, knowledge graph, and cross-encoder reranking into one seamless RAG pipeline — powered by Gemini, local Ollama, or fully offline sentence-transformers.

Features · Quick Start · Model Recommendations · Tech Stack

</div>

Architecture

<div align="center">

NexusRAG Architecture

</div>

Showcase

<div align="center">

NexusRAG Demo

</div>

Beyond Traditional RAG

Most RAG systems follow a simple pipeline: split text → embed → retrieve → generate. NexusRAG goes further at every stage:

| Aspect | Traditional RAG | NexusRAG |
|---|---|---|
| Document Parsing | Plain text extraction, structure lost | Docling or Marker: preserves headings, page boundaries, formulas, layout — switchable via config |
| Images & Tables | Ignored entirely | Extracted, captioned by vision LLM, embedded as searchable vectors |
| Chunking | Fixed-size splits, breaks mid-sentence | Hybrid semantic + structural chunking (respects headings, tables) |
| Embeddings | Single model for everything | Dual-model: BAAI/bge-m3 (1024d, search) + KG embedding (Gemini 3072d / Ollama / sentence-transformers) |
| Retrieval | Vector similarity only | 3-way parallel: Vector over-fetch + KG entity lookup + Cross-encoder rerank |
| Knowledge | No entity awareness | LightRAG graph: entity extraction, relationship mapping, multi-hop traversal |
| Context | Raw chunks dumped to LLM | Structured assembly: KG insights → cited chunks → related images/tables |
| Citations | None or manual | Auto-generated 4-char IDs with page number and heading path |
| Page awareness | Lost after chunking | Preserved end-to-end: chunk → citation → document viewer navigation |


Features

<details> <summary><b>Deep Document Parsing (Docling / Marker)</b></summary>

NexusRAG supports two document parsers, switchable via NEXUSRAG_DOCUMENT_PARSER env config:

| Feature | Docling (default) | Marker |
|---|---|---|
| Math/Formula | Basic (known LaTeX issues) | Superior LaTeX via Surya |
| GPU footprint | ~18-20GB VRAM (formula enrichment) | ~2-4GB VRAM |
| Formats | PDF, DOCX, PPTX, HTML | PDF, DOCX, PPTX, XLSX, HTML, EPUB |
| Chunking | HybridChunker (semantic + structural) | Heading-aware + page-based |
| Image extraction | Via Docling pipeline | Via Marker pipeline |
| Table extraction | Structured export | Markdown tables |

Both parsers share the same output contract (ParsedDocument) — downstream pipeline (dedup, embedding, KG, retrieval) works identically regardless of parser choice.

Common features across both parsers:

  • Structural preservation — Heading hierarchy, page boundaries, paragraph grouping
  • Multi-format — PDF, DOCX, PPTX, TXT with consistent output
  • Page-aware metadata — Every chunk carries its page number, heading path, and references to images/tables on the same page
  • LLM captioning — Images and tables captioned by vision/text LLM for semantic search
```bash
# Switch parser in .env
NEXUSRAG_DOCUMENT_PARSER=marker   # or "docling" (default)
```
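The shared `ParsedDocument` contract can be pictured as a small set of dataclasses. The field names below are illustrative assumptions, not the project's actual schema — the point is that both parsers emit the same page-aware, heading-aware structure:

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One retrieval unit, carrying its structural metadata."""
    text: str
    page: int
    heading_path: list[str]                     # e.g. ["Q3 Report", "Revenue"]
    image_refs: list[str] = field(default_factory=list)
    table_refs: list[str] = field(default_factory=list)


@dataclass
class ParsedDocument:
    """Common output of both Docling and Marker parsing paths."""
    filename: str
    markdown: str                               # full document as Markdown
    chunks: list[Chunk]


# Downstream stages (dedup, embedding, KG, retrieval) only see this shape.
doc = ParsedDocument(
    filename="report.pdf",
    markdown="# Q3 Report\n...",
    chunks=[Chunk(text="Revenue grew 12% YoY.", page=5,
                  heading_path=["Q3 Report", "Revenue"])],
)
```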
</details> <details open> <summary><b>Hybrid Retrieval Pipeline</b></summary>

| Stage | Technology | Details |
|---|---|---|
| Vector Embedding | BAAI/bge-m3 | 1024-dim multilingual bi-encoder (100+ languages) |
| KG Embedding | Gemini / Ollama / sentence-transformers | Configurable: Gemini (3072d), Ollama, or local sentence-transformers (e.g. bge-m3 1024d) |
| Vector Search | ChromaDB | Cosine similarity, over-fetch top-20 candidates |
| Knowledge Graph | LightRAG | Entity/relationship extraction, keyword-to-entity matching |
| Reranking | BAAI/bge-reranker-v2-m3 | Cross-encoder joint scoring — encodes (query, chunk) pairs together |
| Generation | Gemini / Ollama | Agentic streaming chat with function calling |

Why two embedding models? Vector search needs speed (local bge-m3, 1024-dim). Knowledge graph extraction needs semantic richness for entity recognition — choose Gemini Embedding (3072-dim, cloud), Ollama, or sentence-transformers (fully local, no API needed). Each model is optimized for its role.

Retrieval flow:

  1. Parallel retrieval — Vector over-fetch (top-20) + KG entity lookup run simultaneously
  2. Cross-encoder reranking — All 20 candidates scored jointly with the query through a transformer (far more precise than cosine similarity alone)
  3. Filtering — Keep top-8 above relevance threshold (0.15), with fallback to top-3 if all below
  4. Media discovery — Find images and tables on the same pages as retrieved chunks
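The filtering rule in step 3 can be sketched as a small pure function. The threshold (0.15), keep count (8), and fallback count (3) come from the text above; the reranker call itself is elided:

```python
def filter_reranked(scored, threshold=0.15, top_k=8, fallback_k=3):
    """scored: list of (chunk, cross_encoder_score) pairs.

    Keep the top candidates above the relevance threshold;
    if nothing clears it, fall back to the best few regardless.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    kept = [pair for pair in ranked if pair[1] >= threshold][:top_k]
    return kept if kept else ranked[:fallback_k]


# Two candidates clear the 0.15 threshold; one is dropped.
hits = filter_reranked([("a", 0.9), ("b", 0.05), ("c", 0.4)])
# → [("a", 0.9), ("c", 0.4)]
```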
</details> <details> <summary><b>Visual Document Intelligence</b></summary>

Images and tables are embedded into chunk vectors — not stored separately. When the parser extracts an image on page 5, its LLM-generated caption is appended to the text chunks on that page before embedding. This means searching for "revenue chart" finds chunks that contain the chart description, without needing a separate image search index.

Image Pipeline

  1. Parser (Docling or Marker) extracts images from PDF/DOCX/PPTX (up to 50 per document)
  2. Vision LLM (Gemini Vision or Ollama multimodal) generates captions: specific numbers, labels, trends
  3. Captions appended to page chunks: [Image on page 5]: Graph showing 12% revenue growth YoY
  4. Chunk is embedded → image becomes vector-searchable through its description
  5. During retrieval, images on matched pages are surfaced as [IMG-p4f2] references
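Steps 3-4 amount to appending labeled captions to each page's text before embedding. A minimal sketch (function and parameter names are illustrative, not the project's API):

```python
def annotate_chunk(chunk_text: str, page: int, captions: list[str]) -> str:
    """Append vision-LLM captions for images on this page, so the
    chunk's embedding also covers the visual content."""
    lines = [chunk_text]
    for caption in captions:
        lines.append(f"[Image on page {page}]: {caption}")
    return "\n".join(lines)
```

After this step, a query like "revenue chart" can match the chunk through the caption text alone, with no separate image index.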

Table Pipeline

  1. Parser exports tables as structured Markdown (preserving rows, columns, dimensions)
  2. Text LLM summarizes each table: purpose, key columns, notable values (max 500 chars)
  3. Summaries appended to page chunks: [Table on page 5 (3x4)]: Annual sales by region
  4. Table summaries injected back into document Markdown as blockquotes for the document viewer
</details> <details> <summary><b>Citation System</b></summary>

Every answer is grounded in source documents with 4-character citation IDs (e.g., [a3z1]):

  • Inline citations — Clickable badges embedded directly in the answer text
  • Source cards — Each citation shows filename, page number, heading path, and relevance score
  • Cross-navigation — Click a citation to jump to the exact section in the document viewer
  • Image references — Visual content cited separately as [IMG-p4f2] with page tracking
  • Strict grounding — The LLM is instructed to only cite sources that directly support claims, max 3 per sentence
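A stable 4-character ID like `[a3z1]` can be minted by hashing the chunk's identity into a compact alphabet. This is a plausible sketch under that assumption, not the project's actual scheme:

```python
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def citation_id(doc_id: str, chunk_index: int) -> str:
    """Derive a stable 4-char alphanumeric citation ID for a chunk."""
    digest = hashlib.sha1(f"{doc_id}:{chunk_index}".encode()).hexdigest()
    n = int(digest[:8], 16)          # fold the hash prefix into base-36
    chars = []
    for _ in range(4):
        n, r = divmod(n, 36)
        chars.append(ALPHABET[r])
    return "".join(chars)
```

Hashing rather than counting keeps the ID stable across re-ingestion, so citations in older chat history keep resolving.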
</details> <details> <summary><b>Knowledge Graph Visualization</b></summary>

Interactive force-directed graph built from extracted entities and relationships:

  • Entity types — Person, Organization, Product, Location, Event, Technology, Financial Metric, Date, Regulation (configurable)
  • Force simulation — Repulsion + spring forces + center gravity with real-time physics
  • Pan & zoom — Mouse drag, scroll wheel (0.3x-3x), keyboard reset
  • Node interaction — Click to select, hover to highlight connected edges, drag to reposition
  • Entity scaling — Node radius proportional to connectivity (degree)
  • Query modes — Naive, Local (multi-hop), Global (summary), Hybrid (default)
  • No extra services — LightRAG uses file-based storage (NetworkX + NanoVectorDB), zero Docker overhead
</details> <details> <summary><b>Multi-Provider LLM</b></summary>

Switch between cloud and local models with a single environment variable:

Gemini (Cloud)

| Model | Best For | Thinking |
|---|---|---|
| gemini-2.5-flash | General chat, fast responses | Budget-based (auto) |
| gemini-3.1-flash-lite | High throughput, cost-effective. **Recommended default** | Level-based: minimal / low / medium / high |

Extended thinking is automatically configured — Gemini 2.5 uses thinking_budget_tokens, Gemini 3.x uses thinking_level.
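The version-dependent selection can be sketched as a small dispatch on the model name. Parameter names follow the text above; the exact API field names and defaults here are assumptions:

```python
def thinking_config(model: str, level: str = "medium", budget: int = 2048) -> dict:
    """Gemini 2.5 models take a token budget; Gemini 3.x take a level."""
    if model.startswith("gemini-2.5"):
        return {"thinking_budget_tokens": budget}
    if model.startswith("gemini-3"):
        return {"thinking_level": level}   # minimal / low / medium / high
    return {}                              # non-thinking models: no extra config
```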

Ollama (Local / Self-hosted)

| Model | Parameters | Recommendation |
|---|---|---|
| qwen3.5:9b | 9B | Good multilingual support, solid tool calling. **Recommended default** |
| qwen3.5:4b | 4B | Lightweight, works on 8GB RAM. May miss some tool calls |
| gemma3:12b | 12B | Best balance of quality and speed |

Tip: For Knowledge Graph extraction, larger models (12B+) produce significantly better entity/relationship quality. Smaller models (4B) may extract zero entities on complex documents.

Provider switching — Comment/uncomment blocks in .env:

```bash
# Cloud (Gemini)
LLM_PROVIDER=gemini
GOOGLE_AI_API_KEY=your-key

# Local (Ollama) — uncomment to switch
# LLM_PROVIDER=ollama
# OLLAMA_MODEL=gemma3:12b
```
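On the backend, that switch only has to be read once at startup. A minimal sketch of the environment lookup (the variable names match the `.env` block above; the fallback model and return shape are assumptions):

```python
import os


def resolve_provider() -> dict:
    """Read LLM provider settings from the environment, with defaults."""
    provider = os.getenv("LLM_PROVIDER", "gemini").lower()
    if provider == "ollama":
        return {"provider": "ollama",
                "model": os.getenv("OLLAMA_MODEL", "gemma3:12b")}
    return {"provider": "gemini",
            "api_key": os.getenv("GOOGLE_AI_API_KEY", ""),
            "model": os.getenv("GEMINI_MODEL", "gemini-3.1-flash-lite")}
```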

KG Embedding Providers

The Knowledge Graph embedding model is configured separately from the chat LLM:

| Provider | Config | API Required | Dimension |
|---|---|---|---|
| **Gem

View on GitHub

- GitHub Stars: 216
- Forks: 50
- Category: Development
- Updated: 1d ago
- Languages: Python
- Security Score: 85/100 (audited on Mar 26, 2026; no findings)