🧙‍♂️ NanoSage: Advanced Recursive Search & Report Generation
A deep-research assistant that runs on your laptop using tiny models, all open source!
How is NanoSage different from other research assistants?
It offers a structured breakdown of a multi-source, relevance-driven, recursive search pipeline. It walks through how the system refines a user query, builds a knowledge base from local and web data, and dynamically explores subqueries—tracking progress through a Table of Contents (TOC).
With Monte Carlo-based exploration, the system balances depth vs. breadth, ranking each branch's relevance to ensure precision and avoid unrelated tangents. The result? A detailed, well-organized report generated using retrieval-augmented generation (RAG), integrating the most valuable insights.
I wanted to experiment with new research methods. When we research a topic, we naturally wander into new ideas as we search, and NanoSage does exactly that: it explores and records its journey, where each (relevant) step is a node, then sums everything up for you in a neat report whose table of contents is essentially its search graph. 🧙
📅 Latest Update - October 17, 2025
Major System Enhancement: Tavily Integration & Hybrid Embedding Architecture
- Tavily Search API Integration: Replaced unreliable free search engines with Tavily's robust API, enabling access to high-quality academic sources including PubMed, research journals, and scholarly databases
- Hybrid Embedding System: Implemented intelligent model selection where SigLIP/CLIP handle vision tasks (images, PDFs) while all-MiniLM processes text content, eliminating dimension mismatches and optimizing performance
- Enhanced Web Crawler: Added comprehensive web content extraction with metadata generation, domain grouping, and fallback search engines for maximum reliability and coverage
Example Report
You can find an example report in the following link:
example report output for query: "Create a structure bouldering gym workout to push my climbing from v4 to v6"
Quick Start Guide
1. Install Dependencies
- Ensure Python 3.8+ is installed.
- Install required packages:
pip install -r requirements.txt
- (Optional) For GPU acceleration, install PyTorch with CUDA:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
(Replace cu118 with your CUDA version.)
- Make sure to update pyOpenSSL and cryptography:
pip install --upgrade pyOpenSSL cryptography
2. Set Up Ollama & Pull the Gemma Model
- Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
pip install --upgrade ollama
(Windows users: see ollama.com for installer.)
- Pull Gemma 2B (for RAG-based summaries):
ollama pull gemma2:2b
3. Configure API Keys
# Copy the example environment file
cp env.example .env
# Edit .env and add your Tavily API key
# Get your free API key at: https://tavily.com/
TAVILY_API_KEY=your_tavily_api_key_here
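Once the key is in `.env`, the application needs to read it at startup. A minimal sketch of such a check (the helper name is hypothetical, not NanoSage's actual loader; tools like python-dotenv can populate `os.environ` from `.env` first):

```python
import os

def load_tavily_key() -> str:
    # Hypothetical helper: read the Tavily key from the environment
    # and reject the unfilled placeholder from env.example.
    key = os.environ.get("TAVILY_API_KEY", "")
    if not key or key == "your_tavily_api_key_here":
        raise RuntimeError("TAVILY_API_KEY is missing - set it in .env")
    return key
```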
4. Run Examples
# Basic web search with SigLIP (vision + text hybrid) - RECOMMENDED
python main.py --query "machine learning algorithms" --retrieval_model siglip --web_search
# Web search with local documents
python main.py --query "quantum computing" --corpus_dir ./my_documents --retrieval_model siglip --web_search
# Fast text-only search
python main.py --query "artificial intelligence" --retrieval_model all-minilm --web_search
# Local documents only (no web search)
python main.py --query "research papers" --corpus_dir ./my_documents --retrieval_model colpali
Parameters:
- `--query`: Main search query (natural language).
- `--web_search`: Enables web-based retrieval via the Tavily API.
- `--retrieval_model`: Choose from `siglip` (recommended), `clip`, `colpali`, or `all-minilm`.
- `--device cpu`: Uses the CPU (swap with `cuda` for GPU).
- `--max_depth`: Recursion depth for subqueries (default: 1).
5. Available Models
- `siglip`: Vision + text hybrid (recommended for images/PDFs + web content)
- `clip`: Vision + text hybrid (alternative to SigLIP)
- `colpali`: Advanced text model (good for documents)
- `all-minilm`: Fast text model (good for speed)
6. Supported File Types
- Text: `.txt`, `.md`, `.py`, `.json`, `.yaml`, `.csv`
- PDFs: Converted to images for vision models, OCR for text models
- Images: `.png`, `.jpg`, `.jpeg` (vision models or OCR fallback)
7. Check Results & Report
A detailed Markdown report will appear in `results/<query_id>/`.
Example:
results/
└── 389380e2/
├── Quantum_computing_in_healthcare_output.md
├── web_Quantum_computing/
├── web_results/
└── local_results/
Open the `*_output.md` file (e.g., `Quantum_computing_in_healthcare_output.md`) in a Markdown viewer (VS Code, Obsidian, etc.).
8. Advanced Options
✅ Using Local Files
If you have local PDFs, text files, or images:
python main.py --query "AI in finance" \
--corpus_dir "my_local_data/" \
--top_k 5 \
--device cpu
Now the system searches both local docs and web data (if --web_search is enabled).
🔄 RAG with Gemma 2B
python main.py --query "Climate change impact on economy" \
--rag_model gemma \
--personality "scientific"
This uses Gemma 2B to generate LLM-based summaries and the final report.
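The `--personality` flag steers the tone of the Gemma-generated report. A hypothetical sketch of how a personality string might be folded into the RAG prompt (the real prompt template in NanoSage may differ):

```python
def build_rag_prompt(query: str, context: str, personality: str = "neutral") -> str:
    # Hypothetical prompt assembly: prepend a persona instruction,
    # then the retrieved context, then the user's query.
    return (
        f"You are a {personality} research assistant.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Write a multi-section markdown report grounded in the context."
    )

prompt = build_rag_prompt(
    "Climate change impact on economy", "<retrieved summaries>", "scientific"
)
```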
9. Troubleshooting
- Missing dependencies? Rerun `pip install -r requirements.txt`.
- Ollama not found? Ensure it's installed (`ollama list` shows `gemma2:2b`).
- Memory issues? Use `--device cpu`.
- Too many subqueries? Lower `--max_depth` to 1.
- Web search not working? Check your `TAVILY_API_KEY` in the `.env` file.
10. Next Steps
- Try different retrieval models (`--retrieval_model siglip` for best results).
- Tweak recursion depth (`--max_depth`).
- Tune `config.yaml` for web search limits, `min_relevance`, or Monte Carlo search.
Hybrid Embedding System
NanoSage uses a hybrid embedding approach for optimal performance:
- Vision models (SigLIP/CLIP):
  - SigLIP/CLIP embed images and PDF pages rendered as images
  - all-MiniLM embeds text content (web pages, documents)
  - Ensures consistent 384-dimensional embeddings for text content
- Text models (ColPali/all-MiniLM):
  - The same model embeds all content types
  - Embedding dimensions stay consistent throughout
This approach eliminates dimension mismatches and uses the right tool for each content type.
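The routing logic described above can be sketched as a small dispatch function (names and content-type labels are illustrative, not NanoSage's actual API):

```python
def pick_embedder(retrieval_model: str, content_type: str) -> str:
    # Hybrid routing sketch: vision models handle visual content,
    # while text content always goes through all-MiniLM (384-d vectors)
    # so text embeddings stay dimension-compatible with each other.
    vision_models = {"siglip", "clip"}
    if retrieval_model in vision_models:
        if content_type in {"image", "pdf_page"}:
            return retrieval_model
        return "all-minilm"
    # Text-only models (colpali, all-minilm) embed every content type.
    return retrieval_model
```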
Web Search Integration
- Primary: Tavily Search API (reliable, academic sources)
- Fallback: DuckDuckGo, SearxNG, Wikipedia
- Sources: PubMed, academic journals, research databases
- Features: Real-time search, content extraction, metadata generation
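The primary/fallback ordering can be expressed as a simple chain that tries each backend until one returns results. A sketch with stand-in callables, not the actual NanoSage implementation:

```python
from typing import Callable, Iterable

def search_with_fallback(query: str,
                         backends: Iterable[Callable[[str], list]]) -> list:
    # Try each search backend in priority order (e.g. Tavily first,
    # then DuckDuckGo, SearxNG, Wikipedia); return the first non-empty
    # result list, swallowing per-backend failures.
    for backend in backends:
        try:
            results = backend(query)
            if results:
                return results
        except Exception:
            continue  # this backend failed; fall through to the next one
    return []
```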
Detailed Design: NanoSage Architecture
1. Core Input Parameters
- User Query: e.g. `"Quantum computing in healthcare"`.
- CLI Flags (in `main.py`): `--corpus_dir`, `--device`, `--retrieval_model`, `--top_k`, `--web_search`, `--personality`, `--rag_model`, `--max_depth`.
- YAML Config (e.g. `config.yaml`): `results_base_dir`, `max_query_length`, `web_search_limit`, `min_relevance`, etc.
2. Configuration & Session Setup
- Configuration:
  - `load_config(config_path)` reads the YAML settings.
  - `min_relevance`: cutoff for subquery branching.
- Session Initialization: `SearchSession.__init__()` sets:
  - A unique `query_id` & `base_result_dir`.
  - An enhanced query via `chain_of_thought_query_enhancement()`.
  - The retrieval model loaded with `load_retrieval_model()`.
  - A query embedding for relevance checks (`embed_text()`).
  - Local files (if any) loaded & added to the `KnowledgeBase`.
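Conceptually, `load_config()` merges user-supplied settings over built-in defaults. A simplified stand-in (NanoSage's real loader parses the YAML file, e.g. with PyYAML; the default values below are illustrative, not the project's actual defaults):

```python
DEFAULTS = {
    "results_base_dir": "results",
    "max_query_length": 200,   # illustrative default values
    "web_search_limit": 5,
    "min_relevance": 0.5,
}

def load_config(user_settings: dict) -> dict:
    # Start from defaults, then let user-provided keys override them.
    config = dict(DEFAULTS)
    config.update(user_settings)
    return config
```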
3. Recursive Web Search & TOC Tracking
- Subquery Generation:
  - The enhanced query is split into subqueries with `split_query()`.
- Monte Carlo Subquery Sampling (optional):
  - The system can use a Monte Carlo approach to intelligently sample the most relevant subqueries, balancing exploration depth with computational efficiency.
  - Each subquery is scored for relevance against the main query using embedding similarity.
  - Only the most promising subqueries are selected for further exploration.
- Relevance Filtering:
  - For each subquery, compare embeddings with the main query (via `late_interaction_score()`).
  - If the score is below `min_relevance`, skip the branch to avoid rabbit holes.
- TOCNode Creation:
  - Each subquery becomes a `TOCNode` storing its text, summary, relevance, etc.
- Web Data:
  - If relevant, `search_and_download()` fetches results via the Tavily API, `parse_any_to_text()` parses and embeds them, and snippets are summarized with `summarize_text()`.
  - If `current_depth < max_depth`, new sub-subqueries can optionally be expanded (chain-of-thought on the current subquery).
- Hierarchy:
  - All subqueries & expansions form a tree of TOC nodes for the final report.
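The filter-then-sample steps above can be sketched as follows, using cosine similarity as the relevance score and weighted sampling without replacement. Function names and the weighting scheme are illustrative, not NanoSage's exact implementation:

```python
import math
import random

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def sample_subqueries(main_emb, subqueries, embeddings,
                      k=2, min_relevance=0.5, seed=42):
    # Score each subquery against the main query, drop those below the
    # relevance cutoff, then sample k branches weighted by score, so
    # higher-relevance branches are explored more often.
    scored = [(q, cosine(main_emb, e)) for q, e in zip(subqueries, embeddings)]
    scored = [(q, s) for q, s in scored if s >= min_relevance]
    if not scored:
        return []
    rng = random.Random(seed)
    pool = list(scored)
    picked = []
    for _ in range(min(k, len(pool))):
        # Weighted draw without replacement.
        total = sum(w for _, w in pool)
        r = rng.uniform(0, total)
        acc = 0.0
        for i, (q, w) in enumerate(pool):
            acc += w
            if acc >= r:
                picked.append(q)
                pool.pop(i)
                break
    return picked
```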
4. Local Retrieval & Summaries
- Local documents + downloaded web entries are appended into the `KnowledgeBase`.
- `KnowledgeBase.search(...)` returns the top-K relevant docs.
- Summaries:
  - Web results & local retrieval are summarized with `summarize_text()`.
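A toy version of the top-K retrieval step, ranking documents by embedding similarity to the query (the actual `KnowledgeBase` is more involved; this is only a sketch of the idea):

```python
import math

def top_k_docs(query_emb, docs, k=3):
    # docs: list of (text, embedding) pairs. Rank by cosine similarity
    # to the query embedding and return the k best texts.
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    ranked = sorted(docs, key=lambda d: cosine(query_emb, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```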
5. Final RAG Prompt & Report Generation
- `_build_final_answer(...)`:
  - Constructs a large prompt including:
    - The user query,
    - The Table of Contents (with node summaries),
    - Summaries of web & local results,
    - Reference URLs.
  - Asks for a "multi-section advanced markdown report."
- `rag_final_answer(...)`:
  - Calls `call_gemma()` (or another LLM) to produce the final text.
- `aggregate_results(...)`:
  - Saves the final report under `results/<query_id>/`.
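The prompt assembly in `_build_final_answer(...)` can be pictured roughly like this (a hypothetical sketch; section labels and parameter names are illustrative, not the project's actual template):

```python
def build_final_prompt(query, toc_lines, web_summaries, local_summaries, references):
    # Stitch the pieces into one large RAG prompt: query, TOC with
    # per-node summaries, source summaries, and reference URLs.
    parts = [
        f"User query: {query}",
        "Table of Contents:",
        *[f"  - {line}" for line in toc_lines],
        "Web summaries:",
        *web_summaries,
        "Local summaries:",
        *local_summaries,
        "References:",
        *references,
        "Write a multi-section advanced markdown report.",
    ]
    return "\n".join(parts)
```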
