A Deep Research assistant that runs on your laptop, using tiny models. All open source!
It offers a structured breakdown of a multi-source, relevance-driven, recursive search pipeline. It walks through how the system refines a user query, builds a knowledge base from local and web data, and dynamically explores subqueries—tracking progress through a Table of Contents (TOC).
With Monte Carlo-based exploration, the system balances depth vs. breadth, ranking each branch’s relevance to ensure precision and avoid unrelated tangents. The result? A detailed, well-organized report generated using retrieval-augmented generation (RAG), integrating the most valuable insights.
I wanted to experiment with new research methods. When we research a topic, we naturally wander and explore new ideas as we search, and NanoSage does exactly that: it explores and records its journey, where each (relevant) step is a node, then sums it all up for you in a neat report, where the table of contents is its search graph. 🧙
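To make that concrete, here is a minimal, illustrative sketch of the core loop (not the actual NanoSage code; `score_fn` and `expand_fn` are hypothetical stand-ins for the embedding scorer and the subquery generator):

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class TOCNode:
    query: str
    relevance: float
    children: List["TOCNode"] = field(default_factory=list)

def explore(query: str, root_query: str,
            score_fn: Callable[[str, str], float],
            expand_fn: Callable[[str], List[str]],
            depth: int = 0, max_depth: int = 2,
            min_relevance: float = 0.5) -> TOCNode:
    """Recursively explore subqueries, keeping only branches relevant to the root."""
    node = TOCNode(query, score_fn(query, root_query))
    if depth >= max_depth or node.relevance < min_relevance:
        return node  # stop: deep enough, or drifting off-topic
    for sub in expand_fn(query):  # e.g. chain-of-thought subquery generation
        child = explore(sub, root_query, score_fn, expand_fn,
                        depth + 1, max_depth, min_relevance)
        if child.relevance >= min_relevance:
            node.children.append(child)  # kept nodes become the Table of Contents
    return node
```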
You can find an example report at the following link:
example report output for query: "Create a structure bouldering gym workout to push my climbing from v4 to v6"
- Ensure Python 3.8+ is installed.
- Install required packages:
  ```bash
  pip install -r requirements.txt
  ```
- (Optional) For GPU acceleration, install PyTorch with CUDA:
  ```bash
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  ```
  (Replace `cu118` with your CUDA version.)
- Make sure pyOpenSSL and cryptography are up to date:
  ```bash
  pip install --upgrade pyOpenSSL cryptography
  ```
- Install Ollama:
  ```bash
  curl -fsSL https://ollama.com/install.sh | sh
  pip install --upgrade ollama
  ```
  (Windows users: see ollama.com for an installer.)
- Pull Gemma 2 2B (used for RAG-based summaries):
  ```bash
  ollama pull gemma2:2b
  ```
A sample command to run your search session:

```bash
python main.py --query "Create a structure bouldering gym workout to push my climbing from v4 to v6" \
    --web_search \
    --max_depth 2 \
    --device cpu \
    --top_k 10 \
    --retrieval_model colpali
```
Parameters:

- `--query`: Main search query (natural language).
- `--web_search`: Enables web-based retrieval.
- `--max_depth`: Recursion depth for subqueries (here, 2 levels).
- `--device cpu`: Uses CPU (swap with `cuda` for GPU).
- `--top_k`: Number of top-ranked documents to retrieve.
- `--retrieval_model colpali`: Uses ColPali for retrieval (try `all-minilm` for a lighter model).
A detailed Markdown report will appear in `results/<query_id>/`.
Example:

```
results/
└── 389380e2/
    ├── Quantum_computing_in_healthcare_output.md
    ├── web_Quantum_computing/
    ├── web_results/
    └── local_results/
```
Open the `*_output.md` file (e.g., `Quantum_computing_in_healthcare_output.md`) in a Markdown viewer (VSCode, Obsidian, etc.).
If you have local PDFs, text files, or images:
python main.py --query "AI in finance" \
--corpus_dir "my_local_data/" \
--top_k 5 \
--device cpu
Now the system searches both local docs and web data (if `--web_search` is enabled).
python main.py --query "Climate change impact on economy" \
--rag_model gemma \
--personality "scientific"
This uses Gemma 2B to generate LLM-based summaries and the final report.
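Under the hood, generation goes through Ollama. Here is a minimal sketch of such a call, assuming the `ollama` Python client installed earlier (the helper name and system prompt are illustrative, not NanoSage's exact code):

```python
import ollama  # the Python client installed via `pip install --upgrade ollama`

def call_gemma(prompt: str, personality: str = "scientific") -> str:
    """Send a prompt to the locally served Gemma 2B model and return its reply."""
    response = ollama.chat(
        model="gemma2:2b",
        messages=[
            {"role": "system", "content": f"You are a {personality} research assistant."},
            {"role": "user", "content": prompt},
        ],
    )
    return response["message"]["content"]
```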
- Missing dependencies? Rerun:
  ```bash
  pip install -r requirements.txt
  ```
- Ollama not found? Ensure it's installed (`ollama list` should show `gemma2:2b`).
- Memory issues? Use `--device cpu`.
- Too many subqueries? Lower `--max_depth` to 1.
- Try different retrieval models (`--retrieval_model all-minilm`).
- Tweak recursion depth (`--max_depth`).
- Tune `config.yaml` for web search limits, `min_relevance`, or Monte Carlo search.
- User Query: e.g. `"Quantum computing in healthcare"`.
- CLI Flags (in `main.py`): `--corpus_dir`, `--device`, `--retrieval_model`, `--top_k`, `--web_search`, `--personality`, `--rag_model`, `--max_depth`.
- YAML Config (e.g. `config.yaml`): `results_base_dir`, `max_query_length`, `web_search_limit`, `min_relevance`, etc.
- Configuration:
  - `load_config(config_path)` reads the YAML settings (see the sketch below).
  - `min_relevance`: cutoff for subquery branching.
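A minimal sketch of what that configuration step amounts to, assuming standard PyYAML (the 0.5 default is illustrative, not the shipped value):

```python
import yaml  # PyYAML

def load_config(config_path: str) -> dict:
    """Read YAML settings such as results_base_dir, web_search_limit, min_relevance."""
    with open(config_path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f) or {}

config = load_config("config.yaml")
min_relevance = config.get("min_relevance", 0.5)  # cutoff for subquery branching
```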
- Session Initialization:
  - `SearchSession.__init__()` sets:
    - A unique `query_id` & `base_result_dir`.
    - An enhanced query via `chain_of_thought_query_enhancement()`.
    - The retrieval model, loaded with `load_retrieval_model()`.
    - A query embedding for relevance checks (`embed_text()`).
    - Local files (if any), loaded & added to the `KnowledgeBase`.
- Subquery Generation:
  - The enhanced query is split into subqueries with `split_query()` (a naive stand-in is sketched below).
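A naive stand-in for `split_query()` (the real splitter may well be LLM-driven; this regex version only illustrates the input/output shape):

```python
import re
from typing import List

def split_query(enhanced_query: str) -> List[str]:
    """Split an enhanced query into candidate subqueries on sentence-ish breaks."""
    parts = re.split(r"[;\n?]+", enhanced_query)
    return [p.strip() for p in parts if p.strip()]
```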
- Relevance Filtering:
  - For each subquery, compare embeddings with the main query via `late_interaction_score()` (see the sketch below).
  - If the score falls below `min_relevance`, skip the branch to avoid rabbit holes.
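For intuition, a ColBERT-style late-interaction score can be sketched as follows; this is a simplified stand-in, not necessarily NanoSage's exact implementation. Each query token embedding is matched to its best document token, and the maxima are averaged:

```python
import numpy as np

def late_interaction_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """MaxSim: per query token, take the best cosine match among doc tokens, then average.
    Inputs are (num_tokens, dim) matrices of token embeddings."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sim = q @ d.T                         # (q_tokens, d_tokens) cosine matrix
    return float(sim.max(axis=1).mean())  # best match per query token, averaged

# Prune branches that score below the configured cutoff:
if late_interaction_score(np.random.rand(4, 128), np.random.rand(9, 128)) < 0.5:
    pass  # skip this subquery
```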
- TOCNode Creation:
  - Each subquery → a `TOCNode` storing the text, summary, relevance, etc.
- Web Data:
  - If relevant: `download_webpages_ddg()` fetches results, then `parse_html_to_text()` cleans them for embedding (see the sketch below).
  - Snippets are summarized with `summarize_text()`.
  - If `current_depth < max_depth`, optionally expand new sub-subqueries (chain-of-thought on the current subquery).
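A sketch of the fetch-and-clean step, assuming the `duckduckgo_search`, `requests`, and `beautifulsoup4` packages (the actual NanoSage helpers may differ):

```python
from typing import List

import requests
from bs4 import BeautifulSoup
from duckduckgo_search import DDGS

def download_webpages_ddg(query: str, limit: int = 5) -> List[str]:
    """Fetch raw HTML for the top DuckDuckGo hits of `query`."""
    with DDGS() as ddgs:
        hits = ddgs.text(query, max_results=limit)
    pages = []
    for hit in hits:
        try:
            pages.append(requests.get(hit["href"], timeout=10).text)
        except requests.RequestException:
            continue  # skip unreachable pages
    return pages

def parse_html_to_text(html: str) -> str:
    """Strip scripts/styles and return readable text for embedding."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)
```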
- Hierarchy:
  - All subqueries & expansions form a tree of TOC nodes for the final report.
- Local documents + downloaded web entries → appended into the `KnowledgeBase`.
- `KnowledgeBase.search(...)` retrieves the top-K relevant docs (a toy sketch follows).
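Conceptually, the knowledge base is an embedding store with cosine-similarity retrieval. A toy sketch (the real class is richer; `embed_fn` is a stand-in for the loaded retrieval model):

```python
import numpy as np

class KnowledgeBase:
    """Toy store: add documents, retrieve the top-k by cosine similarity."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # maps text -> 1-D numpy vector
        self.docs, self.vecs = [], []

    def add_documents(self, docs):
        for doc in docs:
            v = self.embed_fn(doc)
            self.docs.append(doc)
            self.vecs.append(v / np.linalg.norm(v))

    def search(self, query, top_k=5):
        q = self.embed_fn(query)
        q = q / np.linalg.norm(q)
        scores = np.array(self.vecs) @ q  # cosine similarities (unit vectors)
        best = np.argsort(scores)[::-1][:top_k]
        return [(self.docs[i], float(scores[i])) for i in best]
```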
- Summaries:
  - Web results & local retrievals are summarized with `summarize_text()`.
- `_build_final_answer(...)`:
  - Constructs a large prompt (sketched below) including:
    - The user query,
    - The Table of Contents (with node summaries),
    - Summaries of web & local results,
    - Reference URLs.
  - Asks for a “multi-section advanced markdown report.”
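The prompt assembly can be pictured roughly as follows (an illustrative sketch; the exact wording and argument names are assumptions):

```python
def build_final_prompt(user_query, toc_lines, web_summary, local_summary, urls):
    """Assemble the report-writing prompt from the pieces listed above."""
    return (
        f"User query: {user_query}\n\n"
        "Table of Contents:\n" + "\n".join(toc_lines) + "\n\n"
        f"Summarized web results:\n{web_summary}\n\n"
        f"Summarized local results:\n{local_summary}\n\n"
        "References:\n" + "\n".join(urls) + "\n\n"
        "Write a multi-section advanced markdown report using all of the above."
    )
```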
- `rag_final_answer(...)`:
  - Calls `call_gemma()` (or another LLM) to produce the final text.
- `aggregate_results(...)`:
  - Saves the final answer plus search data into a `.md` file in `results/<query_id>/`.
- Subqueries with `relevance_score < min_relevance` are skipped.
- Depth-limited recursion keeps the number of expansions from blowing up.
- Monte Carlo expansions (optional) can sample random subqueries so unexpected gems aren't missed (see the sketch below).
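The Monte Carlo expansion can be pictured as an epsilon-greedy sampler over candidate branches (an illustrative stand-in; the function name and the 0.2 exploration rate are assumptions):

```python
import random

def monte_carlo_expand(subqueries, scores, k=2, epsilon=0.2):
    """Pick k branches to expand: usually the highest-relevance one (exploit),
    occasionally a random one (explore) so unexpected gems aren't missed."""
    pool = sorted(zip(subqueries, scores), key=lambda p: p[1], reverse=True)
    picks = []
    while pool and len(picks) < k:
        choice = random.choice(pool) if random.random() < epsilon else pool[0]
        picks.append(choice[0])
        pool.remove(choice)
    return picks
```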
- A Markdown report summarizing relevant subqueries, local docs, and a final advanced RAG-based discussion.
```
User Query
   │
   ▼
main.py:
 └── load_config(config.yaml)
 └── Create SearchSession(...)
       │
       ├── chain_of_thought_query_enhancement()
       ├── load_retrieval_model()
       ├── embed_text() for reference
       ├── load_corpus_from_dir() → KnowledgeBase.add_documents()
       └── run_session():
             └── perform_recursive_web_searches():
                   ├── For each subquery:
                   │     ├─ Compute relevance_score
                   │     ├─ if relevance_score < min_relevance: skip
                   │     ├─ else:
                   │     │     ├─ download_webpages_ddg()
                   │     │     ├─ parse_html_to_text(), embed
                   │     │     ├─ summarize_text() → store in TOCNode
                   │     │     └─ if depth < max_depth:
                   │     │           └─ recursively expand
                   └── Aggregates web corpus, builds TOC
             │
             ├── KnowledgeBase.search(enhanced_query, top_k)
             ├── Summarize results
             ├── _build_final_answer() → prompt
             ├── rag_final_answer() → call_gemma()
             └── aggregate_results() → saves Markdown
```
If you found NanoSage useful for your research or project, or it saved you even a minute of googling, please consider citing it:
BibTeX Citation:

```bibtex
@misc{NanoSage,
  author       = {Foad Abo Dahood},
  title        = {NanoSage: A Recursive, Relevance-Driven Search and RAG Pipeline},
  year         = {2025},
  howpublished = {\url{https://github.com/masterFoad/NanoSage}},
  note         = {Accessed: \today}
}
```
APA Citation:

Abo Dahood, F. (2025). *NanoSage: A Recursive, Relevance-Driven Search and RAG Pipeline*. Retrieved from https://github.com/masterFoad/NanoSage