A Deep Research assistant that runs on your laptop, using tiny models. All open source!
It offers a structured breakdown of a multi-source, relevance-driven, recursive search pipeline. It walks through how the system refines a user query, builds a knowledge base from local and web data, and dynamically explores subqueries—tracking progress through a Table of Contents (TOC).
With Monte Carlo-based exploration, the system balances depth vs. breadth, ranking each branch’s relevance to ensure precision and avoid unrelated tangents. The result? A detailed, well-organized report generated using retrieval-augmented generation (RAG), integrating the most valuable insights.
I wanted to experiment with new research methods. When we research a topic, we naturally wander and explore new ideas as we search, and NanoSage does exactly that: it explores and records its journey, where each (relevant) step is a node, then sums it all up for you in a neat report, where the table of contents is its search graph. 🧙
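To make that concrete, here is a minimal, illustrative sketch of the core loop (not the actual NanoSage code; `score_fn` and `expand_fn` are hypothetical stand-ins for the embedding scorer and the subquery generator):

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class TOCNode:
    query: str
    relevance: float
    children: List["TOCNode"] = field(default_factory=list)

def explore(query: str, root_query: str,
            score_fn: Callable[[str, str], float],
            expand_fn: Callable[[str], List[str]],
            depth: int = 0, max_depth: int = 2,
            min_relevance: float = 0.5) -> TOCNode:
    """Recursively explore subqueries, keeping only branches relevant to the root."""
    node = TOCNode(query, score_fn(query, root_query))
    if depth >= max_depth or node.relevance < min_relevance:
        return node  # stop: deep enough, or drifting off-topic
    for sub in expand_fn(query):  # e.g. chain-of-thought subquery generation
        child = explore(sub, root_query, score_fn, expand_fn,
                        depth + 1, max_depth, min_relevance)
        if child.relevance >= min_relevance:
            node.children.append(child)  # kept nodes become the Table of Contents
    return node
```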
You can find an example report at the following link:
example report output for query: "Create a structure bouldering gym workout to push my climbing from v4 to v6"
- Ensure Python 3.8+ is installed.
- Install required packages:
  ```bash
  pip install -r requirements.txt
  ```
- (Optional) For GPU acceleration, install PyTorch with CUDA:
  ```bash
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  ```
  (Replace `cu118` with your CUDA version.)
- Make sure pyOpenSSL and cryptography are up to date:
  ```bash
  pip install --upgrade pyOpenSSL cryptography
  ```
- Install Ollama:
  ```bash
  curl -fsSL https://ollama.com/install.sh | sh
  pip install --upgrade ollama
  ```
  (Windows users: see ollama.com for an installer.)
- Pull Gemma 2 2B (used for RAG-based summaries):
  ```bash
  ollama pull gemma2:2b
  ```
A sample command to run your search session:

```bash
python main.py --query "Create a structure bouldering gym workout to push my climbing from v4 to v6" \
    --web_search \
    --max_depth 2 \
    --device cpu \
    --top_k 10 \
    --retrieval_model colpali
```
Parameters:

- `--query`: Main search query (natural language).
- `--web_search`: Enables web-based retrieval.
- `--max_depth`: Recursion depth for subqueries (here, 2 levels).
- `--device cpu`: Uses CPU (swap with `cuda` for GPU).
- `--top_k`: Number of top-ranked documents to retrieve.
- `--retrieval_model colpali`: Uses ColPali for retrieval (try `all-minilm` for a lighter model).
A detailed Markdown report will appear in `results/<query_id>/`.
Example:

```
results/
└── 389380e2/
    ├── Quantum_computing_in_healthcare_output.md
    ├── web_Quantum_computing/
    ├── web_results/
    └── local_results/
```
Open the `*_output.md` file (e.g., `Quantum_computing_in_healthcare_output.md`) in a Markdown viewer (VSCode, Obsidian, etc.).
If you have local PDFs, text files, or images:
python main.py --query "AI in finance" \
--corpus_dir "my_local_data/" \
--top_k 5 \
--device cpu
Now the system searches both local docs and web data (if `--web_search` is enabled).
python main.py --query "Climate change impact on economy" \
--rag_model gemma \
--personality "scientific"
This uses Gemma 2B to generate LLM-based summaries and the final report.
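Under the hood, generation goes through Ollama. Here is a minimal sketch of such a call, assuming the `ollama` Python client installed earlier (the helper name and system prompt are illustrative, not NanoSage's exact code):

```python
import ollama  # the Python client installed via `pip install --upgrade ollama`

def call_gemma(prompt: str, personality: str = "scientific") -> str:
    """Send a prompt to the locally served Gemma 2B model and return its reply."""
    response = ollama.chat(
        model="gemma2:2b",
        messages=[
            {"role": "system", "content": f"You are a {personality} research assistant."},
            {"role": "user", "content": prompt},
        ],
    )
    return response["message"]["content"]
```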
- Missing dependencies? Rerun:
  ```bash
  pip install -r requirements.txt
  ```
- Ollama not found? Ensure it's installed (`ollama list` should show `gemma2:2b`).
- Memory issues? Use `--device cpu`.
- Too many subqueries? Lower `--max_depth` to 1.
- Try different retrieval models (`--retrieval_model all-minilm`).
- Tweak recursion depth (`--max_depth`).
- Tune `config.yaml` for web search limits, `min_relevance`, or Monte Carlo search.
- User Query: e.g. `"Quantum computing in healthcare"`.
- CLI Flags (in `main.py`): `--corpus_dir`, `--device`, `--retrieval_model`, `--top_k`, `--web_search`, `--personality`, `--rag_model`, `--max_depth`.
- YAML Config (e.g. `config.yaml`): `results_base_dir`, `max_query_length`, `web_search_limit`, `min_relevance`, etc.
- Configuration:
  - `load_config(config_path)` reads the YAML settings (see the sketch below).
  - `min_relevance`: cutoff for subquery branching.
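A minimal sketch of what that configuration step amounts to, assuming standard PyYAML (the 0.5 default is illustrative, not the shipped value):

```python
import yaml  # PyYAML

def load_config(config_path: str) -> dict:
    """Read YAML settings such as results_base_dir, web_search_limit, min_relevance."""
    with open(config_path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f) or {}

config = load_config("config.yaml")
min_relevance = config.get("min_relevance", 0.5)  # cutoff for subquery branching
```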
- Session Initialization:
  - `SearchSession.__init__()` sets:
    - A unique `query_id` & `base_result_dir`.
    - An enhanced query via `chain_of_thought_query_enhancement()`.
    - The retrieval model, loaded with `load_retrieval_model()`.
    - A query embedding for relevance checks (`embed_text()`).
    - Local files (if any), loaded & added to the `KnowledgeBase`.
- Subquery Generation:
  - The enhanced query is split into subqueries with `split_query()` (a naive stand-in is sketched below).
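A naive stand-in for `split_query()` (the real splitter may well be LLM-driven; this regex version only illustrates the input/output shape):

```python
import re
from typing import List

def split_query(enhanced_query: str) -> List[str]:
    """Split an enhanced query into candidate subqueries on sentence-ish breaks."""
    parts = re.split(r"[;\n?]+", enhanced_query)
    return [p.strip() for p in parts if p.strip()]
```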
- Relevance Filtering:
  - For each subquery, compare embeddings with the main query via `late_interaction_score()` (see the sketch below).
  - If the score falls below `min_relevance`, skip the branch to avoid rabbit holes.
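For intuition, a ColBERT-style late-interaction score can be sketched as follows; this is a simplified stand-in, not necessarily NanoSage's exact implementation. Each query token embedding is matched to its best document token, and the maxima are averaged:

```python
import numpy as np

def late_interaction_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """MaxSim: per query token, take the best cosine match among doc tokens, then average.
    Inputs are (num_tokens, dim) matrices of token embeddings."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sim = q @ d.T                         # (q_tokens, d_tokens) cosine matrix
    return float(sim.max(axis=1).mean())  # best match per query token, averaged

# Prune branches that score below the configured cutoff:
if late_interaction_score(np.random.rand(4, 128), np.random.rand(9, 128)) < 0.5:
    pass  # skip this subquery
```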
- TOCNode Creation:
  - Each subquery → a `TOCNode` storing the text, summary, relevance, etc.
- Web Data:
  - If relevant: `download_webpages_ddg()` fetches results, then `parse_html_to_text()` cleans them for embedding (see the sketch below).
  - Snippets are summarized with `summarize_text()`.
  - If `current_depth < max_depth`, optionally expand new sub-subqueries (chain-of-thought on the current subquery).
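A sketch of the fetch-and-clean step, assuming the `duckduckgo_search`, `requests`, and `beautifulsoup4` packages (the actual NanoSage helpers may differ):

```python
from typing import List

import requests
from bs4 import BeautifulSoup
from duckduckgo_search import DDGS

def download_webpages_ddg(query: str, limit: int = 5) -> List[str]:
    """Fetch raw HTML for the top DuckDuckGo hits of `query`."""
    with DDGS() as ddgs:
        hits = ddgs.text(query, max_results=limit)
    pages = []
    for hit in hits:
        try:
            pages.append(requests.get(hit["href"], timeout=10).text)
        except requests.RequestException:
            continue  # skip unreachable pages
    return pages

def parse_html_to_text(html: str) -> str:
    """Strip scripts/styles and return readable text for embedding."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)
```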
- Hierarchy:
  - All subqueries & expansions form a tree of TOC nodes for the final report.
- Local documents + downloaded web entries → appended into the `KnowledgeBase`.
- `KnowledgeBase.search(...)` retrieves the top-K relevant docs (a toy sketch follows).
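Conceptually, the knowledge base is an embedding store with cosine-similarity retrieval. A toy sketch (the real class is richer; `embed_fn` is a stand-in for the loaded retrieval model):

```python
import numpy as np

class KnowledgeBase:
    """Toy store: add documents, retrieve the top-k by cosine similarity."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # maps text -> 1-D numpy vector
        self.docs, self.vecs = [], []

    def add_documents(self, docs):
        for doc in docs:
            v = self.embed_fn(doc)
            self.docs.append(doc)
            self.vecs.append(v / np.linalg.norm(v))

    def search(self, query, top_k=5):
        q = self.embed_fn(query)
        q = q / np.linalg.norm(q)
        scores = np.array(self.vecs) @ q  # cosine similarities (unit vectors)
        best = np.argsort(scores)[::-1][:top_k]
        return [(self.docs[i], float(scores[i])) for i in best]
```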
- Summaries:
  - Web results & local retrievals are summarized with `summarize_text()`.
- `_build_final_answer(...)`:
  - Constructs a large prompt (sketched below) including:
    - The user query,
    - The Table of Contents (with node summaries),
    - Summaries of web & local results,
    - Reference URLs.
  - Asks for a “multi-section advanced markdown report.”
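The prompt assembly can be pictured roughly as follows (an illustrative sketch; the exact wording and argument names are assumptions):

```python
def build_final_prompt(user_query, toc_lines, web_summary, local_summary, urls):
    """Assemble the report-writing prompt from the pieces listed above."""
    return (
        f"User query: {user_query}\n\n"
        "Table of Contents:\n" + "\n".join(toc_lines) + "\n\n"
        f"Summarized web results:\n{web_summary}\n\n"
        f"Summarized local results:\n{local_summary}\n\n"
        "References:\n" + "\n".join(urls) + "\n\n"
        "Write a multi-section advanced markdown report using all of the above."
    )
```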
- `rag_final_answer(...)`:
  - Calls `call_gemma()` (or another LLM) to produce the final text.
- `aggregate_results(...)`:
  - Saves the final answer plus search data into a `.md` file in `results/<query_id>/`.
- Subqueries with `relevance_score < min_relevance` are skipped.
- Depth-limited recursion keeps the number of expansions from blowing up.
- Monte Carlo expansions (optional) can sample random subqueries so unexpected gems aren't missed (see the sketch below).
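The Monte Carlo expansion can be pictured as an epsilon-greedy sampler over candidate branches (an illustrative stand-in; the function name and the 0.2 exploration rate are assumptions):

```python
import random

def monte_carlo_expand(subqueries, scores, k=2, epsilon=0.2):
    """Pick k branches to expand: usually the highest-relevance one (exploit),
    occasionally a random one (explore) so unexpected gems aren't missed."""
    pool = sorted(zip(subqueries, scores), key=lambda p: p[1], reverse=True)
    picks = []
    while pool and len(picks) < k:
        choice = random.choice(pool) if random.random() < epsilon else pool[0]
        picks.append(choice[0])
        pool.remove(choice)
    return picks
```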
- A Markdown report summarizing relevant subqueries, local docs, and a final advanced RAG-based discussion.
```
User Query
   │
   ▼
main.py:
 └── load_config(config.yaml)
 └── Create SearchSession(...)
       │
       ├── chain_of_thought_query_enhancement()
       ├── load_retrieval_model()
       ├── embed_text() for reference
       ├── load_corpus_from_dir() → KnowledgeBase.add_documents()
       └── run_session():
             └── perform_recursive_web_searches():
                   ├── For each subquery:
                   │     ├─ Compute relevance_score
                   │     ├─ if relevance_score < min_relevance: skip
                   │     ├─ else:
                   │     │     ├─ download_webpages_ddg()
                   │     │     ├─ parse_html_to_text(), embed
                   │     │     ├─ summarize_text() → store in TOCNode
                   │     │     └─ if depth < max_depth:
                   │     │           └─ recursively expand
                   └── Aggregates web corpus, builds TOC
             │
             ├── KnowledgeBase.search(enhanced_query, top_k)
             ├── Summarize results
             ├── _build_final_answer() → prompt
             ├── rag_final_answer() → call_gemma()
             └── aggregate_results() → saves Markdown
```
If you found NanoSage useful for your research or project, or it saved you even a minute of googling, please consider citing it:
BibTeX Citation:

```bibtex
@misc{NanoSage,
  author       = {Foad Abo Dahood},
  title        = {NanoSage: A Recursive, Relevance-Driven Search and RAG Pipeline},
  year         = {2025},
  howpublished = {\url{https://github.com/masterFoad/NanoSage}},
  note         = {Accessed: \today}
}
```
APA Citation:

Abo Dahood, F. (2025). *NanoSage: A Recursive, Relevance-Driven Search and RAG Pipeline*. Retrieved from https://github.com/masterFoad/NanoSage