NanoSage is a Deep Research assistant that runs on your laptop using tiny models - all open source!
It offers a structured breakdown of a multi-source, relevance-driven, recursive search pipeline. It walks through how the system refines a user query, builds a knowledge base from local and web data, and dynamically explores subqueries, tracking progress through a Table of Contents (TOC).
With Monte Carlo-based exploration, the system balances depth vs. breadth, ranking each branch's relevance to ensure precision and avoid unrelated tangents. The result? A detailed, well-organized report generated using retrieval-augmented generation (RAG), integrating the most valuable insights.
I wanted to experiment with new research methods. When we research a topic, we tend to explore new ideas somewhat randomly as we search, and NanoSage does exactly that: it explores, records its journey (where each relevant step is a node), and then sums it all up in a neat report, whose table of contents mirrors its search graph. 🧠
You can find an example report in the following link:
example report output for query: "Create a structure bouldering gym workout to push my climbing from v4 to v6"
- Ensure Python 3.8+ is installed.
- Install required packages:
  ```bash
  pip install -r requirements.txt
  ```
- (Optional) For GPU acceleration, install PyTorch with CUDA:
  ```bash
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  ```
  (Replace `cu118` with your CUDA version.)
- Make sure pyOpenSSL and cryptography are up to date:
  ```bash
  pip install --upgrade pyOpenSSL cryptography
  ```
- Install Ollama:
  ```bash
  curl -fsSL https://ollama.com/install.sh | sh
  pip install --upgrade ollama
  ```
  (Windows users: see ollama.com for the installer.)
- Pull Gemma 2B (for RAG-based summaries):
  ```bash
  ollama pull gemma2:2b
  ```
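To verify the install end-to-end, a quick sanity check from Python (this snippet is illustrative, not part of NanoSage; the response shape may vary slightly across `ollama` package versions):

```python
# Sanity check: confirm the Ollama daemon and the pulled Gemma 2B model respond.
import ollama

reply = ollama.generate(model="gemma2:2b", prompt="Reply with one short sentence.")
print(reply["response"])
```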
A sample command to run your search session:
```bash
python main.py --query "Create a structure bouldering gym workout to push my climbing from v4 to v6" \
    --web_search \
    --max_depth 2 \
    --device cpu \
    --top_k 10 \
    --retrieval_model colpali
```
Parameters:
- `--query`: Main search query (natural language).
- `--web_search`: Enables web-based retrieval.
- `--max_depth`: Recursion depth for subqueries (here, 2 levels).
- `--device cpu`: Uses the CPU (swap with `cuda` for GPU).
- `--retrieval_model colpali`: Uses ColPali for retrieval (try `all-minilm` for a lighter model).
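For orientation, a minimal `argparse` sketch of how these flags could be wired up in `main.py` (the flag names match the CLI above; the defaults and help strings are assumptions, not NanoSage's actual code):

```python
# Hypothetical CLI wiring for the flags documented above (defaults are guesses).
import argparse

parser = argparse.ArgumentParser(description="NanoSage-style recursive search session")
parser.add_argument("--query", required=True, help="Main search query (natural language)")
parser.add_argument("--web_search", action="store_true", help="Enable web-based retrieval")
parser.add_argument("--max_depth", type=int, default=1, help="Recursion depth for subqueries")
parser.add_argument("--device", default="cpu", choices=["cpu", "cuda"])
parser.add_argument("--top_k", type=int, default=10, help="Top-K documents to retrieve")
parser.add_argument("--retrieval_model", default="colpali", help="e.g. colpali or all-minilm")
parser.add_argument("--corpus_dir", default=None, help="Optional local document folder")
parser.add_argument("--personality", default=None, help="Report tone, e.g. 'scientific'")
parser.add_argument("--rag_model", default="gemma", help="LLM used for summaries/report")
args = parser.parse_args()
```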
A detailed Markdown report will appear in `results/<query_id>/`.
Example:
```
results/
└── 389380e2/
    ├── Quantum_computing_in_healthcare_output.md
    ├── web_Quantum_computing/
    ├── web_results/
    └── local_results/
```
Open the `*_output.md` file (e.g., `Quantum_computing_in_healthcare_output.md`) in a Markdown viewer (VSCode, Obsidian, etc.).
If you have local PDFs, text files, or images:
```bash
python main.py --query "AI in finance" \
    --corpus_dir "my_local_data/" \
    --top_k 5 \
    --device cpu
```
Now the system searches both local docs and web data (if `--web_search` is enabled).
```bash
python main.py --query "Climate change impact on economy" \
    --rag_model gemma \
    --personality "scientific"
```
This uses Gemma 2B to generate LLM-based summaries and the final report.
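Under the hood, each summarization call presumably boils down to a prompt sent to Gemma via Ollama. A hedged sketch of what such a helper might look like (`summarize_text()` exists in NanoSage, but this body is an illustration, not the project's actual implementation):

```python
# Illustrative stand-in for a summarize_text()-style helper
# (not NanoSage's actual implementation).
from typing import Optional

import ollama

def summarize_text(text: str, personality: Optional[str] = None,
                   model: str = "gemma2:2b") -> str:
    """Ask the local Gemma model for a concise summary, optionally in a given tone."""
    tone = f" Write in a {personality} tone." if personality else ""
    prompt = f"Summarize the following content in a few sentences.{tone}\n\n{text}"
    return ollama.generate(model=model, prompt=prompt)["response"]
```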
- Missing dependencies? Rerun `pip install -r requirements.txt`.
- Ollama not found? Ensure it's installed (`ollama list` should show `gemma2:2b`).
- Memory issues? Use `--device cpu`.
- Too many subqueries? Lower `--max_depth` to 1.
- Try different retrieval models (`--retrieval_model all-minilm`).
- Tweak recursion (`--max_depth`).
- Tune `config.yaml` for web search limits, `min_relevance`, or Monte Carlo search.
- User Query: e.g. `"Quantum computing in healthcare"`.
- CLI Flags (in `main.py`): `--corpus_dir`, `--device`, `--retrieval_model`, `--top_k`, `--web_search`, `--personality`, `--rag_model`, `--max_depth`.
- YAML Config (e.g. `config.yaml`): `results_base_dir`, `max_query_length`, `web_search_limit`, `min_relevance`, etc.
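For reference, a sample `config.yaml` using those keys might look like this (the keys come from this README; the values are illustrative assumptions):

```yaml
# Illustrative config.yaml - keys are documented above, values are example guesses.
results_base_dir: "results"
max_query_length: 200
web_search_limit: 5      # max pages fetched per subquery
min_relevance: 0.5       # cutoff for subquery branching
```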
- Configuration:
  - `load_config(config_path)` to read YAML settings.
  - `min_relevance`: cutoff for subquery branching.
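A minimal sketch of what `load_config` plausibly does, assuming PyYAML (NanoSage's actual implementation may differ):

```python
# Plausible load_config() using PyYAML (a sketch, not NanoSage's exact code).
import yaml

def load_config(config_path: str) -> dict:
    """Read YAML settings such as min_relevance and web_search_limit."""
    with open(config_path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f) or {}

config = load_config("config.yaml")
min_relevance = config.get("min_relevance", 0.5)  # cutoff for subquery branching
```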
- Session Initialization:
  - `SearchSession.__init__()` sets:
    - A unique `query_id` & `base_result_dir`.
    - An enhanced query via `chain_of_thought_query_enhancement()`.
    - The retrieval model, loaded with `load_retrieval_model()`.
    - A query embedding for relevance checks (`embed_text()`).
    - Local files (if any), loaded & added to the `KnowledgeBase`.
- Subquery Generation:
  - The enhanced query is split with `split_query()`.
- Relevance Filtering:
  - For each subquery, compare embeddings with the main query (via `late_interaction_score()`).
  - If `< min_relevance`, skip to avoid rabbit holes.
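To make the filtering concrete, here is a hedged sketch of a late-interaction (MaxSim-style) score over token embeddings, in the spirit of `late_interaction_score()` (NanoSage's actual scoring may differ):

```python
# MaxSim-style late-interaction score (a sketch of the general technique,
# not necessarily NanoSage's exact late_interaction_score()).
import numpy as np

def late_interaction_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """query_emb: (Q, D) token embeddings; doc_emb: (T, D) token embeddings."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sim = q @ d.T                          # (Q, T) cosine similarities
    return float(sim.max(axis=1).mean())   # best doc match per query token, averaged

# Subqueries scoring below min_relevance are pruned:
# if late_interaction_score(main_q_emb, sub_q_emb) < min_relevance: skip
```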
- TOCNode Creation:
  - Each subquery → `TOCNode`, storing the text, summary, relevance, etc.
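A plausible shape for such a node (illustrative; any field beyond the text, summary, and relevance mentioned above is an assumption):

```python
# Hypothetical TOCNode layout (field names beyond text/summary/relevance are guesses).
from dataclasses import dataclass, field
from typing import List

@dataclass
class TOCNode:
    query_text: str                 # the subquery this node explored
    summary: str = ""               # LLM summary of what was found
    relevance_score: float = 0.0    # similarity to the main query
    depth: int = 0                  # recursion level
    children: List["TOCNode"] = field(default_factory=list)  # expanded sub-subqueries
```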
- Web Data:
  - If relevant: `download_webpages_ddg()` to fetch results, then `parse_html_to_text()` and embed them.
  - Summarize snippets (`summarize_text()`).
  - If `current_depth < max_depth`, optionally expand new sub-subqueries (chain-of-thought on the current subquery).
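As a rough illustration of the fetch-and-parse step, one way to implement DuckDuckGo retrieval plus HTML-to-text conversion (assumes the `duckduckgo_search`, `requests`, and `beautifulsoup4` packages; NanoSage's own `download_webpages_ddg()`/`parse_html_to_text()` may work differently):

```python
# Sketch of DDG fetching + HTML parsing (illustrative, not NanoSage's real helpers).
from typing import List

import requests
from bs4 import BeautifulSoup
from duckduckgo_search import DDGS

def download_webpages_ddg(query: str, limit: int = 5) -> List[str]:
    """Return raw HTML for the top DuckDuckGo hits."""
    pages = []
    with DDGS() as ddgs:
        for hit in ddgs.text(query, max_results=limit):
            try:
                pages.append(requests.get(hit["href"], timeout=10).text)
            except requests.RequestException:
                continue  # skip unreachable pages
    return pages

def parse_html_to_text(html: str) -> str:
    """Strip scripts/styles and return readable text."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)
```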
- Hierarchy:
  - All subqueries & expansions form a tree of TOC nodes for the final report.
- Local Documents + Downloaded Web Entries → appended into the `KnowledgeBase`.
- `KnowledgeBase.search(...)` retrieves the top-K relevant docs.
- Summaries:
  - Summarize web results & local retrieval with `summarize_text()`.
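A minimal embedding-store sketch of the add/search pattern described above (the `KnowledgeBase` class exists in NanoSage; this body is an assumption):

```python
# Minimal embedding-based KnowledgeBase sketch (illustrative; NanoSage's real
# class may store metadata, chunk documents, batch embeddings, etc.).
import numpy as np
from typing import Callable, List, Tuple

class KnowledgeBase:
    def __init__(self, embed: Callable[[str], np.ndarray]):
        self.embed = embed
        self.docs: List[str] = []
        self.vecs: List[np.ndarray] = []

    def add_documents(self, docs: List[str]) -> None:
        for doc in docs:
            self.docs.append(doc)
            self.vecs.append(self.embed(doc))

    def search(self, query: str, top_k: int = 10) -> List[Tuple[str, float]]:
        q = self.embed(query)
        scores = [float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
                  for v in self.vecs]
        ranked = sorted(zip(self.docs, scores), key=lambda p: p[1], reverse=True)
        return ranked[:top_k]
```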
- `_build_final_answer(...)`:
  - Constructs a large prompt including:
    - The user query,
    - Table of Contents (with node summaries),
    - Summaries of web & local results,
    - Reference URLs.
  - Asks for a "multi-section advanced markdown report."
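The prompt assembly could be as simple as string composition; a hedged sketch (the structure is inferred from the bullets above, not copied from NanoSage):

```python
# Illustrative prompt assembly in the spirit of _build_final_answer()
# (argument names and wording are assumptions).
def build_final_answer_prompt(user_query, toc_lines, summaries, reference_urls):
    toc = "\n".join(f"- {line}" for line in toc_lines)
    refs = "\n".join(reference_urls)
    return (
        f"User query: {user_query}\n\n"
        f"Table of Contents (explored subqueries with summaries):\n{toc}\n\n"
        f"Summaries of web and local results:\n{summaries}\n\n"
        f"References:\n{refs}\n\n"
        "Write a multi-section advanced markdown report that integrates the "
        "most relevant findings, with headings matching the Table of Contents."
    )
```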
- `rag_final_answer(...)`:
  - Calls `call_gemma()` (or another LLM) to produce the final text.
- `aggregate_results(...)`:
  - Saves the final answer plus search data into a `.md` file in `results/<query_id>/`.
- Subqueries with `relevance_score < min_relevance` are skipped.
- Depth-limited recursion keeps the number of expansions from blowing up.
- Monte Carlo expansions (optional) can sample random subqueries to avoid missing unexpected gems (a sketch follows after this list).
- Markdown report summarizing relevant subqueries, local docs, and a final advanced RAG-based discussion.
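As a rough idea of that optional Monte Carlo step, relevance-weighted sampling over candidate subqueries might look like this (the weighting scheme and the probability floor are assumptions, not NanoSage's exact strategy):

```python
# Relevance-weighted random sampling of candidate subqueries (illustrative;
# NanoSage's Monte Carlo exploration may weight or sample differently).
import random
from typing import List, Tuple

def monte_carlo_pick(candidates: List[Tuple[str, float]], k: int = 2) -> List[str]:
    """candidates: (subquery, relevance_score) pairs. Sample k branches,
    favoring relevant ones while leaving room for surprises."""
    queries = [q for q, _ in candidates]
    weights = [max(score, 0.05) for _, score in candidates]  # floor keeps long shots alive
    # Note: random.choices samples with replacement; fine for a toy sketch.
    return random.choices(queries, weights=weights, k=min(k, len(queries)))
```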
```
User Query
    │
    ▼
main.py:
 ├── load_config(config.yaml)
 └── Create SearchSession(...)
       │
       ├── chain_of_thought_query_enhancement()
       ├── load_retrieval_model()
       ├── embed_text() for reference
       ├── load_corpus_from_dir() → KnowledgeBase.add_documents()
       └── run_session():
            ├── perform_recursive_web_searches():
            │     └── For each subquery:
            │          ├── Compute relevance_score
            │          ├── if relevance_score < min_relevance: skip
            │          └── else:
            │               ├── download_webpages_ddg()
            │               ├── parse_html_to_text(), embed
            │               ├── summarize_text() → store in TOCNode
            │               └── if depth < max_depth:
            │                    └── recursively expand
            ├── Aggregates web corpus, builds TOC
            │
            ├── KnowledgeBase.search(enhanced_query, top_k)
            ├── Summarize results
            ├── _build_final_answer() → prompt
            ├── rag_final_answer() → call_gemma()
            └── aggregate_results() → saves Markdown
```
If you found NanoSage useful for your research or project, or it saved you a minute of googling, please consider citing it:
BibTeX Citation:
```bibtex
@misc{NanoSage,
  author       = {Foad Abo Dahood},
  title        = {NanoSage: A Recursive, Relevance-Driven Search and RAG Pipeline},
  year         = {2025},
  howpublished = {\url{https://github.com/masterFoad/NanoSage}},
  note         = {Accessed: \today}
}
```
APA Citation:
Abo Dahood, F. (2025). NanoSage: A Recursive, Relevance-Driven Search and RAG Pipeline. Retrieved from https://github.com/masterFoad/NanoSage