[Misc] MongoDB Query Engine build on top of VectorDBQueryEngine #1092

sitloboi2012 · 2025-02-22T15:10:23Z

Why are these changes needed?

Follow-up to the discussion with @AgentGenie and @Eric-Shang in #983 about using the VectorDBQueryEngine protocol.

Related issue number

Closes #983, #1004, #688, #941. #950 remains open.

Checks

I've included any doc changes needed for https://docs.ag2.ai/. See https://docs.ag2.ai/docs/contributor-guide/documentation to build and test documentation locally.
I've added tests (if relevant) corresponding to the changes introduced in this PR.
I've made sure all auto checks have passed.

AgentGenie · 2025-02-23T06:44:16Z

autogen/agentchat/contrib/rag/mongodb.py

+            logger.error("Failed to initialize the database: %s", e)
+            return False
+
+    def add_records(  # type: ignore[no-untyped-def, override]


Do not override this method. You can either put chunk_size and chunk_overlap in __init__ or pass them through args/kwargs.

Also, why doesn't the init_db method take these two parameters?

Oh yeah, that's right. I think I over-engineered this part. Let me fix it; I found an easier way to do this.

That said, I think the current approach gives us more flexibility to adapt to new documents and transformations to get the best result, so we could discuss it further. Using the method in the init_db func would be the go-to solution, but it lacks customizability, while the previous approach (SimpleDirectoryLoader + SentenceSplitter + input to the Indexer) would leave us more room to customize.

I also think we don't need to override the add_records method. The current add_docs loads documents in much the same way as init_db, except that it also allows a single URL or path, right? I feel it's not necessary.
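A minimal sketch of the suggestion in this thread, assuming a simplified stand-in for the protocol: the chunking options are set once in the constructor, so init_db and add_records can share one loading path and keep the same signature. The names ChunkingEngine and split_text, the default values, and the character-based splitting are hypothetical and are not taken from the PR. The sketch reads local files only.

```python
from pathlib import Path
from typing import Any, List, Optional, Sequence, Union


def split_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i : i + chunk_size] for i in range(0, last_start, step)]


class ChunkingEngine:
    """Hypothetical engine: chunking is configured in __init__, not per call."""

    def __init__(self, chunk_size: int = 512, chunk_overlap: int = 64) -> None:
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.chunks: List[str] = []

    def init_db(
        self,
        new_doc_dir: Optional[Union[str, Path]] = None,
        new_doc_paths_or_urls: Optional[Sequence[Union[str, Path]]] = None,
        *args: Any,
        **kwargs: Any,
    ) -> bool:
        # Start from an empty store, then reuse the same loading path as add_records.
        self.chunks = []
        self.add_records(new_doc_dir, new_doc_paths_or_urls, *args, **kwargs)
        return True

    def add_records(
        self,
        new_doc_dir: Optional[Union[str, Path]] = None,
        new_doc_paths_or_urls: Optional[Sequence[Union[str, Path]]] = None,
        *args: Any,
        **kwargs: Any,
    ) -> None:
        paths: List[Path] = []
        if new_doc_dir is not None:
            paths.extend(sorted(p for p in Path(new_doc_dir).iterdir() if p.is_file()))
        paths.extend(Path(p) for p in (new_doc_paths_or_urls or []))
        for path in paths:
            text = path.read_text(encoding="utf-8")
            self.chunks.extend(split_text(text, self.chunk_size, self.chunk_overlap))
```

With this shape, a caller who wants different chunking constructs a new engine instead of passing extra arguments to add_records.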

AgentGenie · 2025-02-24T06:52:16Z

For unit tests, most tests in the project use pytest. Please use pytest in the future.
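For illustration, pytest collects plain functions named test_* and relies on bare assert statements, so no TestCase class is required. The sketch below assumes a hypothetical InMemoryEngine stand-in; it is not the engine from this PR and does not connect to MongoDB.

```python
from typing import List


class InMemoryEngine:
    """Hypothetical stand-in for a query engine; keeps records in a list."""

    def __init__(self) -> None:
        self.records: List[str] = []

    def add_records(self, records: List[str]) -> None:
        self.records.extend(records)

    def query(self, question: str) -> List[str]:
        # Case-insensitive substring match, only to give the tests something to check.
        return [r for r in self.records if question.lower() in r.lower()]


def test_add_records_stores_documents() -> None:
    engine = InMemoryEngine()
    engine.add_records(["MongoDB Atlas", "Vector search"])
    assert engine.records == ["MongoDB Atlas", "Vector search"]


def test_query_matches_case_insensitively() -> None:
    engine = InMemoryEngine()
    engine.add_records(["MongoDB Atlas", "Vector search"])
    assert engine.query("mongodb") == ["MongoDB Atlas"]
```

Saved in a file whose name starts with test_, these functions would be picked up by running pytest in that directory.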

codecov · 2025-02-24T15:53:19Z

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ The number of uploaded reports differs between BASE (1792140) and HEAD (9c661ff).

HEAD has 1286 fewer uploads than BASE.

| Flag | BASE (1792140) | HEAD (9c661ff) |
|---|---|---|
| 3.10 | 99 | 0 |
| ubuntu-latest | 145 | 1 |
| falkordb | 2 | 0 |
| 3.11 | 66 | 1 |
| 3.13 | 88 | 0 |
| lmm | 4 | 0 |
| teachable | 4 | 0 |
| core-llm | 4 | 0 |
| 3.9 | 80 | 0 |
| gemini | 15 | 0 |
| anthropic | 16 | 0 |
| core-without-llm | 14 | 1 |
| autobuild | 1 | 0 |
| macos-latest | 108 | 0 |
| integration | 11 | 0 |
| browser-use | 6 | 0 |
| 3.12 | 39 | 0 |
| neo4j | 2 | 0 |
| windows-latest | 119 | 0 |
| crawl4ai | 13 | 0 |
| deepseek | 1 | 0 |
| interop | 13 | 0 |
| agent-eval | 1 | 0 |
| retrievechat | 15 | 0 |
| retrievechat-qdrant | 14 | 0 |
| gpt-assistant-agent | 3 | 0 |
| openai | 1 | 0 |
| cerebras | 14 | 0 |
| optional-deps | 141 | 0 |
| retrievechat-couchbase | 3 | 0 |
| retrievechat-mongodb | 10 | 0 |
| retrievechat-pgvector | 10 | 0 |
| commsagent-discord | 9 | 0 |
| llama-index-agent | 3 | 0 |
| commsagent-slack | 9 | 0 |
| commsagent-telegram | 9 | 0 |
| long-context | 3 | 0 |
| jupyter-executor | 9 | 0 |
| websurfer | 15 | 0 |
| cohere | 15 | 0 |
| graph-rag-falkor-db | 6 | 0 |
| twilio | 9 | 0 |
| groq | 14 | 0 |
| mistral | 14 | 0 |
| ollama | 14 | 0 |
| together | 14 | 0 |
| bedrock | 14 | 0 |
| swarm | 14 | 0 |
| reasoning | 14 | 0 |
| docs | 6 | 0 |
| interop-langchain | 9 | 0 |
| interop-crewai | 9 | 0 |
| interop-pydantic-ai | 9 | 0 |
| websockets | 9 | 0 |

see 62 files with indirect coverage changes

AgentGenie · 2025-02-24T17:50:39Z

autogen/agentchat/contrib/rag/mongodb_query_engine.py

+        except Exception as e:
+            logger.error("Error inserting documents into the index: %s", e)
+
+    def query(self, question: str, llm: Union[str, "BaseLanguageModel"], *args: Any, **kwargs: Any) -> Any:  # type: ignore[no-any-unimported, type-arg]


The llm argument should be in the constructor.
Also, use llama_index.core.llms.llm.LLM instead. We want to reduce dependencies, not add them.

Hi @AgentGenie, I removed BaseLanguageModel and replaced it with LLM, imported via: from llama_index.llms.langchain.base import LLM
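A minimal sketch of the constructor-injection pattern requested in this thread. To stay self-contained it does not import llama_index: CompletionModel is a hypothetical structural type standing in for the LLM class the reviewer names, and EchoModel is a toy model used only for demonstration. The complete method name is likewise an assumption of this sketch.

```python
from typing import Protocol


class CompletionModel(Protocol):
    """Hypothetical stand-in for an LLM type; anything with complete() fits."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Toy model that echoes the prompt back, for demonstration only."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class QueryEngineSketch:
    """The model is supplied once at construction; query() takes only the question."""

    def __init__(self, llm: CompletionModel) -> None:
        self.llm = llm

    def query(self, question: str) -> str:
        return self.llm.complete(question)


engine = QueryEngineSketch(llm=EchoModel())
answer = engine.query("What is stored in the index?")
```

Keeping llm out of query() lets the method match a protocol that accepts only a question, and callers configure the model in one place.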

AgentGenie

I tested the notebook and it is not working.
Please:

- Add instructions to the notebook on how to set up MongoDB for testing.
- Make sure people can follow the instructions and run through the notebook.
- List extra dependencies with versions. They should also be included in pyproject.toml under rag.

AgentGenie · 2025-02-25T19:33:16Z

autogen/agentchat/contrib/rag/mongodb_query_engine.py

+        vector_db (MongoDBAtlasVectorDB): The MongoDB vector database instance.
+        vector_search_engine (MongoDBAtlasVectorSearch): The vector search engine.
+        storage_context (StorageContext): The storage context for the vector store.
+        indexer (Optional[VectorStoreIndex]): The index built from the documents.


Eric-Shang · 2025-02-26T02:49:07Z

autogen/agentchat/contrib/rag/mongodb_query_engine.py

+        self,
+        connection_string: str = "",
+        database_name: str = "vector_db",
+        embedding_function: Optional[Callable[..., Any]] = None,


It'd be better to let users know what the default embedding function is.
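One way to act on this comment is to name the fallback in the docstring and resolve it in the constructor. The sketch below is an assumption-laden illustration: default_embedding is a toy placeholder invented here, not the default used by the PR or by any library, and EngineSketch is a hypothetical class that only mirrors the parameter names shown in the diff.

```python
from typing import Any, Callable, List, Optional, Sequence


def default_embedding(texts: Sequence[str]) -> List[List[float]]:
    """Toy placeholder embedding: [length, checksum mod 97] for each text."""
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


class EngineSketch:
    def __init__(
        self,
        connection_string: str = "",
        database_name: str = "vector_db",
        embedding_function: Optional[Callable[..., Any]] = None,
    ) -> None:
        """Create the engine.

        Args:
            connection_string: Connection string for the database.
            database_name: Name of the database. Defaults to "vector_db".
            embedding_function: Callable mapping a sequence of strings to a list
                of vectors. Defaults to default_embedding (a toy placeholder in
                this sketch) when None is passed.
        """
        self.connection_string = connection_string
        self.database_name = database_name
        # Resolve the documented default in one place so users can inspect it.
        self.embedding_function = embedding_function or default_embedding
```

Stating the default in the Args section means users see it in help() output and generated API docs without reading the constructor body.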

Eric-Shang · 2025-02-27T02:39:30Z

autogen/agentchat/contrib/rag/mongodb_query_engine.py

+    def add_records(
+        self,
+        new_doc_dir: Optional[Union[str, Path]] = None,
+        new_doc_paths_or_urls: Optional[Union[List[Union[str, Path]], Union[str, Path]]] = None,


Please follow the protocol and remove the single string/path option.
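A small sketch of what a list-only parameter could look like, assuming the protocol expects a sequence of paths or URLs. The helper name collect_sources is hypothetical. Because a str is itself a sequence of characters, the sketch rejects a bare string or Path explicitly instead of silently iterating over it.

```python
from pathlib import Path
from typing import List, Optional, Sequence, Union


def collect_sources(
    new_doc_paths_or_urls: Optional[Sequence[Union[str, Path]]] = None,
) -> List[str]:
    """Normalize a sequence of paths or URLs to a list of strings."""
    if isinstance(new_doc_paths_or_urls, (str, Path)):
        # A bare string would otherwise be iterated character by character.
        raise TypeError("pass a list such as [path], not a single str or Path")
    return [str(source) for source in (new_doc_paths_or_urls or [])]
```

A caller with one document wraps it in a list, for example collect_sources(["docs/readme.md"]), which keeps the signature identical across engines.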

sitloboi2012 had a problem deploying to openai1 February 22, 2025 15:10 — with GitHub Actions Error

sitloboi2012 requested a review from AgentGenie February 22, 2025 15:10

sitloboi2012 requested a review from Eric-Shang February 22, 2025 15:10

sitloboi2012 marked this pull request as draft February 22, 2025 15:10

AgentGenie reviewed Feb 23, 2025

AgentGenie reviewed Feb 23, 2025

AgentGenie reviewed Feb 23, 2025

sitloboi2012 requested a deployment to openai1 February 23, 2025 17:22 — with GitHub Actions Waiting

Eric-Shang requested changes Feb 23, 2025

AgentGenie reviewed Feb 24, 2025

sitloboi2012 force-pushed the feat/mongodb-query-engine branch from 16339d0 to 084f25f Compare February 24, 2025 14:21

sitloboi2012 requested review from Eric-Shang and AgentGenie February 24, 2025 15:45

AgentGenie reviewed Feb 24, 2025

AgentGenie requested changes Feb 24, 2025

sitloboi2012 added 11 commits February 25, 2025 20:38

initial setup for mongodb query engine and notebook usage

64ae9d3

Signed-off-by: sitloboi2012 <[email protected]>

update mongodb query engine class again

e31beba

Signed-off-by: sitloboi2012 <[email protected]>

update mongodb query engine to use docling

8aa5408

Signed-off-by: sitloboi2012 <[email protected]>

update and finalize the mongodb query engine with documentation

2ec2e1f

Signed-off-by: sitloboi2012 <[email protected]>

refactor the add_records again to simplify the solution, update based…

72c6889

… on comment Signed-off-by: sitloboi2012 <[email protected]>

add on test case for mongodb query engine WIP

2a7bd7e

Signed-off-by: sitloboi2012 <[email protected]>

add on test case for mongodb query engine WIP

d3eb183

Signed-off-by: sitloboi2012 <[email protected]>

remove the LLM def in query function

0d931e2

Signed-off-by: sitloboi2012 <[email protected]>

update test case for mongodb query engine

b0c466d

Signed-off-by: sitloboi2012 <[email protected]>

update llm into __init__, update pyproject.toml to include llama-inde…

a6efd9e

…x-llms-langchain, update notebook for instruction usage Signed-off-by: sitloboi2012 <[email protected]>

sitloboi2012 force-pushed the feat/mongodb-query-engine branch from 9c661ff to a6efd9e Compare February 25, 2025 15:13

replace the as_chat_engine to as_query_engine in MongoDBQueryEngine, …

a0d754c

…update notebook Signed-off-by: sitloboi2012 <[email protected]>

sitloboi2012 requested a review from AgentGenie February 25, 2025 15:40

AgentGenie reviewed Feb 25, 2025

Eric-Shang requested changes Feb 26, 2025

update mongodb_query_engine

37996c5

Eric-Shang requested changes Feb 27, 2025
