[Misc] MongoDB Query Engine build on top of VectorDBQueryEngine #1092
base: main
logger.error("Failed to initialize the database: %s", e)
return False

def add_records(  # type: ignore[no-untyped-def, override]
Do not override this method. You can either put chunk_size and chunk_overlap in init or pass them through args/kwargs.
Also, why does the init_db method not take these two parameters?
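A minimal sketch of what this suggestion could look like, assuming a hypothetical `SketchQueryEngine` class and a naive character-window splitter (`split_text`) standing in for a real sentence splitter. Neither name comes from the PR, and the default values are placeholders. The point is only that the chunking defaults live in the constructor while per-call overrides travel through `**kwargs`, so the `add_records` signature does not need to change.

```python
from pathlib import Path
from typing import Any, Dict, List, Optional, Sequence, Tuple, Union


def split_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Naive character-window splitter (stand-in for a real sentence splitter)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


class SketchQueryEngine:
    """Hypothetical engine for illustration; not the class from this PR."""

    def __init__(self, chunk_size: int = 1024, chunk_overlap: int = 128) -> None:
        # Chunking defaults are set once here, so every method can share them.
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.chunks: List[str] = []

    def _chunking(self, kwargs: Dict[str, Any]) -> Tuple[int, int]:
        # Optional per-call overrides arrive through **kwargs.
        return (
            kwargs.get("chunk_size", self.chunk_size),
            kwargs.get("chunk_overlap", self.chunk_overlap),
        )

    def add_records(
        self,
        new_doc_dir: Optional[Union[Path, str]] = None,
        new_doc_paths_or_urls: Optional[Sequence[Union[Path, str]]] = None,
        *args: Any,
        **kwargs: Any,
    ) -> bool:
        chunk_size, chunk_overlap = self._chunking(kwargs)
        paths = [Path(p) for p in (new_doc_paths_or_urls or [])]
        if new_doc_dir is not None:
            paths.extend(p for p in sorted(Path(new_doc_dir).iterdir()) if p.is_file())
        for path in paths:  # this sketch handles local text files only, not URLs
            text = path.read_text(encoding="utf-8")
            self.chunks.extend(split_text(text, chunk_size, chunk_overlap))
        return True
```

With this shape, `SketchQueryEngine(chunk_size=512)` changes the default for every call, while `engine.add_records(new_doc_dir="docs", chunk_size=256)` overrides it once.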
Oh yeah, that's right. I think I over-engineered this part. Let me fix it again; I found an easier way to do this.
But I think the current approach does give us more flexibility in adapting to new documents and transformations to get the best result, so I think we could discuss this more. Using the method in the init_db func would be the go-to solution but lacks customizability, while the previous approach (using SimpleDirectoryLoader + SentenceSplitter + input to Indexer) would give us more room to customize.
I also think we don't need to override the add_records method. The current add_docs is not much different from init_db in how it loads documents, except that it allows a single URL or path, right? I feel it's not necessary.
For unit tests, I think most of the tests in the project use pytest. Please use pytest in the future.
16339d0 to 084f25f
Codecov Report: All modified and coverable lines are covered by tests ✅
except Exception as e:
logger.error("Error inserting documents into the index: %s", e)

def query(self, question: str, llm: Union[str, "BaseLanguageModel"], *args: Any, **kwargs: Any) -> Any:  # type: ignore[no-any-unimported, type-arg]
llm should be in the constructor.
Also, use llama_index.core.llms.llm.LLM instead. We want to reduce dependencies, not add them.
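As a rough illustration of moving llm into the constructor, here is a dependency-free sketch. `CompletionModel` is a hypothetical, simplified stand-in for the LLM type (its `complete` returns a plain string here, which is an assumption made so the example runs without installing anything), and `EchoModel` and `SketchEngine` are illustrative names that do not come from the PR.

```python
from typing import Any, Protocol


class CompletionModel(Protocol):
    """Hypothetical, simplified stand-in for an LLM type."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Toy model used only so the sketch can run without a real LLM."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class SketchEngine:
    """Hypothetical engine for illustration; not the class from this PR."""

    def __init__(self, llm: CompletionModel) -> None:
        # The model is injected once, at construction time.
        self.llm = llm

    def query(self, question: str, *args: Any, **kwargs: Any) -> Any:
        # query() no longer takes an llm argument; it reuses the injected one.
        return self.llm.complete(question)
```

Callers then configure the model once, for example `SketchEngine(llm=EchoModel()).query("What is in the docs?")`, and every later query reuses it.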
Hi @AgentGenie, I removed BaseLanguageModel and replaced it with LLM, imported with: from llama_index.llms.langchain.base import LLM
I tested the notebook, and it is not working.
Please
- add instructions to the notebook on how to set up MongoDB for testing.
- make sure people can follow the instructions and run through the notebook.
- list the extra dependencies with versions. They should also be included in pyproject.toml under rag
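One way a notebook could report the dependency versions it was run with is a small cell based on the standard library's `importlib.metadata`. This is only a sketch: the helper name `installed_versions` is hypothetical, and the package names in the usage note are assumptions rather than the PR's confirmed dependency list.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in the current environment.
            versions[name] = None
    return versions
```

For example, `installed_versions(["pymongo", "llama-index"])` returns a dictionary that can be printed at the top of the notebook, so a missing package shows up as `None` before any cell fails.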
Signed-off-by: sitloboi2012 <[email protected]>
… on comment Signed-off-by: sitloboi2012 <[email protected]>
…nnect_db to align with reviewer's comment and expectation, update the jupyter notebook again to align with the new mongodb_query_engine code, add on llm arg into the query function Signed-off-by: sitloboi2012 <[email protected]>
…x-llms-langchain, update notebook for instruction usage Signed-off-by: sitloboi2012 <[email protected]>
9c661ff to a6efd9e
…update notebook Signed-off-by: sitloboi2012 <[email protected]>
vector_db (MongoDBAtlasVectorDB): The MongoDB vector database instance.
vector_search_engine (MongoDBAtlasVectorSearch): The vector search engine.
storage_context (StorageContext): The storage context for the vector store.
indexer (Optional[VectorStoreIndex]): The index built from the documents.
index
self,
connection_string: str = "",
database_name: str = "vector_db",
embedding_function: Optional[Callable[..., Any]] = None,
It'd be better to let users know what the default embedding function is.
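A small sketch of how the default could be made visible to users, assuming a hypothetical `toy_embedding` function as the fallback. The real default embedding function is not named in this conversation, so both the function and the `SketchVectorStore` class are placeholders. The idea is to name the fallback in the docstring and resolve `None` to it in one obvious place.

```python
from typing import Any, Callable, List, Optional


def toy_embedding(texts: List[str]) -> List[List[float]]:
    """Hypothetical default: a one-dimensional 'embedding' equal to the text length."""
    return [[float(len(text))] for text in texts]


class SketchVectorStore:
    """Hypothetical class for illustration; not the class from this PR."""

    def __init__(self, embedding_function: Optional[Callable[..., Any]] = None) -> None:
        """
        Args:
            embedding_function: Callable that maps a list of texts to a list of
                vectors. Defaults to ``toy_embedding`` when None.
        """
        # Resolve the default here so the docstring and the behavior stay in sync.
        self.embedding_function = embedding_function or toy_embedding
```

Stating the fallback by name in the docstring means users can tell what they get from `SketchVectorStore()` without reading the implementation.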
def add_records(
    self,
    new_doc_dir: Optional[Union[str, Path]] = None,
    new_doc_paths_or_urls: Optional[Union[List[Union[str, Path]], Union[str, Path]]] = None,
Please follow the protocol and remove the single string/path option.
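A hedged sketch of what accepting only a sequence might look like. `normalize_sources` is a hypothetical helper, not part of the PR or the protocol. Since `str` is itself a sequence in Python, a type hint alone does not stop a caller from passing a bare string, so this sketch rejects single values explicitly.

```python
from pathlib import Path
from typing import List, Optional, Sequence, Union


def normalize_sources(
    new_doc_paths_or_urls: Optional[Sequence[Union[str, Path]]] = None,
) -> List[str]:
    """Return the sources as a list of strings; reject a bare str or Path."""
    if new_doc_paths_or_urls is None:
        return []
    if isinstance(new_doc_paths_or_urls, (str, Path)):
        # A lone string would otherwise be iterated character by character.
        raise TypeError("pass a list of paths or URLs, e.g. [source], not a single value")
    return [str(item) for item in new_doc_paths_or_urls]
```

Callers with a single document wrap it in a list, for example `normalize_sources(["notes.md"])`, which keeps the parameter type the same for every call.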
Why are these changes needed?
Follow-up discussion with @AgentGenie and @Eric-Shang in #983 related to the usage of VectorDBQueryEngine protocol
Related issue number
Closed #983, #1004, #688, #941; Open #950
Checks