Add fastembed integration #210

Merged
merged 32 commits into from
Aug 18, 2023

Conversation

NirantK
Contributor

@NirantK NirantK commented Jul 12, 2023

This PR adds two new functions, add and query, and a new return type, QueryResponse. This makes things much easier for people coming from NLP who want to use SoTA embeddings (beating OpenAI) that are also several times faster.

Improvements:

  1. Configurable ONNX runtime: lets users pick GPU, Mac Metal (M1/M2), CPU and other runtimes for creating batch embeddings at insertion time. This is done via ONNX Runtime, and we default to the CPU runtime.
  2. Quantized model using optimum: this makes the model ~2x faster than the PyTorch runtime on CPU.
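
For illustration, the configurable-runtime point can be sketched as picking ONNX Runtime execution providers from a preference list and falling back to CPU. This is a sketch only, not code from this PR: `pick_providers` and the preference order are hypothetical, and mapping "Mac Metal" to the CoreML provider is an assumption. In practice the `available` list would come from `onnxruntime.get_available_providers()`.

```python
from typing import List, Sequence

# Preference order is an assumption for this sketch: GPU, then Apple, then CPU.
PREFERRED: Sequence[str] = (
    "CUDAExecutionProvider",
    "CoreMLExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(
    available: Sequence[str], preferred: Sequence[str] = PREFERRED
) -> List[str]:
    """Return the preferred providers that are available, always including CPU."""
    chosen = [name for name in preferred if name in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")  # CPU stays the default fallback
    return chosen
```

The resulting list could then be passed as the `providers` argument of `onnxruntime.InferenceSession`.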

@NirantK NirantK requested a review from generall July 12, 2023 08:41
@netlify

netlify bot commented Jul 12, 2023

Deploy Preview for poetic-froyo-8baba7 ready!

Name Link
🔨 Latest commit 8835f08
🔍 Latest deploy log https://app.netlify.com/sites/poetic-froyo-8baba7/deploys/64df3112e39d5c0008a60076
😎 Deploy Preview https://deploy-preview-210--poetic-froyo-8baba7.netlify.app/qdrant_client.conversions.conversion

…rom fastembed instead of fastvector

* fix(qdrant_client.py): remove unnecessary print statement
@NirantK NirantK changed the title [DRAFT] Add skeleton for fastvector registration [DRAFT] Add skeleton for fastembed registration Jul 31, 2023
NirantK added 2 commits July 31, 2023 17:29
* feat(README.md): add section for Fast Embeddings + Simpler API
* fix(README.md): fix formatting of code block and update code example for Qdrant Client usage
@NirantK NirantK marked this pull request as ready for review July 31, 2023 12:13
@NirantK NirantK requested a review from generall July 31, 2023 12:14
@NirantK NirantK changed the title [DRAFT] Add skeleton for fastembed registration Add skeleton for fastembed registration Jul 31, 2023
@NirantK NirantK changed the title Add skeleton for fastembed registration Add fastembed integration Jul 31, 2023
@generall
Member

generall commented Aug 1, 2023

Overall it would be nice to have tests for this integration

NirantK added 4 commits August 3, 2023 12:38
* feat(qdrant_client.py): add support for adding and querying documents with fastembed installed
* test(qdrant_client.py): add tests for adding and querying documents with and without fastembed installed
…ntAPIExtensions for upsert_docs and search_docs methods
* feat(qdrant_client.py): add return type hint to QdrantClient.search_docs method
@joein joein added the enhancement New feature or request label Aug 5, 2023
tests/test_qdrant_client.py (outdated, resolved)
… DefaultEmbedding instead of FlagEmbedding

* refactor(qdrant_client.py): refactor code to remove unnecessary loop
* feat(qdrant_client.py): add support for search parameters in search method
* refactor(qdrant_client.py): refactor indexing logic to handle embeddings correctly
…s in collection

* test(test_fast_embed.py): remove unused code
* test(test_fast_embed.py): add TODO comment for future assertions
…dencies

* feat(pyproject.toml): add fastembed dependency to fastembed group
* fix(test_fast_embed.py): add default values for test_no_install parameters
* fix(test_fast_embed.py): skip test if FastEmbed is installed
@joein
Member

joein commented Aug 16, 2023

haven't yet looked deep into the PR, but all the packages inside qdrant_client.http are auto-generated and should not be modified manually

if it is actually required to modify them, https://github.com/qdrant/pydantic_openapi_v3 should be upgraded first

@NirantK
Contributor Author

NirantK commented Aug 17, 2023

@joein for return types which are exclusive to the Python client for now, is OpenAPI where the changes should be made?

Since it is auto-generated from the Rust models directly, I would prefer not to do that. The alternate proposal, 4ff71e3, is to keep this in the Python client itself.

@generall
Member

Hi @NirantK, I made some changes to the PR:

  • Moved the fastembed functions into middleware, so they are easier to navigate
  • Dropped support for Python 3.7, as fastembed requires >3.7. If that is not actually the case, please change it in the fastembed manifest and revert my changes regarding it. We wanted to drop 3.7 sooner or later either way
  • Made the interface a bit more compatible and consistent with the existing functions. I hope it didn't compromise the "pythonistic" approach too much

I am going to suggest some changes in fastembed as well.

Comment on lines 232 to 243
return self.search(
    collection_name=collection_name,
    query_vector=query_vector,
    query_filter=query_filter,
    search_params=search_params,
    limit=limit,
    offset=offset,
    with_payload=with_payload,
    with_vectors=with_vectors,
    score_threshold=score_threshold,
    **kwargs,
)
Contributor Author

A major point of the API improvement is a simpler response type. This is why I created QueryResponse, which simplifies ScoredPoint and discards a lot of information.

Would it be possible to retain that here?

Member

Could you please elaborate on how a plain dict is better than a type?
The previous version converted QueryResponse into a dict before returning it. In fact, QueryResponse was only used internally and gave the user no information about the response structure.

If you propose to replace the return type ScoredPoint -> QueryResponse, that I can agree on.
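
To make the dict-versus-type question concrete, here is a minimal sketch of a typed, simplified response built from a ScoredPoint-like object. Everything in it is an assumption for illustration: `ScoredPointLike` is a stand-in rather than the client's real model, and the `QueryResponse` fields and the `document` payload key are hypothetical, not the ones this PR settled on.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Union


@dataclass
class ScoredPointLike:
    """Stand-in for the client's ScoredPoint; this field set is an assumption."""
    id: Union[int, str]
    score: float
    payload: Dict[str, Any] = field(default_factory=dict)
    version: int = 0


@dataclass
class QueryResponse:
    """Hypothetical simplified response: only what a text-search user needs."""
    id: Union[int, str]
    document: str
    metadata: Dict[str, Any]
    score: float


def to_query_response(
    point: ScoredPointLike, document_key: str = "document"
) -> QueryResponse:
    """Split the payload into the document text and the remaining metadata."""
    payload = dict(point.payload)  # copy so the original point is not mutated
    document = payload.pop(document_key, "")
    return QueryResponse(
        id=point.id, document=document, metadata=payload, score=point.score
    )
```

Unlike a plain dict, the return annotation tells the caller which fields to expect, which is the point raised above.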

with_payload: Union[bool, Sequence[str], models.PayloadSelector] = True,
with_vectors: Union[bool, Sequence[str]] = True,
score_threshold: Optional[float] = None,
embedding_model: str = DEFAULT_EMBEDDING_MODEL,
Contributor Author

The query endpoint should NOT accept embedding_model!

This can lead to mistakes, e.g. folks using different models for add and query. It means we have to persist the state across multiple client sessions. One way to do so would be to store it as part of the payload itself, perhaps?
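
One way to picture the concern: record the model when documents are added and look it up at query time, instead of accepting it as a query argument. The sketch below is hypothetical throughout. The in-memory dict stands in for state that would really be persisted on the server (for example in a payload), and none of these names come from the PR.

```python
from typing import Dict, Optional


class ModelMismatchError(ValueError):
    """Raised when a collection was filled with a different embedding model."""


# Stand-in for persisted state; a real client would read and write this
# through the Qdrant API so it survives across client sessions.
_collection_models: Dict[str, str] = {}


def remember_model(collection_name: str, embedding_model: str) -> None:
    """Record the model used by add(); reject a second, different model."""
    stored = _collection_models.setdefault(collection_name, embedding_model)
    if stored != embedding_model:
        raise ModelMismatchError(
            f"{collection_name!r} already uses {stored!r}, not {embedding_model!r}"
        )


def model_for_query(collection_name: str) -> Optional[str]:
    """query() looks the model up rather than taking it as a parameter."""
    return _collection_models.get(collection_name)
```

With this shape, a mismatch fails loudly at insertion time rather than silently returning poor search results.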

Comment on lines 177 to 179
offset: int = 0,
with_payload: Union[bool, Sequence[str], models.PayloadSelector] = True,
with_vectors: Union[bool, Sequence[str]] = True,
Contributor Author

Do these added params make the function more consistent with the client at the cost of simplicity?

Ideally we'd want only collection_name, limit and query_text!

Anyone who cares about these params enough to read the docs and understand them should use the APIs our client already has!

Member

We can hide those under **kwargs
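
A minimal sketch of that suggestion, assuming a small public signature that forwards everything else untouched. Here `search` and `embed` are toy stand-ins so the example runs on its own; they are not the client's real methods.

```python
from typing import Any, Dict, List


def search(
    collection_name: str,
    query_vector: List[float],
    limit: int = 10,
    offset: int = 0,
    with_payload: bool = True,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Stand-in for the full-featured search; it just echoes its arguments."""
    return {
        "collection_name": collection_name,
        "query_vector": query_vector,
        "limit": limit,
        "offset": offset,
        "with_payload": with_payload,
        **kwargs,
    }


def embed(text: str) -> List[float]:
    """Toy embedding (text length only) so the sketch runs without a model."""
    return [float(len(text))]


def query(
    collection_name: str, query_text: str, limit: int = 10, **kwargs: Any
) -> Dict[str, Any]:
    """Simple signature; advanced options pass through **kwargs unchanged."""
    return search(
        collection_name=collection_name,
        query_vector=embed(query_text),
        limit=limit,
        **kwargs,
    )
```

The trade-off is that options hidden in `**kwargs` no longer show up in the signature or in editor autocompletion, so they need to be documented separately.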

Comment on lines 245 to 258
def query_batch(
    self,
    collection_name: str,
    query_texts: List[str],
    query_filter: Optional[models.Filter] = None,
    search_params: Optional[models.SearchParams] = None,
    limit: int = 10,
    offset: int = 0,
    with_payload: Union[bool, Sequence[str], models.PayloadSelector] = True,
    with_vectors: Union[bool, Sequence[str]] = True,
    score_threshold: Optional[float] = None,
    embedding_model: str = DEFAULT_EMBEDDING_MODEL,
    **kwargs: Any,
) -> List[List[types.ScoredPoint]]:
Contributor Author

The most notable difference is that this accepts a List[str] as queries, which is not worth learning for most MLEs!

Can the query endpoint call this internally to support that behaviour instead? That would remove the need for engineers to learn it, while still keeping it as a separate function.
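
The proposal could look roughly like the sketch below: `query` accepts either one string or a list and delegates to the batch function. This is an illustration under assumptions. `query_batch` here is a toy stand-in that returns fake hit labels, and the signatures are simplified from the ones under review.

```python
from typing import List, Union


def query_batch(
    collection_name: str, query_texts: List[str], limit: int = 10
) -> List[List[str]]:
    """Stand-in batch search: returns fake hit labels, one list per query."""
    return [
        [f"{collection_name}:{text}:{rank}" for rank in range(limit)]
        for text in query_texts
    ]


def query(
    collection_name: str, query_text: Union[str, List[str]], limit: int = 10
) -> Union[List[str], List[List[str]]]:
    """Accept one string or a list of strings and delegate to query_batch."""
    if isinstance(query_text, str):
        # Single query: wrap it, run the batch path, unwrap the one result.
        return query_batch(collection_name, [query_text], limit=limit)[0]
    return query_batch(collection_name, list(query_text), limit=limit)
```

One cost of this design is that the return shape depends on the input type, which makes the function harder to annotate precisely.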

Contributor Author

@NirantK NirantK left a comment

Looks a lot cleaner to me! Good to go from my end after the CI clears

qdrant_client/qdrant_fastembed.py (resolved)
@NirantK
Contributor Author

NirantK commented Aug 18, 2023

@generall merge when you're satisfied?

@generall generall merged commit 536b0ee into master Aug 18, 2023
@NirantK NirantK deleted the fastvector branch August 24, 2023 18:16