Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random sample in local mode #705

Merged
merged 8 commits into from
Aug 8, 2024

Conversation

coszio
Copy link
Contributor

@coszio coszio commented Jul 22, 2024

Implements random sampling in local mode and adds the required conversions

@coszio coszio requested review from agourlay and joein July 22, 2024 21:58
remote_client = init_remote(prefer_grpc=prefer_grpc)
init_client(remote_client, fixture_points)

compare_client_results(local_client, remote_client, searcher.random_query)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are the random implementations between local and server equivalent?
I would expect those to return different values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this test we basically sample all the points available in the collection
In random_query we use limit=100 and we generate 100 points in fixture_points = generate_fixtures(100)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We return all points postprocessed to be sorted by ID. Just to make sure we can return all of them

Comment on lines +1793 to +1799
random_scores = np.random.rand(len(self.ids))
random_order = np.argsort(random_scores)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: np.random.permutation(self.inv_ids)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we need the internal ids to filter against the mask, not the external ones

@joein
Copy link
Member

joein commented Jul 25, 2024

my comments are mostly nit, other than that looks good

@coszio coszio requested a review from joein July 30, 2024 15:22
@joein joein marked this pull request as ready for review August 8, 2024 15:29
@joein joein force-pushed the random-sample-local-mode branch from c49579f to 41c2fd5 Compare August 8, 2024 15:42
@joein joein merged commit 9f31c70 into universal-search-query-group-API Aug 8, 2024
10 checks passed
joein added a commit that referenced this pull request Aug 8, 2024
* pre-implement random sampling

* generate models

* add conversions and tests

* fix mypy lints

* tests: add test for sample random conversion

* use camelcase Sample.Random

* review fixes

* fix mypy

---------

Co-authored-by: George Panchuk <[email protected]>
joein added a commit that referenced this pull request Aug 9, 2024
* universal-search: Query Group API and local mode

* requires different logic for gRPC error assertion

* you come to me at runtime for a compile time issue

* suddenly throwing a different error

* test more group key types

* extend limit of prefetches during group_by

* rescoring is the issue

* one problem at a time please

* code review

* regen clients

* code review

* add lookup_from to query_points_groups

* test with_lookup

* test and fix gRPC

* drop dedicated conversion

* Update qdrant_client/qdrant_client.py

Co-authored-by: Luis Cossío <[email protected]>

* regen async

* Distribution-based score fusion in local mode (#703)

* pre-implement dbsf

* add dbsf congruence tests

* mypy lints

* add conversions

* tests: add test for dbsf conversion

---------

Co-authored-by: George Panchuk <[email protected]>

* Random sample in local mode (#705)

* pre-implement random sampling

* generate models

* add conversions and tests

* fix mypy lints

* tests: add test for sample random conversion

* use camelcase Sample.Random

* review fixes

* fix mypy

---------

Co-authored-by: George Panchuk <[email protected]>

* fix: add type ignore for mypy

* fix: fix type hints for 3.8

* fix: do not run mypy on async client generator in CI, simplify condition

* Grpc comparison in tests (#726)

* add parametrized fixture for using grpc too

* compare grpc and http without running each setup twice

* fix: fix exception types in invalid types test

* fix: remove random seed which led to a erroneous sequence

---------

Co-authored-by: Luis Cossío <[email protected]>

---------

Co-authored-by: Luis Cossío <[email protected]>
Co-authored-by: George Panchuk <[email protected]>
joein added a commit that referenced this pull request Aug 12, 2024
* universal-search: Query Group API and local mode

* requires different logic for gRPC error assertion

* you come to me at runtime for a compile time issue

* suddenly throwing a different error

* test more group key types

* extend limit of prefetches during group_by

* rescoring is the issue

* one problem at a time please

* code review

* regen clients

* code review

* add lookup_from to query_points_groups

* test with_lookup

* test and fix gRPC

* drop dedicated conversion

* Update qdrant_client/qdrant_client.py

Co-authored-by: Luis Cossío <[email protected]>

* regen async

* Distribution-based score fusion in local mode (#703)

* pre-implement dbsf

* add dbsf congruence tests

* mypy lints

* add conversions

* tests: add test for dbsf conversion

---------

Co-authored-by: George Panchuk <[email protected]>

* Random sample in local mode (#705)

* pre-implement random sampling

* generate models

* add conversions and tests

* fix mypy lints

* tests: add test for sample random conversion

* use camelcase Sample.Random

* review fixes

* fix mypy

---------

Co-authored-by: George Panchuk <[email protected]>

* fix: add type ignore for mypy

* fix: fix type hints for 3.8

* fix: do not run mypy on async client generator in CI, simplify condition

* Grpc comparison in tests (#726)

* add parametrized fixture for using grpc too

* compare grpc and http without running each setup twice

* fix: fix exception types in invalid types test

* fix: remove random seed which led to a erroneous sequence

---------

Co-authored-by: Luis Cossío <[email protected]>

---------

Co-authored-by: Luis Cossío <[email protected]>
Co-authored-by: George Panchuk <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants