Random sample in local mode #705

coszio · 2024-07-22T21:58:34Z

Implements random sampling in local mode and adds the required conversions

agourlay · 2024-07-23T10:59:20Z

tests/congruence_tests/test_query.py

+    remote_client = init_remote(prefer_grpc=prefer_grpc)
+    init_client(remote_client, fixture_points)
+
+    compare_client_results(local_client, remote_client, searcher.random_query)


Why are the random implementations between local and server equivalent?
I would expect those to return different values.

In this test we sample every point available in the collection: random_query uses limit=100, and fixture_points = generate_fixtures(100) generates exactly 100 points.

The results are post-processed by sorting them by ID, so the test only checks that both clients can return all of the points, not that they return them in the same order.
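A minimal sketch of why this comparison holds, using only the standard library. The helper `random_sample` and the two seeded generators are hypothetical stand-ins for the local and remote clients, not the actual test code: when the limit equals the collection size, two independent shuffles contain the same IDs, so sorting by ID makes them comparable.

```python
import random


def random_sample(ids, limit, rng):
    """Hypothetical stand-in for a random query: up to `limit` distinct ids in random order."""
    return rng.sample(ids, min(limit, len(ids)))


ids = list(range(100))  # mirrors generate_fixtures(100)

# Two independent sources of randomness, standing in for local and remote.
local_ids = random_sample(ids, 100, random.Random(1))
remote_ids = random_sample(ids, 100, random.Random(2))

# The raw orders are independent, but with limit == collection size both
# results contain every id, so they match after sorting by id.
assert sorted(local_ids) == sorted(remote_ids)
```

With a limit smaller than the collection size, the sorted results would generally differ, so this kind of check only works when every point is sampled.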

qdrant_client/proto/points.proto

qdrant_client/conversions/conversion.py

qdrant_client/local/local_collection.py

joein · 2024-07-25T15:42:31Z

qdrant_client/local/local_collection.py

+        random_scores = np.random.rand(len(self.ids))
+        random_order = np.argsort(random_scores)


nit: np.random.permutation(self.inv_ids)

but we need the internal ids to filter against the mask, not the external ones
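The point about internal ids can be sketched as follows. This is an illustrative standalone function, not the code in local_collection.py: `sample_random`, `mask`, and `limit` are assumed names. It permutes the internal indices directly (which has the same effect as argsorting random scores) and then applies the boolean filter mask, which is indexed by internal position.

```python
import numpy as np


def sample_random(mask: np.ndarray, limit: int, rng=None) -> np.ndarray:
    """Return up to `limit` internal indices that pass `mask`, in random order."""
    rng = np.random.default_rng() if rng is None else rng
    # Shuffle internal indices 0..n-1; external ids could not index the mask.
    order = rng.permutation(len(mask))
    # Keep only the shuffled indices whose mask entry is True.
    selected = order[mask[order]]
    return selected[:limit]


mask = np.array([True, False, True, True, False])
picked = sample_random(mask, limit=2, rng=np.random.default_rng(0))
```

The selected internal indices would then be mapped back to external ids when building the response.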

joein · 2024-07-25T15:53:40Z

my comments are mostly nit, other than that looks good

* pre-implement random sampling
* generate models
* add conversions and tests
* fix mypy lints
* tests: add test for sample random conversion
* use camelcase Sample.Random
* review fixes
* fix mypy

---------

Co-authored-by: George Panchuk <[email protected]>

* universal-search: Query Group API and local mode
* requires different logic for gRPC error assertion
* you come to me at runtime for a compile time issue
* suddenly throwing a different error
* test more group key types
* extend limit of prefetches during group_by
* rescoring is the issue
* one problem at a time please
* code review
* regen clients
* code review
* add lookup_from to query_points_groups
* test with_lookup
* test and fix gRPC
* drop dedicated conversion
* Update qdrant_client/qdrant_client.py Co-authored-by: Luis Cossío <[email protected]>
* regen async
* Distribution-based score fusion in local mode (#703)
* pre-implement dbsf
* add dbsf congruence tests
* mypy lints
* add conversions
* tests: add test for dbsf conversion
--------- Co-authored-by: George Panchuk <[email protected]>
* Random sample in local mode (#705)
* pre-implement random sampling
* generate models
* add conversions and tests
* fix mypy lints
* tests: add test for sample random conversion
* use camelcase Sample.Random
* review fixes
* fix mypy
--------- Co-authored-by: George Panchuk <[email protected]>
* fix: add type ignore for mypy
* fix: fix type hints for 3.8
* fix: do not run mypy on async client generator in CI, simplify condition
* Grpc comparison in tests (#726)
* add parametrized fixture for using grpc too
* compare grpc and http without running each setup twice
* fix: fix exception types in invalid types test
* fix: remove random seed which led to a erroneous sequence
--------- Co-authored-by: Luis Cossío <[email protected]>
--------- Co-authored-by: Luis Cossío <[email protected]> Co-authored-by: George Panchuk <[email protected]>

coszio requested review from agourlay and joein July 22, 2024 21:58

agourlay reviewed Jul 23, 2024
