* Add patch to support multi vector in faiss (#1358)
Signed-off-by: Heemin Kim <[email protected]>
* Initialize id_map as null (#1363)
Signed-off-by: Heemin Kim <[email protected]>
* Add support of multi vector in jni (#1364)
Signed-off-by: Heemin Kim <[email protected]>
* Multi vector support for Faiss HNSW (#1371)
Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.
Signed-off-by: Heemin Kim <[email protected]>
* Add data generation script for nested field (#1388)
Signed-off-by: Heemin Kim <[email protected]>
* Add perf test for nested field (#1394)
Signed-off-by: Heemin Kim <[email protected]>
---------
Signed-off-by: Heemin Kim <[email protected]>
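
The following minimal Python sketch shows the end result of the parentId deduplication described above. It assumes Lucene-style block-join ordering, where each parent document is stored immediately after its children, so a child's parent is the first parent doc id at or after it. The names `parent_of` and `top_k_parents` are hypothetical. The actual change applies the parentId filter inside the Faiss HNSW search itself, while this sketch only post-processes a list of candidates that have already been scored.

```python
import bisect
from typing import Dict, List, Sequence, Tuple


def parent_of(child_id: int, parent_ids: Sequence[int]) -> int:
    """Return the parent of child_id: the first parent doc id >= child_id.

    parent_ids must be sorted ascending (block-join ordering assumed).
    """
    i = bisect.bisect_left(parent_ids, child_id)
    if i == len(parent_ids):
        raise ValueError(f"doc {child_id} has no parent after it")
    return parent_ids[i]


def top_k_parents(
    hits: Sequence[Tuple[int, float]], parent_ids: Sequence[int], k: int
) -> List[Tuple[int, float]]:
    """Keep the best-scoring child per parent and return the k best parents.

    hits are (child_doc_id, score) pairs; a higher score means more similar.
    """
    best: Dict[int, float] = {}
    for child_id, score in hits:
        parent = parent_of(child_id, parent_ids)
        if parent not in best or score > best[parent]:
            best[parent] = score
    # Rank parents by their best child's score, highest first.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

If you deduplicate only after the search finishes, as this sketch does, you can end up with fewer than k distinct parents. Applying the filter during the graph search lets the method keep exploring until it has k results.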
* Add parent join support for lucene knn [#1182](https://github.com/opensearch-project/k-NN/pull/1182)
* Add parent join support for faiss hnsw [#1398](https://github.com/opensearch-project/k-NN/pull/1398)
### Enhancements
* Increase Lucene max dimension limit to 16,000 [#1346](https://github.com/opensearch-project/k-NN/pull/1346)
* Tuned default values for ef_search and ef_construction for better indexing and search performance for vector search [#1353](https://github.com/opensearch-project/k-NN/pull/1353)

#### ingest_nested_field

| Parameter Name | Description | Default |
| ----------- | ----------- | ----------- |
| index_name | Name of index to ingest into | No default |
| field_name | Name of field to ingest into | No default |
| dataset_path | Path to data-set | No default |
| attributes_dataset_name | Name of dataset with additional attributes inside the main dataset | No default |
| attribute_spec | Definition of attributes, in the format `[{ name: [name_val], type: [type_val]}]`. Order is important and must match the order of the attributes column in the dataset file. It should contain `{ name: 'parent_id', type: 'int'}`. | No default |

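
This example shows what `attribute_spec` controls during `ingest_nested_field`. The `parent_id` attribute identifies which rows of the vector dataset belong to the same parent document. The helper below is hypothetical: `group_by_parent` and the `nested_field`/`vector_key` names are for illustration only and are not the perf tool's code. It checks the spec, finds the `parent_id` column from the spec order, and builds one document body per parent. Each body stores that parent's child vectors in a nested array.

```python
from typing import Any, Dict, List, Sequence


def group_by_parent(
    vectors: Sequence[Sequence[float]],
    attributes: Sequence[Sequence[Any]],
    attribute_spec: List[Dict[str, str]],
    nested_field: str,
    vector_key: str,
) -> Dict[int, Dict[str, Any]]:
    """Group child vectors into one document body per parent_id.

    Hypothetical helper: each row of `attributes` lists its values in the
    same order as `attribute_spec`, which must declare parent_id as an int.
    """
    names = [attr["name"] for attr in attribute_spec]
    types = {attr["name"]: attr["type"] for attr in attribute_spec}
    if types.get("parent_id") != "int":
        raise ValueError("attribute_spec must contain {name: 'parent_id', type: 'int'}")
    parent_col = names.index("parent_id")

    docs: Dict[int, Dict[str, Any]] = {}
    for vector, row in zip(vectors, attributes):
        parent_id = int(row[parent_col])
        # The first row seen for a parent creates its body; later rows append to it.
        body = docs.setdefault(parent_id, {nested_field: []})
        body[nested_field].append({vector_key: list(vector)})
    return docs
```

The sketch groups by value instead of by consecutive runs, so a parent's rows do not have to be contiguous in the dataset file.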
##### Metrics
| Metric Name | Description | Unit |
| ----------- | ----------- | ----------- |
| took | Total time to ingest the dataset into the index.| ms |
#### query
Runs a set of queries against an index.

#### query_with_filter

Runs a set of queries with filter against an index.

| Metric Name | Description | Unit |
| ----------- | ----------- | ----------- |
| recall@R | ratio of top R results from the ground truth neighbors that are in the K results returned by the plugin | float 0.0-1.0 |
| recall@K | ratio of results returned that were ground truth nearest neighbors | float 0.0-1.0 |

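
The two recall metrics differ in which part of the ground truth they compare against and in their denominator. The sketch below assumes `returned` holds the K ids returned by the plugin in rank order and `ground_truth` holds the exact neighbor ids in rank order. The function names are hypothetical and do not come from the perf tool.

```python
from typing import Sequence


def recall_at_r(returned: Sequence[int], ground_truth: Sequence[int], r: int) -> float:
    """Fraction of the top-r ground-truth neighbors found among the returned K results."""
    top_r = set(ground_truth[:r])
    return len(top_r.intersection(returned)) / r


def recall_at_k(returned: Sequence[int], ground_truth: Sequence[int], k: int) -> float:
    """Fraction of the k returned results that are among the k true nearest neighbors."""
    top_k = set(ground_truth[:k])
    return len(top_k.intersection(returned[:k])) / k
```

When R is smaller than K, recall@R only asks whether the very closest neighbors show up anywhere in the K results. This is why it is usually at least as high as recall@K.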
#### query_nested_field
Runs a set of queries against a nested field in an index.

| Parameter Name | Description | Default |
| ----------- | ----------- | ----------- |
| dataset_format | Format the dataset is in. Currently hdf5 and bigann are supported. The hdf5 file must be organized the same way ann-benchmarks organizes its files. | 'hdf5' |
| dataset_path | Path to dataset | No default |
| neighbors_format | Format the neighbors dataset is in. Currently hdf5 and bigann are supported. The hdf5 file must be organized the same way ann-benchmarks organizes its files. | 'hdf5' |
| neighbors_path | Path to neighbors dataset | No default |
| neighbors_dataset | Name of filter dataset inside the neighbors dataset | No default |
| query_count | Number of queries to create from data-set | Size of the data-set |

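
For the `bigann` option, this sketch assumes the binary layout of big-ann-benchmarks `.fbin` files. Each file starts with two little-endian uint32 values (number of vectors, then dimension), followed by the vectors as packed float32 values. The reader below is a hypothetical sketch of that layout only, and `parse_fbin`/`read_fbin` are illustrative names. The perf tool's own loader is not part of this diff. Other variants, such as `.u8bin` with uint8 components, would need a different element format.

```python
import struct
from typing import List


def parse_fbin(data: bytes) -> List[List[float]]:
    """Decode an assumed .fbin payload: uint32 count, uint32 dim, then float32 values."""
    num_points, dim = struct.unpack_from("<II", data, 0)
    expected = 8 + 4 * num_points * dim
    if len(data) < expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    # All components are read in one call, then split into rows of length dim.
    flat = struct.unpack_from(f"<{num_points * dim}f", data, 8)
    return [list(flat[i * dim:(i + 1) * dim]) for i in range(num_points)]


def read_fbin(path: str) -> List[List[float]]:
    """Read and decode an assumed .fbin file from disk."""
    with open(path, "rb") as f:
        return parse_fbin(f.read())
```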
##### Metrics
| Metric Name | Description | Unit |
| ----------- | ----------- | ----------- |
| took | Took times returned per query aggregated as total, p50, p90 and p99 (when applicable) | ms |
| memory_kb | Native memory k-NN is using at the end of the query workload | KB |
| recall@R | ratio of top R results from the ground truth neighbors that are in the K results returned by the plugin | float 0.0-1.0 |
| recall@K | ratio of results returned that were ground truth nearest neighbors | float 0.0-1.0 |
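
The `took` row combines per-query latencies into a total and several percentiles. Below is a hypothetical sketch of that aggregation. The result keys and the nearest-rank percentile definition are assumptions: the tool may interpolate between samples instead, or leave out percentiles when there are too few queries. The latter may be what "when applicable" refers to.

```python
from typing import Dict, Sequence


def aggregate_took(took_ms: Sequence[float]) -> Dict[str, float]:
    """Aggregate per-query latencies (ms) into a total and p50/p90/p99.

    Uses the nearest-rank percentile: the smallest sample such that at
    least p% of all samples are less than or equal to it.
    """
    if not took_ms:
        raise ValueError("no query timings to aggregate")
    ordered = sorted(took_ms)
    n = len(ordered)

    def nearest_rank(p: int) -> float:
        # Integer ceiling of p * n / 100 avoids floating-point rounding.
        rank = max(1, (p * n + 99) // 100)
        return ordered[rank - 1]

    return {
        "took_total": sum(ordered),
        "took_p50": nearest_rank(50),
        "took_p90": nearest_rank(90),
        "took_p99": nearest_rank(99),
    }
```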

This also generates the neighbors dataset. The new dataset(s) can be referenced from the test case definition in the `ingest_nested_field` and `query_nested_field` steps.
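
One plausible way to define ground truth for nested data is at the parent level. A parent's distance to a query is the smallest distance among its child vectors, and the neighbors list holds the k closest parent ids. The function below is a hypothetical brute-force sketch of that definition using L2 distance. It is an assumption that the data generation script uses this exact definition and metric.

```python
import heapq
import math
from typing import Dict, List, Sequence


def nested_ground_truth(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    parent_ids: Sequence[int],
    k: int,
) -> List[int]:
    """Exact top-k parent ids for one query, scoring each parent by its closest child."""
    best: Dict[int, float] = {}
    for vector, parent in zip(vectors, parent_ids):
        dist = math.dist(query, vector)  # Euclidean distance (Python 3.8+)
        if parent not in best or dist < best[parent]:
            best[parent] = dist
    # k parents with the smallest distance; ties are broken by parent id.
    closest = heapq.nsmallest(k, best.items(), key=lambda item: (item[1], item[0]))
    return [parent for parent, _ in closest]
```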