feat: Enable secondary index for compound filter conditions #3417

islamaliev · 2025-01-30T16:25:34Z

Relevant issue(s)

Resolves #3299

Description

Utilize secondary indexes even when compound filter conditions are present.
For this to work, a new filter-traversing utility function is introduced that can be configured for different needs.

Now that indexes are exposed to more complex conditions, they have started to produce more false-positive docs that weren't checked by the filter, because the index fetcher was not part of the new fetcher chain.

Make the index fetcher implement the new fetcher interface so that the documents it fetches can be checked against the scanner filter and permissions.

Change the behavior of connor to recognize whether a field exists. This is needed to distinguish whether an _ne filter returns false because the 2 values are different or because the document doesn't have the field.
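As a rough illustration of the field-existence point, here is a minimal sketch of an `_ne` check that takes an extra flag. It follows the code comment in internal/connor/connor.go quoted later in this PR ("_ne operator should return true if the property doesn't exist"). The function name `matchNe` and its signature are hypothetical and are not connor's actual API.

```go
package main

import "fmt"

// matchNe is a hypothetical stand-in for an `_ne` operator check.
// propExists reports whether the document actually has the field.
func matchNe(condition, data any, propExists bool) bool {
	if !propExists {
		// A missing field cannot equal the condition value, so `_ne` matches.
		return true
	}
	// Plain interface comparison; this sketch only handles comparable values
	// such as strings and numbers.
	return condition != data
}

func main() {
	doc := map[string]any{"name": "Alice"}

	age, hasAge := doc["age"]
	fmt.Println(matchNe(30, age, hasAge)) // field missing

	name, hasName := doc["name"]
	fmt.Println(matchNe("Alice", name, hasName)) // field present and equal
	fmt.Println(matchNe("Bob", name, hasName))   // field present and different
}
```

Without the flag, a missing field and a present-but-different field would both reach the comparison as "not equal", and the caller could not tell them apart.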

Make fieldFetched explain metric count all fields fetched, not only fields that were requested.


codecov · 2025-01-30T16:42:32Z

Codecov Report

Attention: Patch coverage is 92.43697% with 36 lines in your changes missing coverage. Please review.

Project coverage is 78.27%. Comparing base (a2b8971) to head (91ac5a9).

Files with missing lines	Patch %	Lines
internal/db/fetcher/indexer.go	80.65%	8 Missing and 4 partials ⚠️
internal/db/fetcher/indexer_matchers.go	90.28%	5 Missing and 2 partials ⚠️
internal/db/fetcher/wrapper.go	82.86%	4 Missing and 2 partials ⚠️
internal/db/fetcher/document.go	40.00%	2 Missing and 1 partial ⚠️
internal/db/fetcher/indexer_iterators.go	96.81%	2 Missing and 1 partial ⚠️
internal/planner/type_join.go	97.22%	2 Missing ⚠️
internal/connor/and.go	0.00%	0 Missing and 1 partial ⚠️
internal/connor/in.go	0.00%	0 Missing and 1 partial ⚠️
internal/connor/or.go	0.00%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3417      +/-   ##
===========================================
- Coverage    78.28%   78.27%   -0.01%     
===========================================
  Files          392      393       +1     
  Lines        36045    36106      +61     
===========================================
+ Hits         28217    28260      +43     
- Misses        6163     6185      +22     
+ Partials      1665     1661       -4

Flag	Coverage Δ
all-tests	`78.27% <92.44%> (-0.01%)`	⬇️


Files with missing lines	Coverage Δ
internal/connor/all.go	`83.87% <100.00%> (ø)`
internal/connor/any.go	`90.32% <100.00%> (ø)`
internal/connor/connor.go	`100.00% <100.00%> (ø)`
internal/connor/eq.go	`92.31% <100.00%> (ø)`
internal/connor/ne.go	`72.73% <100.00%> (+15.58%)`	⬆️
internal/connor/none.go	`83.87% <100.00%> (ø)`
internal/connor/not.go	`100.00% <100.00%> (ø)`
internal/db/collection_get.go	`79.66% <100.00%> (-4.82%)`	⬇️
internal/db/collection_index.go	`87.53% <100.00%> (+0.03%)`	⬆️
internal/db/fetcher/fetcher.go	`100.00% <ø> (ø)`
... and 17 more

... and 19 files with indirect coverage changes


Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a2b8971...91ac5a9. Read the comment docs.

shahzadlone · 2025-01-30T20:03:53Z

question: Do you know why the coverage takes a hit of -23% 😢? @islamaliev

EDIT: I noticed most of the CI coverage test actions are failing, so only some reports were submitted; maybe that's why.

AndrewSisley

I've only reviewed the first few files and need to go and eat :)

Looks good so far, just documentation requests.

internal/connor/key.go

internal/connor/eq.go

internal/connor/connor.go

internal/db/fetcher/indexer.go

internal/db/collection.go

internal/db/fetcher/wrapper.go


AndrewSisley

Is looking good, I'm continuing my review but it is taking a while so I thought I'd submit the outstanding comments now.

AndrewSisley · 2025-02-04T18:04:04Z

internal/connor/connor.go

+// It also takes a propExists boolean to indicate if the property exists in the data.
+// It's needed because the behavior of the operators can change if the property doesn't exist.
+// For example, _ne operator should return true if the property doesn't exist.
+// This can also be used in the future if we introduce operators line _has.


praise: This example is particularly useful - thanks Islam!

AndrewSisley · 2025-02-04T18:20:24Z

internal/db/fetcher/indexer.go

-			f.doc.MergeProperties(encDoc)
+	if f.indexDesc.Unique && !hasNilField {
+		f.currentDocID = immutable.Some(string(res.value))
+	} else {


question: This else block looks odd to me. Why are you uncertain of the type?

todo: The bytes else-if block is untested. Please either remove it, or test it.

AndrewSisley · 2025-02-04T18:26:25Z

internal/db/fetcher/indexer_iterators.go

-									return nil, nil
+		filter.TraverseProperties(
+			f.indexFilter.Conditions,
+			func(prop *mapper.PropertyIndex, condMap map[connor.FilterKey]any) bool {


todo: IMO this is way too large a function to declare inline, I struggled to find where it ended. Please make it a named function and add a line or so documenting what it is trying to do.

EDIT: I see that right at the very end of this very large inline function you mutate a variable (found) belonging to the encompassing scope. I think this needs rework to allow you to name this function. Perhaps you will need to rename TraverseProperties and return a boolean from it, I am not sure, but please change this as it is a bit hard to follow IMO (especially due to the mutation of found).
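One possible shape for that rework, as a sketch only: the traversal helper returns the result itself, and the predicate becomes a small named function, so nothing in the enclosing scope is mutated. The names `condition`, `findFirst` and `isIndexable`, and the choice of which operators count as indexable, are all hypothetical and only illustrate the pattern.

```go
package main

import "fmt"

// condition is a hypothetical, simplified stand-in for a filter condition.
type condition struct {
	field string
	op    string
}

// findFirst walks conds in order and returns the first one accepted by match.
// The boolean result replaces a `found` flag mutated from inside a closure.
func findFirst(conds []condition, match func(condition) bool) (condition, bool) {
	for _, c := range conds {
		if match(c) {
			return c, true
		}
	}
	return condition{}, false
}

// isIndexable is a named predicate, so its start and end are easy to locate.
// Which operators qualify is an assumption made for this example.
func isIndexable(c condition) bool {
	return c.op == "_eq" || c.op == "_in"
}

func main() {
	conds := []condition{{"name", "_like"}, {"age", "_eq"}}
	c, found := findFirst(conds, isIndexable)
	fmt.Println(c.field, found)
}
```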

AndrewSisley · 2025-02-04T18:33:22Z

internal/db/fetcher/indexer_iterators.go

 							}
-							break jsonPathLoop
+							jsonPath = jsonPath.AppendProperty(prop.Name)
+							condMap = filterVal.(map[connor.FilterKey]any)


todo: Why is this cast safe (no ok check)? Please add your answer as a code comment.
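If the assertion cannot be proven safe, the usual alternative is the comma-ok form, which turns a potential panic into an error. A minimal sketch, assuming a plain `map[string]any` in place of `map[connor.FilterKey]any`; the helper name `asConditionMap` is hypothetical.

```go
package main

import "fmt"

// asConditionMap converts a filter value to a nested condition map.
// It reports an error instead of panicking when the type is unexpected.
func asConditionMap(filterVal any) (map[string]any, error) {
	condMap, ok := filterVal.(map[string]any)
	if !ok {
		return nil, fmt.Errorf("unexpected filter value type %T", filterVal)
	}
	return condMap, nil
}

func main() {
	if _, err := asConditionMap(42); err != nil {
		fmt.Println("error:", err)
	}

	condMap, err := asConditionMap(map[string]any{"_eq": "Sample"})
	fmt.Println(condMap, err)
}
```

When an earlier check already guarantees the type, a bare assertion plus a comment stating that guarantee is the lighter option, which is what the todo above asks for.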

AndrewSisley · 2025-02-04T18:45:23Z

internal/db/fetcher/indexer_matchers.go

+		var matcher valueMatcher
+		// we have a separate branch for null matcher because default matching behavior
+		// is what we need: for filter `_ne: null` it will match all non-null values
+		if v.IsNull() {


suggestion: This would be a lot more readable if you returned early instead of the current if-else-if-else nesting.

The current format forces the reader to read the entire function if they only care about a single if block, for example if I am debugging an issue with _, ok := v.Number(), I am forced to read through and check that matcher is not later overwritten or otherwise interacted with, whereas if it returned early I could instead leave this function and proceed further with my investigation.

for example:

	if v.IsNull() {
		return &jsonNullMatcher{matchNull: condition.op == opEq}
	}

	if jsonVal, ok := v.Number(); ok {
		return &jsonComparingMatcher[float64]{
			value:        jsonVal,
			getValueFunc: func(j client.JSON) (float64, bool) { return j.Number() },
			evalFunc:     getCompareValsFunc[float64](condition.op),
		}
	}

	... etc

AndrewSisley · 2025-02-04T18:48:54Z

internal/db/fetcher/indexer_matchers.go

+	}
+
+	var matcher valueMatcher
+	if v, ok := condition.val.Int(); ok {


suggestion: Same as above, I think this would be easier to read if you returned early instead of the large if-else

AndrewSisley

Looks good Islam! I'm nearly done reviewing but really need to eat :)

AndrewSisley · 2025-02-04T18:51:04Z

internal/db/fetcher/wrapper.go


-	// The below properties are only held in state in order to temporarily adhear to the [Fetcher]
+	// The below properties are only held in state in order to temporarily adhere to the [Fetcher]


AndrewSisley · 2025-02-04T18:54:25Z

internal/db/fetcher/wrapper.go

+		if err != nil {
+			return err
+		}
+		if indexFetcher != nil {


todo: Why would indexFetcher be nil here? You just called the constructor, and with its current name (and documentation) it definitely should not be returning nil. Please remove this check, or rename and redocument newIndexFetcher.

EDIT: I've just seen the documentation below; that documentation should probably be incorporated into the renamed newIndexFetcher func docs.

AndrewSisley · 2025-02-04T18:58:20Z

internal/db/fetcher/wrapper.go

+	}
+
+	// the index fetcher might not have been created if there is no efficient way to use fetch indexes
+	// with given filter conditions. In this case we fall back to the prefix fetcher


praise: This was a very useful comment, thank you!

AndrewSisley · 2025-02-04T19:00:31Z

internal/planner/filter/traverse.go

+//	    }
+//	}
+//
+// The callback would receive path=["author", "books", "title"] and value="Sample"


praise: This is excellent documentation and really helped me understand the functions here, thanks Islam :)

AndrewSisley · 2025-02-04T19:04:42Z

internal/planner/filter/traverse.go

+//	}
+//
+// The callback would receive path=["author", "books", "title"] and value="Sample"
+func TraverseFields(conditions map[string]any, f func([]string, any) bool) {


todo: The return value of this function is unused, and unexplained by the documentation. Please remove it.
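For readers unfamiliar with this kind of helper, here is a minimal sketch of a field traversal whose callback receives the path and value described in the doc comment above. It is not the PR's implementation. The nested filter shape, the rule that keys starting with `_` are operators, and the names `isOperator` and `walk` are assumptions. The callback returns false to stop the walk early.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// isOperator is an assumption: keys starting with "_" are treated as operators.
func isOperator(key string) bool {
	return strings.HasPrefix(key, "_")
}

// traverseFields visits every leaf value in conditions and passes the field
// path leading to it to visit. It stops once visit returns false.
func traverseFields(conditions map[string]any, visit func(path []string, value any) bool) {
	walk(nil, conditions, visit)
}

// walk returns false when the traversal should stop.
func walk(path []string, node any, visit func([]string, any) bool) bool {
	switch typed := node.(type) {
	case map[string]any:
		for key, value := range typed {
			var next []string
			if isOperator(key) {
				// Operators do not extend the field path.
				next = path
			} else {
				// Clone so sibling branches never share a backing array.
				next = append(slices.Clone(path), key)
			}
			if !walk(next, value, visit) {
				return false
			}
		}
		return true
	case []any:
		for _, element := range typed {
			if !walk(path, element, visit) {
				return false
			}
		}
		return true
	default:
		return visit(path, node)
	}
}

func main() {
	filter := map[string]any{
		"author": map[string]any{
			"books": map[string]any{
				"title": map[string]any{"_eq": "Sample"},
			},
		},
	}
	traverseFields(filter, func(path []string, value any) bool {
		fmt.Println(path, value) // prints "[author books title] Sample"
		return true
	})
}
```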

AndrewSisley · 2025-02-04T19:19:16Z

internal/planner/filter/traverse.go

+	switch t := value.(type) {
+	case map[string]any:
+		for k, v := range t {
+			if !isKeyOp(k) {


suggestion: There seems to be little benefit in inverting the if-else by negating isKeyOp(k), if you remove the ! it will reduce the cognitive load on the reader slightly.

AndrewSisley · 2025-02-04T19:21:31Z

internal/planner/filter/traverse.go

+				newPath := make([]string, len(path), len(path)+1)
+				copy(newPath, path)
+				newPath = append(newPath, k)
+				if !traverseFields(newPath, k, v, f) {


suggestion: k and v are short lived, but here, combined with the longer scope f the shortened variables read a little like algebra. I suggest renaming them to key and value.

Same goes for the other similar areas in this function.

AndrewSisley · 2025-02-04T19:24:32Z

internal/planner/scan.go

 	var f fetcher.Fetcher
 	if cid.HasValue() {
 		f = new(fetcher.VersionedFetcher)
 	} else {
 		f = fetcher.NewDocumentFetcher()

-		if index.HasValue() {


praise: I was worried you'd have to put this back when you moved the index prop to init in the fetcher. Thank you very much for cleaning this up.

AndrewSisley · 2025-02-04T19:26:58Z

internal/planner/select.go

 	}
-	return immutable.None[client.IndexDescription]()
+	slices.SortFunc(indexCandidates, func(a, b client.IndexDescription) int {
+		switch {


todo: This looks like it is a re-implementation of strings.Compare, please use strings.Compare instead, or document why you are not using it, and what the differences between this and that function are.
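A sketch of what the `strings.Compare` version could look like, assuming the comparator orders candidates by index name (the hunk above is cut off before the comparison itself, so that is a guess). `indexDescription` is a hypothetical, pared-down stand-in for `client.IndexDescription`.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// indexDescription is a hypothetical stand-in for client.IndexDescription.
type indexDescription struct {
	Name string
}

// sortByName orders candidates by name so that the choice of index is
// deterministic. strings.Compare returns -1, 0 or +1, which is exactly the
// contract slices.SortFunc expects from its comparator.
func sortByName(candidates []indexDescription) {
	slices.SortFunc(candidates, func(a, b indexDescription) int {
		return strings.Compare(a.Name, b.Name)
	})
}

func main() {
	candidates := []indexDescription{{Name: "user_name"}, {Name: "user_age"}}
	sortByName(candidates)
	fmt.Println(candidates)
}
```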

AndrewSisley · 2025-02-04T19:28:16Z

internal/planner/select.go

+		return true
+	})
+	if len(indexCandidates) == 0 {
+		return immutable.Option[client.IndexDescription]{}


suggestion: I think None is more descriptive than the default, which kind of looks like it has (or might have) a value.

AndrewSisley

Reviewed! Overall it looks really good Islam, the provided documentation was very useful when reading the code. I think all my requests are/were fairly localised, hopefully they all make sense to you :)

Thanks for resolving the issue.

AndrewSisley · 2025-02-04T19:56:47Z

internal/planner/type_join.go

+	}
+
+	// we store child's own filter in case an index kicks in and replaces it with it's own filter
+	join.subFilter = getScanNode(childSide.plan).filter


suggestion:

	return invertibleTypeJoin{
		docMapper:  docMapper{parent.documentMapping},
		parentSide: parentSide,
		childSide:  childSide,
		skipChild:  skipChild,
		// we store child's own filter in case an index kicks in and replaces it with it's own filter
		subFilter: getScanNode(childSide.plan).filter,
	}

AndrewSisley · 2025-02-04T19:57:40Z

internal/planner/type_join.go

@@ -699,7 +718,7 @@ func (join *invertibleTypeJoin) Next() (bool, error) {
 	return true, nil
 }

-func (join *invertibleTypeJoin) nextJoinedSecondaryDoc() (bool, error) {
+func (join *invertibleTypeJoin) fetchRelatedSecondaryDocWithChildren(primaryDoc core.Doc) (bool, error) {


praise: Thanks for this name change

AndrewSisley · 2025-02-04T19:58:43Z

internal/planner/type_join.go

@@ -721,28 +742,33 @@ func (join *invertibleTypeJoin) nextJoinedSecondaryDoc() (bool, error) {
 		join.encounteredDocIDs = append(join.encounteredDocIDs, secondaryDocID)
 	}

-	hasDoc, err := fetchDocWithID(secondSide.plan, secondaryDocID)
+	//secondaryDocOpt, err := fetchDocWithID(secondSide.plan, secondaryDocID)


todo: Please remove the commented out code

AndrewSisley · 2025-02-04T20:01:48Z

tests/integration/index/array_test.go

@@ -167,7 +167,7 @@ func TestArrayIndex_WithFilterOnIndexedArrayUsingNone_ShouldUseIndex(t *testing.
 			},
 			testUtils.Request{
 				Request:  makeExplainQuery(req),
-				Asserter: testUtils.NewExplainAsserter().WithIndexFetches(9),
+				Asserter: testUtils.NewExplainAsserter().WithIndexFetches(0),


question: This is surprising, why has this changed? It looks like it is no longer using the index.

todo: If the change is correct, please update the name and/or document the test, as it doesn't make any sense to me atm.

AndrewSisley · 2025-02-04T20:04:35Z

tests/integration/index/json_test.go

 			},
 		},
 	}

 	testUtils.ExecuteTestCase(t, test)
 }
+
+func TestJSONIndex_WithNeFilterAgainstNonNullValue_ShouldFetchNullValues(t *testing.T) {
+	type testCase struct {


suggestion: I am really not a fan of bundling multiple tests into the same test. It makes debugging much harder and typically involves the repeated commenting and uncommenting of test cases in order to deal with one aspect at a time.

Please break this up.

AndrewSisley · 2025-02-04T20:05:33Z

tests/integration/index/query_with_index_combined_filter_test.go

@@ -48,7 +48,7 @@ func TestQueryWithIndex_IfIndexFilterWithRegular_ShouldFilter(t *testing.T) {
 			},
 			testUtils.Request{
 				Request:  makeExplainQuery(req),
-				Asserter: testUtils.NewExplainAsserter().WithFieldFetches(3).WithIndexFetches(3),
+				Asserter: testUtils.NewExplainAsserter().WithIndexFetches(3),


question: Why have the field fetches here been removed?

AndrewSisley · 2025-02-04T20:07:18Z

tests/integration/index/query_with_index_only_filter_test.go

@@ -49,7 +49,7 @@ func TestQueryWithIndex_WithNonIndexedFields_ShouldFetchAllOfThem(t *testing.T)
 			},
 			testUtils.Request{
 				Request:  makeExplainQuery(req),
-				Asserter: testUtils.NewExplainAsserter().WithFieldFetches(1).WithIndexFetches(1),
+				Asserter: testUtils.NewExplainAsserter().WithIndexFetches(1),


question: Regarding all the tests that had non-zero field fetches removed, why have the field fetches been removed?

AndrewSisley · 2025-02-04T20:07:59Z

tests/integration/index/query_with_index_only_filter_test.go

@@ -804,3 +804,213 @@ func TestQueryWithIndex_WithFilterOn2Relations_ShouldFilter(t *testing.T) {

 	testUtils.ExecuteTestCase(t, test)
 }
+
+func TestQueryWithIndex_WithNeFilterAgainstNonNilValue_ShouldFetchNilValues(t *testing.T) {
+	type testCase struct {


suggestion: Same as other similar comment, I have a preference for this to be broken up.

AndrewSisley · 2025-02-04T20:11:37Z

tests/integration/query/json/with_ne_test.go

@@ -163,3 +163,129 @@ func TestQueryJSON_WithNotEqualFilterWithNullValue_ShouldFilter(t *testing.T) {

 	testUtils.ExecuteTestCase(t, test)
 }
+
+func TestQueryJSON_WithNeFilterAgainstNonNullValue_ShouldFetchNullValues(t *testing.T) {
+	type testCase struct {


suggestion: Same suggestion as other comments, I have a preference for this to be broken up - it is multiple tests pretending to be a single test and this creates problems for anyone using it.

islamaliev and others added 30 commits January 2, 2025 10:08

Add json traversal functions

32c2440

Add GetPath method to JSON

53cb4cb

Include array element index in path

f703e3a

Fix json traversal

403c587

Add JSON and Bool encoding

1c059fb

Correctly handle paths to json nodes

c784f61

Base JSON index implementation

9855f30

Move match-related code to a file

0faa02b

Make index work for bool and string

92f7958

Add filter by json null value

516d290

Add MD file for secondary indexes

af5eba2

Add note about indexing of related docs

3dcb838

Add note about json indexing

e27d5db

Enable filtering by json bool and string

7e00694

Add unique json index

40341a0

Filter by array elements

61a7b90

Fix _in/_nin filter for json docs

32ef7bc

Add filtering on arrays of json docs

fc0eb2b

Remove filtering without array elements

70f8651

Add tests for composite index with json

cdb9d34

Enable indexing of array within json docs

adb71d4

Enable json array traversal to only top level elements

bb67d2f

Fix lint

8f24c04

Update docs

279bb69

Fix test expectations

343f5fc

Add change detector note

a56d3bf

Polish

b31c6c0

Update documentation

69a429b

Update documentation

74605db

Rename

efef1b1

islamaliev added 11 commits January 26, 2025 20:31

Make connor check if prop exists for _ne

bf2e68d

Make filter traverse validate some ops values

52b2451

Make index fetcher implement new interface

ff3c7b1

Adjust tests

7d1ddf0

Merge remote-tracking branch 'upstream/develop' into feat/make-index-…

cf86ebe

…check-compound-filter-condition

Merge remote-tracking branch 'upstream/develop' into feat/make-index-…

158f364

…check-compound-filter-condition

Fix issues

3e79a98

Add test issue

818d3fe

Adjust explain metrics

0f278b1

Polish

12abf5c

Fix lint

0394f13

islamaliev added the area/query and perf labels Jan 30, 2025

islamaliev added this to the DefraDB v0.16 milestone Jan 30, 2025

islamaliev self-assigned this Jan 30, 2025

islamaliev requested a review from a team January 30, 2025 16:26

islamaliev added 4 commits January 31, 2025 14:14

Adjust tests

dd307fd

Add note for change detector

e9ffb04

Adjust tests

ca30e90

Adjust tests

b33415c

AndrewSisley requested changes Feb 4, 2025


islamaliev added 2 commits February 4, 2025 11:52

PR fixup

a56180c

Merge remote-tracking branch 'upstream/develop' into feat/make-index-…

cdaf7c4

…check-compound-filter-condition

islamaliev requested a review from AndrewSisley February 4, 2025 10:53

Fix lint

91ac5a9

AndrewSisley requested changes Feb 4, 2025

