feat: Enable secondary index for compound filter conditions #3417

Open
wants to merge 70 commits into
base: develop
from

Conversation

islamaliev
Contributor

Relevant issue(s)

Resolves #3299

Description

Utilize secondary indexes even when compound filter conditions are present.
To make this work, a new filter-traversal utility function is introduced that can be configured for different needs.

Now that indexes are exposed to more complex conditions, they started to produce more false-positive docs that weren't checked by the filter, because the index fetcher was not part of the new fetcher chain.

Make index fetcher implement new fetcher interface so that the documents it fetches can be checked against the scanner filter and permissions.
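
A minimal sketch of the idea, not the PR's actual code: an index fetcher yields candidate documents (possibly including false positives), and a wrapping fetcher re-checks each one against the full filter. The `Fetcher` interface, `Doc` type, and all names below are hypothetical simplifications chosen for illustration.

```go
package main

import "fmt"

// Doc is a hypothetical stand-in for a fetched document.
type Doc map[string]any

// Fetcher is a hypothetical, simplified fetcher interface.
type Fetcher interface {
	// Next returns the next document, or false when the source is exhausted.
	Next() (Doc, bool)
}

// indexFetcher yields candidates found through an index.
// The candidates may contain false positives.
type indexFetcher struct {
	candidates []Doc
	pos        int
}

func (f *indexFetcher) Next() (Doc, bool) {
	if f.pos >= len(f.candidates) {
		return nil, false
	}
	doc := f.candidates[f.pos]
	f.pos++
	return doc, true
}

// filteredFetcher wraps another Fetcher and drops every document
// that does not pass the full filter predicate.
type filteredFetcher struct {
	source Fetcher
	passes func(Doc) bool
}

func (f *filteredFetcher) Next() (Doc, bool) {
	for {
		doc, ok := f.source.Next()
		if !ok {
			return nil, false
		}
		if f.passes(doc) {
			return doc, true
		}
	}
}

// isAdult is an example filter predicate (age >= 18).
func isAdult(doc Doc) bool {
	age, ok := doc["age"].(int)
	return ok && age >= 18
}

// collect drains a fetcher into a slice.
func collect(f Fetcher) []Doc {
	var docs []Doc
	for doc, ok := f.Next(); ok; doc, ok = f.Next() {
		docs = append(docs, doc)
	}
	return docs
}

func main() {
	index := &indexFetcher{candidates: []Doc{
		{"name": "Alice", "age": 30},
		{"name": "Bob", "age": 17}, // false positive from the index
	}}
	chain := &filteredFetcher{source: index, passes: isAdult}
	fmt.Println(len(collect(chain)))
}
```

Because both types satisfy the same interface, further checks (for example a permissions check) could be stacked as additional wrappers in the same way.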

Change the behavior of connor to recognize whether a field exists. This is needed to distinguish whether an _ne filter returns false because the two values are different or because the document doesn't have the field.
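
As a rough illustration of why the existence flag matters, here is a sketch under assumptions: `matchEq`, `matchNe`, and `evalNe` are hypothetical names, and the only behavior taken from this PR is that `_ne` should return true when the property does not exist.

```go
package main

import (
	"fmt"
	"reflect"
)

// matchEq reports whether data equals condition.
// propExists is false when the document has no such field at all.
func matchEq(condition, data any, propExists bool) bool {
	if !propExists {
		return false
	}
	return reflect.DeepEqual(condition, data)
}

// matchNe reports whether data differs from condition.
// A missing property counts as "not equal".
func matchNe(condition, data any, propExists bool) bool {
	if !propExists {
		return true
	}
	return !reflect.DeepEqual(condition, data)
}

// evalNe looks the field up in the document and forwards the
// existence flag alongside the value.
func evalNe(doc map[string]any, field string, condition any) bool {
	value, propExists := doc[field]
	return matchNe(condition, value, propExists)
}

func main() {
	fmt.Println(evalNe(map[string]any{"name": "Bob"}, "name", "Alice"))
	fmt.Println(evalNe(map[string]any{"name": "Alice"}, "name", "Alice"))
	fmt.Println(evalNe(map[string]any{}, "name", "Alice"))
}
```

Passing the flag explicitly keeps "field is absent" separate from "field is present with a different value", which a bare value comparison cannot express.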

Make fieldFetched explain metric count all fields fetched, not only fields that were requested.

@islamaliev islamaliev added area/query Related to the query component perf Performance issue or suggestion labels Jan 30, 2025
@islamaliev islamaliev added this to the DefraDB v0.16 milestone Jan 30, 2025
@islamaliev islamaliev self-assigned this Jan 30, 2025
@islamaliev islamaliev requested a review from a team January 30, 2025 16:26

codecov bot commented Jan 30, 2025

Codecov Report

Attention: Patch coverage is 92.43697% with 36 lines in your changes missing coverage. Please review.

Project coverage is 78.27%. Comparing base (a2b8971) to head (91ac5a9).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/db/fetcher/indexer.go | 80.65% | 8 Missing and 4 partials ⚠️ |
| internal/db/fetcher/indexer_matchers.go | 90.28% | 5 Missing and 2 partials ⚠️ |
| internal/db/fetcher/wrapper.go | 82.86% | 4 Missing and 2 partials ⚠️ |
| internal/db/fetcher/document.go | 40.00% | 2 Missing and 1 partial ⚠️ |
| internal/db/fetcher/indexer_iterators.go | 96.81% | 2 Missing and 1 partial ⚠️ |
| internal/planner/type_join.go | 97.22% | 2 Missing ⚠️ |
| internal/connor/and.go | 0.00% | 0 Missing and 1 partial ⚠️ |
| internal/connor/in.go | 0.00% | 0 Missing and 1 partial ⚠️ |
| internal/connor/or.go | 0.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files


@@             Coverage Diff             @@
##           develop    #3417      +/-   ##
===========================================
- Coverage    78.28%   78.27%   -0.01%     
===========================================
  Files          392      393       +1     
  Lines        36045    36106      +61     
===========================================
+ Hits         28217    28260      +43     
- Misses        6163     6185      +22     
+ Partials      1665     1661       -4     
| Flag | Coverage Δ |
|---|---|
| all-tests | 78.27% <92.44%> (-0.01%) ⬇️ |


| Files with missing lines | Coverage Δ |
|---|---|
| internal/connor/all.go | 83.87% <100.00%> (ø) |
| internal/connor/any.go | 90.32% <100.00%> (ø) |
| internal/connor/connor.go | 100.00% <100.00%> (ø) |
| internal/connor/eq.go | 92.31% <100.00%> (ø) |
| internal/connor/ne.go | 72.73% <100.00%> (+15.58%) ⬆️ |
| internal/connor/none.go | 83.87% <100.00%> (ø) |
| internal/connor/not.go | 100.00% <100.00%> (ø) |
| internal/db/collection_get.go | 79.66% <100.00%> (-4.82%) ⬇️ |
| internal/db/collection_index.go | 87.53% <100.00%> (+0.03%) ⬆️ |
| internal/db/fetcher/fetcher.go | 100.00% <ø> (ø) |
... and 17 more

... and 19 files with indirect coverage changes



Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a2b8971...91ac5a9. Read the comment docs.

@shahzadlone
Member

shahzadlone commented Jan 30, 2025

question: Do you know why the coverage takes a hit of -23% 😢? @islamaliev

EDIT: I noticed that most of the CI coverage test actions are failing, so only some reports were submitted. Maybe that's why.

Contributor

@AndrewSisley AndrewSisley left a comment

I've only reviewed the first few files and need to go and eat :)

Looks good so far, just documentation requests.

internal/connor/key.go (outdated, resolved)
internal/connor/eq.go (resolved)
internal/connor/connor.go (resolved)
internal/db/fetcher/indexer.go (outdated, resolved)
internal/db/collection.go (outdated, resolved)
internal/db/fetcher/wrapper.go (outdated, resolved)
internal/db/fetcher/wrapper.go (resolved)
Contributor

@AndrewSisley AndrewSisley left a comment

Is looking good, I'm continuing my review but it is taking a while so I thought I'd submit the outstanding comments now.

// It also takes a propExists boolean to indicate if the property exists in the data.
// It's needed because the behavior of the operators can change if the property doesn't exist.
// For example, _ne operator should return true if the property doesn't exist.
// This can also be used in the future if we introduce operators like _has.
Contributor

praise: This example is particularly useful - thanks Islam!

f.doc.MergeProperties(encDoc)
if f.indexDesc.Unique && !hasNilField {
f.currentDocID = immutable.Some(string(res.value))
} else {
Contributor

question: This else block looks odd to me. Why are you uncertain of the type?

todo: The bytes else-if block is untested. Please either remove it, or test it.

return nil, nil
filter.TraverseProperties(
f.indexFilter.Conditions,
func(prop *mapper.PropertyIndex, condMap map[connor.FilterKey]any) bool {
Contributor

todo: IMO this is way too large a function to declare inline, I struggled to find where it ended. Please make it a named function and add a line or so documenting what it is trying to do.

EDIT: I see that right at the very end of this very large inline function you mutate a variable (found) belonging to the enclosing scope. I think this needs rework to allow you to name this function. Perhaps you will need to rename TraverseProperties and return a boolean from it, I am not sure, but please change this as it is a bit hard to follow IMO (especially due to the mutation of found).
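
One possible shape for this rework, sketched with hypothetical types rather than the PR's real ones: the traversal itself returns the boolean, and the visitor becomes a named method, so nothing in the enclosing scope is mutated. `condition`, `anyCondition`, and `indexMatcher` are all made-up names for illustration.

```go
package main

import "fmt"

// condition is a hypothetical, simplified filter condition.
type condition struct {
	field string
	op    string
}

// anyCondition visits the conditions in order and reports whether the
// visitor accepted one of them. It stops at the first accepted condition.
func anyCondition(conds []condition, accept func(condition) bool) bool {
	for _, cond := range conds {
		if accept(cond) {
			return true
		}
	}
	return false
}

// indexMatcher carries the state the visitor needs, instead of the
// visitor capturing variables from an enclosing function.
type indexMatcher struct {
	indexedField string
}

// matches reports whether cond targets the indexed field with an
// operator this sketch treats as index-friendly.
func (m indexMatcher) matches(cond condition) bool {
	return cond.field == m.indexedField && cond.op != "_like"
}

func main() {
	conds := []condition{
		{field: "name", op: "_like"},
		{field: "age", op: "_gt"},
	}
	matcher := indexMatcher{indexedField: "age"}
	found := anyCondition(conds, matcher.matches)
	fmt.Println(found)
}
```

The caller reads the result from the return value, and the visitor can be documented and tested on its own.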

}
break jsonPathLoop
jsonPath = jsonPath.AppendProperty(prop.Name)
condMap = filterVal.(map[connor.FilterKey]any)
Contributor

todo: Why is this cast safe (no ok check)? Please add your answer as a code comment.
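
For reference, a small sketch of the checked ("comma-ok") form of a type assertion, which returns an error instead of panicking when the value has an unexpected type. It uses `map[string]any` as a stand-in for `map[connor.FilterKey]any`, and `nestedConditions` is a hypothetical helper name.

```go
package main

import "fmt"

// nestedConditions converts a filter value into a nested condition map.
// The unchecked form, filterVal.(map[string]any), would panic on a
// mismatch; the comma-ok form lets the caller handle it.
func nestedConditions(filterVal any) (map[string]any, error) {
	condMap, ok := filterVal.(map[string]any)
	if !ok {
		return nil, fmt.Errorf("expected nested condition map, got %T", filterVal)
	}
	return condMap, nil
}

func main() {
	condMap, err := nestedConditions(map[string]any{"_eq": "Sample"})
	fmt.Println(condMap, err)

	_, err = nestedConditions("not a map")
	fmt.Println(err)
}
```

If an earlier check already guarantees the type, the unchecked form is fine, and a code comment naming that guarantee is enough.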

var matcher valueMatcher
// we have a separate branch for null matcher because default matching behavior
// is what we need: for filter `_ne: null` it will match all non-null values
if v.IsNull() {
Contributor

@AndrewSisley AndrewSisley Feb 4, 2025

suggestion: This would be a lot more readable if you returned early instead of the current if-else-if-else nesting.

The current format forces the reader to read the entire function if they only care about a single if block, for example if I am debugging an issue with _, ok := v.Number(), I am forced to read through and check that matcher is not later overwritten or otherwise interacted with, whereas if it returned early I could instead leave this function and proceed further with my investigation.

for example:

if v.IsNull() {
    return &jsonNullMatcher{matchNull: condition.op == opEq}
}

if jsonVal, ok := v.Number(); ok {
    return &jsonComparingMatcher[float64]{
        value:        jsonVal,
        getValueFunc: func(j client.JSON) (float64, bool) { return j.Number() },
        evalFunc:     getCompareValsFunc[float64](condition.op),
    }
}

... etc

}

var matcher valueMatcher
if v, ok := condition.val.Int(); ok {
Contributor

suggestion: Same as above, I think this would be easier to read if you returned early instead of the large if-else

Contributor

@AndrewSisley AndrewSisley left a comment

Looks good Islam! I'm nearly done reviewing but really need to eat :)


- // The below properties are only held in state in order to temporarily adhear to the [Fetcher]
+ // The below properties are only held in state in order to temporarily adhere to the [Fetcher]
Contributor

:) thanks

if err != nil {
return err
}
if indexFetcher != nil {
Contributor

todo: Why would indexFetcher be nil here? You just called the constructor, and with its current name (and documentation) it definitely should not be returning nil. Please remove this check, or rename and redocument newIndexFetcher.

EDIT: I've just seen the documentation below. That documentation should probably be incorporated into the renamed newIndexFetcher func docs.

}

// the index fetcher might not have been created if there is no efficient way to use fetch indexes
// with given filter conditions. In this case we fall back to the prefix fetcher
Contributor

praise: This was a very useful comment thank you!
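
To illustrate the fallback described in that code comment together with the rename requested earlier, here is a sketch under assumptions: a "try" constructor signals through a second return value that no index fetcher could be built, and the caller falls back to the prefix fetcher. `tryNewIndexFetcher`, `chooseFetcher`, and the simplified types are hypothetical, not the PR's API.

```go
package main

import "fmt"

// fetcher is a hypothetical, minimal fetcher interface.
type fetcher interface {
	Name() string
}

type prefixFetcher struct{}

func (prefixFetcher) Name() string { return "prefix" }

type indexFetcher struct {
	field string
}

func (f indexFetcher) Name() string { return "index:" + f.field }

// tryNewIndexFetcher returns an index fetcher and true if at least one
// filtered field is indexed. It returns false when there is no
// efficient way to serve the filter from an index.
func tryNewIndexFetcher(filterFields []string, indexed map[string]bool) (fetcher, bool) {
	for _, field := range filterFields {
		if indexed[field] {
			return indexFetcher{field: field}, true
		}
	}
	return nil, false
}

// chooseFetcher prefers the index fetcher and otherwise falls back to
// the prefix fetcher.
func chooseFetcher(filterFields []string, indexed map[string]bool) fetcher {
	if f, ok := tryNewIndexFetcher(filterFields, indexed); ok {
		return f
	}
	return prefixFetcher{}
}

func main() {
	indexed := map[string]bool{"age": true}
	fmt.Println(chooseFetcher([]string{"name", "age"}, indexed).Name())
	fmt.Println(chooseFetcher([]string{"name"}, indexed).Name())
}
```

With this shape the "may not be created" case is visible in the signature, so the caller's check no longer looks surprising.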

// }
// }
//
// The callback would receive path=["author", "books", "title"] and value="Sample"
Contributor

praise: This is excellent documentation and really helped me understand the functions here, thanks Islam :)

// }
//
// The callback would receive path=["author", "books", "title"] and value="Sample"
func TraverseFields(conditions map[string]any, f func([]string, any) bool) {
Contributor

todo: The return value of this function is unused, and unexplained by the documentation. Please remove it.

switch t := value.(type) {
case map[string]any:
for k, v := range t {
if !isKeyOp(k) {
Contributor

suggestion: There seems to be little benefit in inverting the if-else by negating isKeyOp(k). If you remove the ! it will reduce the cognitive load on the reader slightly.

newPath := make([]string, len(path), len(path)+1)
copy(newPath, path)
newPath = append(newPath, k)
if !traverseFields(newPath, k, v, f) {
Contributor

suggestion: k and v are short lived, but here, combined with the longer scope f the shortened variables read a little like algebra. I suggest renaming them to key and value.

Same goes for the other similar areas in this function.

var f fetcher.Fetcher
if cid.HasValue() {
f = new(fetcher.VersionedFetcher)
} else {
f = fetcher.NewDocumentFetcher()

if index.HasValue() {
Contributor

praise: I was worried you'd have to put this back when you moved the index prop to init in the fetcher. Thank you very much for cleaning this up.

}
return immutable.None[client.IndexDescription]()
slices.SortFunc(indexCandidates, func(a, b client.IndexDescription) int {
switch {
Contributor

todo: This looks like it is a re-implementation of strings.Compare, please use strings.Compare instead, or document why you are not using it, and what the differences between this and that function are.
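
A short sketch of what that could look like, assuming the candidates are ordered by name (the compared field is not visible in this excerpt) and using a hypothetical `indexDescription` stand-in for `client.IndexDescription`. It needs Go 1.21 or later for the `slices` package.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// indexDescription is a hypothetical stand-in with only the field
// this sketch sorts by.
type indexDescription struct {
	Name string
}

// sortByName orders the candidates lexicographically by name.
// strings.Compare already returns -1, 0, or +1, which is the contract
// slices.SortFunc expects from its comparison function.
func sortByName(candidates []indexDescription) {
	slices.SortFunc(candidates, func(a, b indexDescription) int {
		return strings.Compare(a.Name, b.Name)
	})
}

func main() {
	candidates := []indexDescription{{Name: "idx_b"}, {Name: "idx_a"}}
	sortByName(candidates)
	fmt.Println(candidates[0].Name, candidates[1].Name)
}
```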

return true
})
if len(indexCandidates) == 0 {
return immutable.Option[client.IndexDescription]{}
Contributor

suggestion: I think None is more descriptive than the default, which kind of looks like it has (or might have) a value.

Contributor

@AndrewSisley AndrewSisley left a comment

Reviewed! Overall it looks really good Islam, the provided documentation was very useful when reading the code. I think all my requests are/were fairly localised, hopefully they all make sense to you :)

Thanks for resolving the issue.

}

// we store child's own filter in case an index kicks in and replaces it with its own filter
join.subFilter = getScanNode(childSide.plan).filter
Contributor

suggestion:

return invertibleTypeJoin{
	docMapper:  docMapper{parent.documentMapping},
	parentSide: parentSide,
	childSide:  childSide,
	skipChild:  skipChild,
	// we store child's own filter in case an index kicks in and replaces it with its own filter
	subFilter: getScanNode(childSide.plan).filter,
}

@@ -699,7 +718,7 @@ func (join *invertibleTypeJoin) Next() (bool, error) {
return true, nil
}

- func (join *invertibleTypeJoin) nextJoinedSecondaryDoc() (bool, error) {
+ func (join *invertibleTypeJoin) fetchRelatedSecondaryDocWithChildren(primaryDoc core.Doc) (bool, error) {
Contributor

praise: Thanks for this name change

@@ -721,28 +742,33 @@ func (join *invertibleTypeJoin) nextJoinedSecondaryDoc() (bool, error) {
join.encounteredDocIDs = append(join.encounteredDocIDs, secondaryDocID)
}

hasDoc, err := fetchDocWithID(secondSide.plan, secondaryDocID)
//secondaryDocOpt, err := fetchDocWithID(secondSide.plan, secondaryDocID)
Contributor

todo: Please remove the commented out code

@@ -167,7 +167,7 @@ func TestArrayIndex_WithFilterOnIndexedArrayUsingNone_ShouldUseIndex(t *testing.
},
testUtils.Request{
Request: makeExplainQuery(req),
- Asserter: testUtils.NewExplainAsserter().WithIndexFetches(9),
+ Asserter: testUtils.NewExplainAsserter().WithIndexFetches(0),
Contributor

question: This is surprising, why has this changed? It looks like it is no longer using the index.

todo: If the change is correct, please update the name and/or document the test, as it doesn't make any sense to me atm.

},
},
}

testUtils.ExecuteTestCase(t, test)
}

func TestJSONIndex_WithNeFilterAgainstNonNullValue_ShouldFetchNullValues(t *testing.T) {
type testCase struct {
Contributor

suggestion: I am really not a fan of bundling multiple tests into the same test. It makes debugging much harder and typically involves the repeated commenting and uncommenting of test cases in order to deal with one aspect at a time.

Please break this up.

@@ -48,7 +48,7 @@ func TestQueryWithIndex_IfIndexFilterWithRegular_ShouldFilter(t *testing.T) {
},
testUtils.Request{
Request: makeExplainQuery(req),
- Asserter: testUtils.NewExplainAsserter().WithFieldFetches(3).WithIndexFetches(3),
+ Asserter: testUtils.NewExplainAsserter().WithIndexFetches(3),
Contributor

question: Why have the field fetches here been removed?

@@ -49,7 +49,7 @@ func TestQueryWithIndex_WithNonIndexedFields_ShouldFetchAllOfThem(t *testing.T)
},
testUtils.Request{
Request: makeExplainQuery(req),
- Asserter: testUtils.NewExplainAsserter().WithFieldFetches(1).WithIndexFetches(1),
+ Asserter: testUtils.NewExplainAsserter().WithIndexFetches(1),
Contributor

question: Regarding all the tests that had non-zero field fetches removed, why have the field fetches been removed?

@@ -804,3 +804,213 @@ func TestQueryWithIndex_WithFilterOn2Relations_ShouldFilter(t *testing.T) {

testUtils.ExecuteTestCase(t, test)
}

func TestQueryWithIndex_WithNeFilterAgainstNonNilValue_ShouldFetchNilValues(t *testing.T) {
type testCase struct {
Contributor

suggestion: Same as other similar comment, I have a preference for this to be broken up.

@@ -163,3 +163,129 @@ func TestQueryJSON_WithNotEqualFilterWithNullValue_ShouldFilter(t *testing.T) {

testUtils.ExecuteTestCase(t, test)
}

func TestQueryJSON_WithNeFilterAgainstNonNullValue_ShouldFetchNullValues(t *testing.T) {
type testCase struct {
Contributor

suggestion: Same suggestion as other comments, I have a preference for this to be broken up - it is multiple tests pretending to be a single test and this creates problems for anyone using it.

Labels
area/query Related to the query component perf Performance issue or suggestion
Development

Successfully merging this pull request may close these issues.

Sec. index: make compound filters utillize sec. indexes
3 participants