Support database queries on arbitrary labels #117

Merged
merged 24 commits into from
Jan 16, 2025

Conversation

ericpromislow
Contributor

Related to #46333

This PR needs to get merged first before I can submit the PR for Steve,
which caches the labels. If you prefer, I'll submit a PR with steve that
temporarily pulls in ../lasso in the go.mod file.

Note that this is the search syntax I've implemented:

curl -sk https://HOSTANDPORT/v1/configmaps?filter=metadata.labels%5bLABELNAME%5d=LABELVALUE

If LABELVALUE is quoted, an exact match is made. Otherwise partial matching is done on the value,
like for other A=B queries.

Note that the exact LABELNAME must be specified after metadata.labels, between the escaped square brackets.
This is similar to how you always have to provide the full name of a field to the left of =.

This PR supersedes the experimental PR #110

@ericpromislow ericpromislow requested a review from a team as a code owner November 6, 2024 00:31
@ericpromislow ericpromislow requested review from moio and a team and removed request for a team November 6, 2024 00:31
Contributor

@moio moio left a comment

I left some inline comments, overall this seems to be going in a good direction, thanks for the contribution.

pkg/cache/sql/store/store.go (outdated, resolved)
pkg/cache/sql/db/client.go (outdated, resolved)
pkg/cache/sql/informer/listoption_indexer.go (outdated, resolved)
@moio
Contributor

moio commented Nov 6, 2024

One other thing: can you please open the corresponding Steve PR as well, even in draft mode?

I'd like to have a look at the whole picture, and possibly locally try it out as well.

Thanks in advance!

@ericpromislow ericpromislow marked this pull request as draft November 6, 2024 19:27
@ericpromislow
Contributor Author

The steve PR is at rancher/steve#317

@ericpromislow ericpromislow marked this pull request as ready for review November 7, 2024 01:18
Contributor

@moio moio left a comment

Thanks, this gets closer. I found two issues, one of which (adding an INDEX) should be very straightforward to solve.

The other will likely need changes in the parser in Steve as well as changes in the query builder here in listOptionsIndexer.

Thanks, keep up the good work!

pkg/cache/sql/informer/listoption_indexer.go (outdated, resolved)
pkg/cache/sql/informer/listoption_indexer.go (outdated, resolved)
pkg/cache/sql/informer/listoption_indexer.go (resolved)
pkg/cache/sql/informer/listoption_indexer.go (outdated, resolved)
@ericpromislow
Contributor Author

This change has the new filter query expression parser, with tests. There's some cruft that came in from the k8s code that I'll pull out during the next round of reviews.

@ericpromislow
Contributor Author

ericpromislow commented Nov 15, 2024

Here are some sample filter commands I've been running:

/v1/events'?filter=involvedObject.kind=Pod'
/v1/events?filter=_type=some-event-type
/v1/events'?filter=metadata.labels.app=app1'
/v1/events'?filter=message="This+event+%234."'  # have to URL-encode number signs
/v1/events'?filter=metadata.labels%5bauthz.management.cattle.io/default-project%5d="true"'

This last one is interesting -- on the command-line, the square brackets have to be URL-hex-encoded.
Next the lexer considers square brackets to be alphanumerics, and the restricted label syntax means
that all the characters inside the brackets are alphanumeric as well. The code that converts k8s/apimachinery
Requirement objects into lasso/Informer objects knows what to do with metadata.labels[...], and you get
the SQL query you'd expect.
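
For anyone scripting these requests rather than typing them, a minimal Go sketch of the escaping step follows. It only uses the standard library's `net/url`; the helper name `encodeFilter` is hypothetical and not part of lasso or Steve. Note that `url.Values.Encode` escapes more than the brackets (also `/`, `=` and `"`), which should be equivalent for any server that percent-decodes the query string before parsing, though that has not been checked against Steve here.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodeFilter is a hypothetical helper: it returns a query string
// carrying one filter expression, percent-encoded by net/url.
func encodeFilter(expr string) string {
	q := url.Values{}
	q.Add("filter", expr)
	return q.Encode()
}

func main() {
	// The bracketed label name from the last sample filter above.
	expr := `metadata.labels[authz.management.cattle.io/default-project]="true"`
	fmt.Println("/v1/events?" + encodeFilter(expr))
}
```

The square brackets come out as `%5B` and `%5D`, the same escapes used by hand in the curl examples (hex digits are case-insensitive).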

There are also unit tests that verify that filter=x=1,y=2 => select when ... x=1 OR y=2
and filter=x=1&filter=y=2 => select when ... x =1 & y = 2

Contributor

@moio moio left a comment

Thanks @ericpromislow, I see this moving well towards the goal.

Most of my notes are nitpicks; there are just a couple of substantial ones.

More than anything, this needs some road testing by @richard-cox or somebody in the UI team - we want to make sure that whatever is built here will satisfy the needs of frontend code.

Keep up the good work!

pkg/cache/sql/informer/listoption_indexer.go (outdated, resolved)
pkg/cache/sql/informer/listoption_indexer.go (outdated, resolved)
pkg/cache/sql/informer/listoption_indexer.go (outdated, resolved)
pkg/cache/sql/informer/listoption_indexer.go (outdated, resolved)
pkg/cache/sql/informer/listoption_indexer.go (outdated, resolved)
pkg/cache/sql/informer/listoption_indexer.go (outdated, resolved)
pkg/cache/sql/informer/listoption_indexer.go (outdated, resolved)
pkg/cache/sql/informer/listoption_indexer.go (outdated, resolved)
case Eq:
if filter.Partial {
opString = "LIKE"
escapeString = escapeBackslashDirective
Contributor

How about inlining the directive in the SQL query where it's used, instead of having this as a constant copied into a variable and then into the query?

(yes, I understand that's a bit of repetition, maybe it's just me suffering from indirection motion sickness!)

PS. same about matchFmt. I had a hard time looking into 4 places before coalescing the sense of the query in my mind while reviewing 🦀

Contributor Author

@ericpromislow ericpromislow Dec 3, 2024

TBD

We need to hear from Richard if this business with quoting strings to be exact vs doing substring matches is going to survive.

Contributor

ACK

Member

There needs to be a way to differentiate between partial and exact matches. As long as that's there, I'm open to offers on syntax (UI changes would be straightforward).

Contributor

@richard-cox please note that the context for this thread is labels.

My understanding is that, for labels, you need power to match label selectors.

Now label selectors do implement equality, which we call exact for normal fields. They do not implement "substring equality".

Can you confirm we will continue to need both "exact" and "substring" equality for normal fields, and label selector power on label fields, which does not include "substring" equality?

Member

> Can you confirm we will continue to need both "exact" and "substring" equality for normal fields, and label selector power on label fields, which does not include "substring" equality?

@moio correct.

Label selectors are exact/not exact or in/not in a set. Filtering outside of labels would need to cover exact/not exact as well as substrings ... but not sets

Member

To clarify

Contributor

@ericpromislow please keep in mind the UI needs to evaluate results of a selector from a Kubernetes resource "as it is" (e.g. taking a .spec.selector out of a Deployment and passing it to Steve to get a list of targeted pods in order to display a preview; if the user interactively changes the selector, the preview is updated on the fly).

Therefore there should be a way to pass in a plain Kubernetes selector as a filter with the exact same syntax and get the exact same semantics in return.

It is my understanding from Steve code that we are now parsing filter= query parameters with a syntax similar to vanilla Kubernetes selectors, with an extension for quoted strings (which are interpreted as exact matches, while the default is substring matching). Note that is different from standard Kubernetes selectors, where matching is exact by default (and substring matching does not even exist).

If the above is right (please correct me otherwise!) you might want to consider adding a hard distinction between filter= parameters - to be kept working exactly as today, and a new kind of parameters - say, selectorFilter= - that recognizes and implements Kubernetes selector syntax and semantics 1:1.

Please let me know what you think

Contributor Author

It is correct that if the target of a = or != operator is quoted, we do an exact match. Otherwise we do a substring match. This is an extension of the standard kubectl label selector parser, which is why I forked and modified it.

I'd rather not make any more changes until Richard is able to integrate these changes and see if they work as expected from the UI. I don't know if the UI in 2.10 releases is able to issue these kinds of filters -- I haven't seen any evidence of it.
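
The quoted-versus-unquoted rule discussed in this thread can be sketched roughly as below. This is an illustration only, not the code in listoption_indexer.go: `matchClause` and its signature are hypothetical, and the only details taken from the PR are that partial matches use `LIKE` with a backslash `ESCAPE` directive while quoted values are matched exactly.

```go
package main

import (
	"fmt"
	"strings"
)

// matchClause is a hypothetical helper. It returns a SQL fragment with one
// placeholder, plus the parameter to bind to it.
//   - "value" in double quotes -> exact match with "="
//   - bare value               -> substring match with LIKE
func matchClause(column, value string) (clause, param string) {
	if len(value) >= 2 && strings.HasPrefix(value, `"`) && strings.HasSuffix(value, `"`) {
		// Strip the surrounding quotes and compare exactly.
		return column + " = ?", value[1 : len(value)-1]
	}
	// Escape the escape character and the LIKE wildcards so they match literally.
	escaped := strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`).Replace(value)
	return column + ` LIKE ? ESCAPE '\'`, "%" + escaped + "%"
}

func main() {
	for _, v := range []string{`"app1"`, `app`, `50%_done`} {
		clause, param := matchClause("lt.value", v)
		fmt.Printf("%-12s => %s  [param: %s]\n", v, clause, param)
	}
}
```

A separate `selectorFilter=` parameter, as floated above, would presumably skip the `LIKE` branch entirely and always compare exactly.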

Member

From a UI perspective, splitting these into two non-overlapping filters would work: one covering the existing filter with our own syntax, and another covering the kube label selector (and standard kube filtering?) API syntax. It feels like the latter is a neater solution, though I haven't looked through the changes in this PR.

The UI effort to cover either approach (single uber filter or two filters) is small.

Eric, you're right, there's currently no way to exercise the label selector via the API in the UI. Given an image with your changes in (that supports amd/x86), I could implement the feature in a single area and knock up a build that can very easily be used (by changing two rancher management.cattle.io setting values).

pkg/cache/sql/informer/listoption_indexer.go (outdated, resolved)
* Add labels when adding/replacing objects.
* Add labels to the query language
@ericpromislow ericpromislow changed the base branch from master to main December 3, 2024 21:23
In particular, don't clear the count query values if no
count query needs to be made -- just leave the default struct
values, and the query executor won't run a count-query.
Conceptually it's simpler to do a '<key> NOT IN (SELECT o1.key...)' to find the rows that don't have the target label.
For example, if the test is to find all rows where `metadata.labels.animal = "cows"`,
we can ignore any rows that don't have associated labels, because they will never match.
@ericpromislow
Contributor Author

Three comments on the != issue:

  1. My mistake completely on the submitted LABEL NOT-EXISTS solution. It simply doesn't work. I've done live tests with a sqlite DB with the new form, and the results look correct. It also conceptually makes more sense.

  2. Yes, when you're doing a NOT-EXISTS or != test with labels, you do need to do a LEFT OUTER JOIN with the main field rows on the left and the labels table on the right. The LEFT OUTER JOIN is only needed when doing these two comparisons -- in all other cases, if a particular key has no associated labels, there's no need to include it in the table we're doing processing on.

  3. After all that, I didn't see a need for a DISTINCT qualifier. I'm now doing a SELECT key FROM x JOIN x_labels WHERE key NOT IN (SELECT o2.key FROM x o2 LEFT OUTER JOIN x_labels l2 WHERE l2.label = NAME (AND l2.value = VALUE))
    and the inner query never returns duplicates.

As for performance on large sets, I assume that sqlite is smart enough to convert the inner query into a stream of rows that the outer query processes in a separate coroutine or thread. I've never read the source, but I've done tests (with Python code) and it was much faster to give sqlite a 12-line query to process than to give it a conceptually simpler query and have the Python code do further filtering on it.
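
As a rough illustration of the `NOT IN (SELECT ...)` shape described in point 3, here is a Go sketch that assembles such a fragment. The function name `notInLabelClause`, the `o`/`o2`/`l2` aliases, the explicit `ON` clause, and the use of `?` placeholders are assumptions made for the example; the `<name>_labels` table naming follows the queries quoted elsewhere in this thread.

```go
package main

import "fmt"

// notInLabelClause is a hypothetical helper. It builds a WHERE fragment that
// keeps every row whose key is NOT among the rows carrying the given label
// (and, when withValue is true, the given value). Rows with no labels at all
// are kept, which is what a != or NOT-EXISTS test on a label needs.
func notInLabelClause(dbName string, withValue bool) string {
	inner := fmt.Sprintf(
		`SELECT o2.key FROM "%s" o2 LEFT OUTER JOIN "%s_labels" l2 ON o2.key = l2.key WHERE l2.label = ?`,
		dbName, dbName)
	if withValue {
		// != VALUE: only exclude rows where the label has exactly this value.
		inner += " AND l2.value = ?"
	}
	return "o.key NOT IN (" + inner + ")"
}

func main() {
	fmt.Println(notInLabelClause("_v1_ConfigMap", false)) // NOT-EXISTS on a label
	fmt.Println(notInLabelClause("_v1_ConfigMap", true))  // label != value
}
```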

@ericpromislow
Contributor Author

Biking home (in the rain) I realized the current solution still falls short for negative tests on labels in (at least) three ways:

  1. We need to do the non-existence test on label foo for metadata.labels.foo NOTIN (...)

  2. hasNegativeExistenceTest needs to also look for NOTIN operator on labels

  3. hasNegativeExistenceTest should also verify that the !=, NOT-EXISTS and NOT-IN operators are being used with a metadata.labels.X subject. Otherwise there's no need to do an OUTER JOIN here.
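
The three checks listed above amount to one predicate: a negative operator applied to a `metadata.labels.X` subject. A minimal Go sketch follows, with hypothetical `Op` and `Filter` types that stand in for whatever lasso actually uses (only the operator names come from this thread). A later commit message in this PR says the LEFT OUTER JOIN was eventually made unconditional, so treat this as a picture of the reasoning at this point rather than of the merged code.

```go
package main

import "fmt"

// Op and Filter are hypothetical stand-ins for the real filter types.
type Op string

const (
	Eq        Op = "="
	NotEq     Op = "!="
	In        Op = "in"
	NotIn     Op = "notin"
	Exists    Op = "exists"
	NotExists Op = "!exists"
)

type Filter struct {
	Field []string // e.g. {"metadata", "labels", "foo"}
	Op    Op
}

// isLabelField reports whether the subject is metadata.labels.<name>.
func isLabelField(field []string) bool {
	return len(field) > 2 && field[0] == "metadata" && field[1] == "labels"
}

// needsOuterJoin reports whether any filter is a negative test on a label,
// the only case where rows without labels must be kept in the join.
func needsOuterJoin(filters []Filter) bool {
	for _, f := range filters {
		if !isLabelField(f.Field) {
			continue // negative tests on ordinary fields need no outer join
		}
		switch f.Op {
		case NotEq, NotIn, NotExists:
			return true
		}
	}
	return false
}

func main() {
	onLabel := []Filter{{Field: []string{"metadata", "labels", "foo"}, Op: NotIn}}
	onField := []Filter{{Field: []string{"metadata", "name"}, Op: NotEq}}
	fmt.Println(needsOuterJoin(onLabel), needsOuterJoin(onField))
}
```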

There are negative tests on non-label fields -- the goal here is
to verify that we never do an OUTER JOIN when we don't need one.
@moio
Contributor

moio commented Dec 9, 2024

> 1. My mistake completely on the submitted `LABEL NOT-EXISTS` solution. It simply doesn't work. I've done live tests with a sqlite DB with the new form, and the results look correct. It also conceptually makes more sense.

Nah, no worries. That's what reviews are for - that is, when they do work! 😇 glad it's fixed now

> 2. Yes, when you're doing a NOT-EXISTS or != test with labels, you do need to do a `LEFT OUTER JOIN` with the main field rows on the left and the labels table on the right. The `LEFT OUTER JOIN` is only needed when doing these two comparisons -- in all other cases, if a particular key has no associated labels, there's no need to include it in the table we're doing processing on.

Ack!

> 3. After all that, I didn't see a need for a `DISTINCT` qualifier. I'm now doing a `SELECT key FROM x JOIN x_labels WHERE key NOT IN (SELECT o2.key FROM x o2 LEFT OUTER JOIN x_labels l2 WHERE l2.label = NAME (AND l2.value = VALUE))`
>    and the inner query never returns duplicates.

Hmm, I see duplicates though. Taking this discussion to the review to keep it in one place.

> As for performance on large sets, I assume that sqlite is smart enough to convert the inner query into a stream of rows that the outer query processes in a separate coroutine or thread. I've never read the source, but I've done tests (with Python code) and it was much faster to give sqlite a 12-line query to process than to give it a conceptually simpler query and have the Python code do further filtering on it.

Absolutely, we should default to deferring complexity to the SQLite query planner as much as possible. We should complicate queries only if absolutely needed, and I have no reason to believe this is a concern for now.

Thanks!

pkg/cache/sql/informer/listoption_indexer.go (outdated, resolved)
pkg/cache/sql/informer/listoption_indexer.go (outdated, resolved)
- Always LEFT-OUTER-JOIN to retain fields that have no associated labels (for negative operators)
- Always SELECT-DISTINCT in case some results return duplicate entries.

And always SELECT DISTINCT when
@ericpromislow ericpromislow requested a review from moio December 9, 2024 21:19
@moio
Contributor

moio commented Dec 10, 2024

Tentative 👍 from my side - let's hear @richard-cox's comments testing this.

I'd also suggest a second look from a Frameworks colleague - @tomleb maybe?

@tomleb tomleb requested review from tomleb and removed request for a team December 11, 2024 20:04
@richard-cox
Member

richard-cox commented Dec 19, 2024

Had a chance to look at the image / feature today.

Comment WIP, updating.

  1. With no params provided it looks like some filtering is happening anyway on the services resource. I think this could be related to a bug with one of the indexed fields, probably resolved with SQLite backed cache: indexed fields round #4 steve#430
    (three screenshots; images not preserved)
  2. We may need a feature to flip (reverse) the resources
    a) Need to determine the services associated with a deployment
    b) Currently this happens by fetching the pods associated with a deployment, then iterating over all services and testing each service's pod selector against the deployment's pods
  3. Working on a first pass of integrating the new label selector filtering API. Got this (dev) wired into
    • Workload detail page - pods and services (broken due to reverse selector)
    • Service Detail - Pods List
    • Service create/edit - Pod selector tab
    • Will add one that exercises the matchExpressions side of labelSelector... and create a dev build

Contributor

@tomleb tomleb left a comment

As mentioned on Slack, there's an issue with AND'ing multiple label filters together:

I created the following configmaps in the namespace foo:

$ k -n foo get configmap --show-labels
NAME               DATA   AGE     LABELS
a                  0      3m2s    aaa=bb,bbb=dd
b                  0      3m1s    bbb=dd
c                  0      3m      aaa=bb

Works fine with one filter:

$ curl -s -g -k -H Accept:application/json  'https://localhost:9443/v1/configmaps/foo?filter=metadata.labels[aaa]+IN+(bb)' | jq '.count'
2

But not with two:

$ curl -s -g -k -H Accept:application/json  'https://localhost:9443/v1/configmaps/foo?filter=metadata.labels[aaa]+IN+(bb)&filter=metadata.labels[bbb]+IN+(dd)' | jq
{
  "type": "collection",
  "links": {
    "self": "https://localhost:9443/v1/configmaps/foo"
  },
  "createTypes": {
    "configmap": "https://localhost:9443/v1/configmaps"
  },
  "actions": {},
  "resourceType": "configmap",
  "data": []
}

Here's the SQL query for this filter (with params):

SELECT DISTINCT o.object, o.objectnonce, o.dekid FROM "_v1_ConfigMap" o
  JOIN "_v1_ConfigMap_fields" f ON o.key = f.key
  LEFT OUTER JOIN "_v1_ConfigMap_labels" lt ON o.key = lt.key
  WHERE
    (lt.label = 'aaa' AND lt.value IN ('bb')) AND
    (lt.label = 'bbb' AND lt.value IN ('dd')) AND
    (f."metadata.namespace" = 'foo')
  ORDER BY f."metadata.namespace" ASC, f."metadata.name" ASC 
  LIMIT 100000

lt.label cannot be equal to both aaa and bbb at the same time. Instead, I think we should do a new JOIN for every label filter:

SELECT DISTINCT o.object, o.objectnonce, o.dekid FROM "_v1_ConfigMap" o
  JOIN "_v1_ConfigMap_fields" f ON o.key = f.key
  LEFT OUTER JOIN "_v1_ConfigMap_labels" lt1 ON o.key = lt1.key
  LEFT OUTER JOIN "_v1_ConfigMap_labels" lt2 ON o.key = lt2.key
  WHERE
    (lt1.label = 'aaa' AND lt1.value IN ('bb')) AND
    (lt2.label = 'bbb' AND lt2.value IN ('dd')) AND
    (f."metadata.namespace" = 'foo')
  ORDER BY f."metadata.namespace" ASC, f."metadata.name" ASC 
  LIMIT 100000

When I do that, I get the expected result of foo/a.
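
The "one JOIN per label filter" idea can be sketched as a small generator. The code below is an illustration under stated assumptions, not the change that landed in listoption_indexer.go: `LabelFilter` and `buildLabelJoins` are hypothetical names, values are bound as `?` parameters, and only `IN` filters are handled. The `lt1`, `lt2` aliases and the `<name>_labels` table follow the hand-written query above.

```go
package main

import (
	"fmt"
	"strings"
)

// LabelFilter is a hypothetical representation of `metadata.labels[Label] IN (Values...)`.
type LabelFilter struct {
	Label  string
	Values []string
}

// buildLabelJoins emits one self-join of the labels table per filter, each
// under its own alias, so that ANDed label conditions never compete for the
// same joined row.
func buildLabelJoins(dbName string, filters []LabelFilter) (joins, where []string, params []any) {
	for i, f := range filters {
		alias := fmt.Sprintf("lt%d", i+1)
		joins = append(joins,
			fmt.Sprintf(`LEFT OUTER JOIN "%s_labels" %s ON o.key = %s.key`, dbName, alias, alias))
		// One "?" placeholder per candidate value.
		placeholders := strings.TrimSuffix(strings.Repeat("?, ", len(f.Values)), ", ")
		where = append(where,
			fmt.Sprintf("(%s.label = ? AND %s.value IN (%s))", alias, alias, placeholders))
		params = append(params, f.Label)
		for _, v := range f.Values {
			params = append(params, v)
		}
	}
	return joins, where, params
}

func main() {
	joins, where, params := buildLabelJoins("_v1_ConfigMap", []LabelFilter{
		{Label: "aaa", Values: []string{"bb"}},
		{Label: "bbb", Values: []string{"dd"}},
	})
	fmt.Println(strings.Join(joins, "\n"))
	fmt.Println("WHERE " + strings.Join(where, " AND "))
	fmt.Println(params...)
}
```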

@ericpromislow
Contributor Author

@tom re #117 (review)

Before I do that I created three slightly different configmaps (I like to use readable words that are different enough):

configmap01.yaml:  labels:
configmap01.yaml-    cm01a: snail
configmap01.yaml-    cm01b: shade
--
configmap02.yaml:  labels:
configmap02.yaml-    cm02a: theta
configmap02.yaml-    cm02b: raspy
--
configmap03.yaml:  labels:
configmap03.yaml-    cm03e: froze
configmap03.yaml-    cm03f: lever

Referring to kubernetes:

$ k get configmap -l cm01a=snail,cm02b=raspy
No resources found in default namespace.  # I thought this is an 'OR', but it looks like an 'AND'

$ k get configmap -l cm01a=snail -l cm02b=raspy # Here the last one wins
NAME   DATA   AGE
cm02   1      15m 

$ k get configmap -l cm02b=raspy -l cm01a=snail # Last one wins again
NAME   DATA   AGE
cm01   2      17m

And with no changes to the parser and sql generator:

$ curl -skL 'https://localhost:5111/v1/configmaps?filter=metadata.labels.cm01a=snail,metadata.labels.cm02b=raspy' | jq .count
2

The generated sql was:

SELECT ... FROM "_v1_ConfigMap" o
  JOIN "_v1_ConfigMap_fields" f ON o.key = f.key
  LEFT OUTER JOIN "_v1_ConfigMap_labels" lt ON o.key = lt.key
  WHERE
    ((lt.label = "cm01a" AND lt.value LIKE "%snail%" ESCAPE '\') OR (lt.label = "cm02b" AND lt.value LIKE "%raspy%"))

Now I'm confused... Did I get the semantics of filter=F1&filter=F2....&filter= and filter=F1,F2,... reversed?

I think so. Here's how the upstream code parses a group of comma-separated expressions:

// Matches for a internalSelector returns true if all
// its Requirements match the input Labels. If any
// Requirement does not match, false is returned.
func (s internalSelector) Matches(l Labels) bool {
	for ix := range s {
		if matches := s[ix].Matches(l); !matches {
			return false
		}
	}
	return true
}

And I'm mapping them to an OR-group. End of day here, but this will need correcting tomorrow.
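
To make the AND semantics concrete, here is a small Go sketch that mirrors the shape of the upstream `Matches` loop using plain maps. It covers equality requirements only, and `matchesAll` is a hypothetical name; the label data is taken from the configmaps listed above.

```go
package main

import "fmt"

// matchesAll reports whether every required key=value pair is present in
// labels. One miss is enough to fail, so a comma-separated selector behaves
// as a conjunction (AND), not a disjunction (OR).
func matchesAll(labels, required map[string]string) bool {
	for k, want := range required {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	cm01 := map[string]string{"cm01a": "snail", "cm01b": "shade"}

	// Single requirement: cm01 matches.
	fmt.Println(matchesAll(cm01, map[string]string{"cm01a": "snail"}))

	// cm01a=snail,cm02b=raspy: cm01 has no cm02b label, so it does not match,
	// consistent with kubectl reporting "No resources found".
	fmt.Println(matchesAll(cm01, map[string]string{"cm01a": "snail", "cm02b": "raspy"}))
}
```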

When we have multiple filter sub-queries, these get ANDed together.
But we need to do a self-join for each instance of a label-type filter.

Added unit tests to verify that this is what we're generating.
@ericpromislow
Contributor Author

Tom was right (as always :). I'm now doing self-joins on label tables, even if there's only one instance of a label in the compound query (which simplifies the code generator), and live probing makes sense.

@ericpromislow ericpromislow requested a review from tomleb January 7, 2025 22:37
Contributor

@moio moio left a comment

Changes LGTM.

I'd suggest merging this unless @tomleb has major objections, so that @richard-cox has a chance to hammer it further in a build from main, and treating anything else as a follow-up PR, as this has reached 100 comments already!

Contributor

@tomleb tomleb left a comment

LGTM. I've done some light testing on things that were failing before and they now work correctly.

@ericpromislow ericpromislow merged commit 9e2b687 into rancher:main Jan 16, 2025
1 check passed
4 participants