Support database queries on arbitrary labels #117
I left some inline comments, overall this seems to be going in a good direction, thanks for the contribution.
One other thing: can you please open the corresponding Steve PR as well, even in draft mode? I'd like to have a look at the whole picture, and possibly locally try it out as well. Thanks in advance!
The Steve PR is at rancher/steve#317
Thanks, this gets closer. I found two issues, one of which (adding an INDEX) should be very straightforward to solve.
The other will likely need changes in the parser in Steve as well as changes in the query builder here in listOptionsIndexer.
Thanks, keep up the good work!
Force-pushed from 912361f to 2a2272b
This change has the new filter query expression parser, with tests. There's some cruft that came in from the k8s code that I'll pull out during the next round of reviews.
Here are some sample filter commands I've been running:
This last one is interesting -- on the command-line, the square brackets have to be URL-hex-encoded. There are also unit tests that verify that
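As an illustration of the URL-hex-encoding mentioned above, here is a minimal Python sketch. The helper name `encode_filter` and the sample expression are assumptions made for this example; they are not taken from the PR.

```python
from urllib.parse import quote, unquote

def encode_filter(expr: str) -> str:
    """Percent-encode a filter expression for use as a query-parameter value."""
    # safe="" encodes every reserved character, so '[' becomes %5B and ']' becomes %5D.
    return quote(expr, safe="")

raw = "metadata.labels[aaa]=bb"   # hypothetical filter expression
encoded = encode_filter(raw)
print(encoded)
```

With curl, the `-g` (`--globoff`) flag is another option, since it stops curl from interpreting the square brackets as a glob range.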
Thanks @ericpromislow, I see this moving well towards the goal.
Most of my notes are nitpicks; there are just a couple of substantial ones.
More than anything, this needs some road testing by @richard-cox or somebody in the UI team - we want to make sure that whatever is built here will satisfy the needs of frontend code.
Keep up the good work!
case Eq:
	if filter.Partial {
		opString = "LIKE"
		escapeString = escapeBackslashDirective
How about inlining the directive in the SQL query where it's used, instead of having this as a constant copied into a variable and then into the query?
(yes, I understand that's a bit of repetition, maybe it's just me suffering from indirection motion sickness!)
PS. same about matchFmt. I had a hard time looking into 4 places before coalescing the sense of the query in my mind while reviewing 🦀
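To make the suggestion concrete, here is a small sketch of a partial match with the escape directive written inline in the query. It assumes that `escapeBackslashDirective` stands for SQLite's `ESCAPE '\'` clause; the `fields` table, its columns, and the `like_pattern` helper are hypothetical and exist only for this example.

```python
import sqlite3

def like_pattern(substring: str) -> str:
    """Build a LIKE pattern for a substring match, escaping LIKE wildcards."""
    escaped = (substring.replace("\\", "\\\\")
                        .replace("%", "\\%")
                        .replace("_", "\\_"))
    return f"%{escaped}%"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fields (key TEXT, name TEXT)")
conn.executemany("INSERT INTO fields VALUES (?, ?)",
                 [("k1", "app_one"), ("k2", "appXone"), ("k3", "other")])

# Escape directive inlined where it is used: '_' in the input is matched literally.
escaped_rows = conn.execute(
    r"SELECT key FROM fields WHERE name LIKE ? ESCAPE '\' ORDER BY key",
    (like_pattern("p_o"),)).fetchall()

# Without escaping, '_' acts as a single-character wildcard.
unescaped_rows = conn.execute(
    "SELECT key FROM fields WHERE name LIKE ? ORDER BY key",
    ("%p_o%",)).fetchall()

print(escaped_rows, unescaped_rows)
```

The escaped query only returns the row whose name really contains `p_o`, while the unescaped one also returns `appXone`.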
TBD
We need to hear from Richard if this business with quoting strings to be exact vs doing substring matches is going to survive.
ACK
There needs to be a way to differentiate between partial and exact matches. As long as that's there, I am open to offers on syntax (UI changes would be straightforward).
@richard-cox please note that the context for this thread is labels.
My understanding is that, for labels, you need power to match label selectors.
Now label selectors do implement equality, which we call exact for normal fields. They do not implement "substring equality".
Can you confirm we will continue to need both "exact" and "substring" equality for normal fields, and label selector power on label fields, which does not include "substring" equality?
Can you confirm we will continue to need both "exact" and "substring" equality for normal fields, and label selector power on label fields, which does not include "substring" equality?
@moio correct.
Label selectors are exact/not exact or in/not in a set. Filtering outside of labels would need to cover exact/not exact as well as substrings ... but not sets
To clarify
- Labels
  - exactly matches a string. does not match an exact string. partial matching on strings is not required
  - value is or is not in a set of values
    - https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#set-based-requirement
  - Re-reading above there are the two additional entries
    - `partition`: selects all resources including a label with key partition; no values are checked.
      - label exists regardless of value
    - `!partition`: selects all resources without a label with key partition; no values are checked.
      - label does not exist regardless of value
- Everything else
  - exactly matches a string. partially matches a string. does not match an exact string (i think we currently also get does not partially match a string, but i don't think this is used anywhere)
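The label requirements listed above can be sketched as a small in-memory matcher. This is only an illustration of the Kubernetes selector semantics, not code from this PR; the function name and the operator spellings `exists` / `!exists` are assumptions made for the example.

```python
def label_matches(labels: dict, key: str, op: str, values: tuple = ()) -> bool:
    """Evaluate one label requirement against an object's labels."""
    present = key in labels
    if op == "exists":            # e.g. `partition`
        return present
    if op == "!exists":           # e.g. `!partition`
        return not present
    if op in ("=", "in"):         # exact match, or membership in a set
        return present and labels[key] in values
    if op in ("!=", "notin"):     # also satisfied when the label is absent
        return not present or labels[key] not in values
    raise ValueError(f"unsupported operator: {op}")

pod = {"environment": "production", "partition": "east"}
print(label_matches(pod, "environment", "in", ("production", "qa")))
print(label_matches(pod, "tier", "notin", ("frontend",)))
print(label_matches(pod, "partition", "!exists"))
```

Note that the negative operators match objects that lack the label entirely, which is why negative label tests cannot simply skip rows without labels.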
@ericpromislow please keep in mind the UI needs to evaluate results of a selector from a Kubernetes resource "as it is" (e.g. taking a .spec.selector out of a Deployment and passing it to Steve to get a list of targeted pods in order to display a preview; if the user interactively changes the selector, the preview is updated on the fly).
Therefore there should be a way to pass in a plain Kubernetes selector as a filter with the exact same syntax and get the exact same semantics in return.
It is my understanding from Steve code that we are now parsing `filter=` query parameters with a syntax similar to vanilla Kubernetes selectors, with an extension for quoted strings (which are interpreted as exact matches, while the default is substring matching). Note that this is different from standard Kubernetes selectors, where matching is exact by default (and substring matching does not even exist).
If the above is right (please correct me otherwise!) you might want to consider adding a hard distinction between `filter=` parameters - to be kept working exactly as today, and a new kind of parameters - say, `selectorFilter=` - that recognizes and implements Kubernetes selector syntax and semantics 1:1.
Please let me know what you think
It is correct that if the target of a `=` or `!=` operator is quoted, we do an exact match. Otherwise we do a substring match. This is an extension of the standard kubectl label selector parser, which is why I forked and modified it.
I'd rather not make any more changes until Richard is able to integrate these changes and see if they work as expected from the UI. I don't know if the UI in 2.10 releases is able to issue these kinds of filters -- I haven't seen any evidence of it.
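The quoted-versus-unquoted rule described in this comment can be sketched as follows. This is a simplified illustration, not the forked parser itself; the function names are hypothetical, and accepting both single and double quotes is an assumption.

```python
def classify_target(raw: str) -> tuple[str, bool]:
    """Return (value, partial) for the right-hand side of `=` or `!=`."""
    # Assumption: a target wrapped in matching quotes requests an exact match.
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "\"'":
        return raw[1:-1], False
    # Unquoted targets fall back to substring matching.
    return raw, True

def field_matches(actual: str, raw_target: str) -> bool:
    """Apply the exact or substring rule to a single field value."""
    value, partial = classify_target(raw_target)
    return value in actual if partial else value == actual

print(field_matches("cows", "cow"))     # unquoted: substring match
print(field_matches("cows", '"cow"'))   # quoted: exact match
```

A plain Kubernetes selector has no substring mode, so under this rule it would only keep its usual semantics if every value were quoted.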
From a UI perspective splitting these into two non-overlapping filters would work. One covering the existing `filter` with our own syntax and another covering the kube label select (and standard kube filtering?) api syntax. It feels like the latter is a neater solution, though I haven't looked through the changes in this PR.
The UI effort to cover either approach (single uber filter or two filters) is small.
Eric you're right, there's currently no way to exercise the label selector via api in the ui. Given an image with your changes in (that supports amd/x86) I could implement the feature in a single area and knock up a build that can very easily be used (by changing two rancher management.cattle.io setting values)
Force-pushed from 9d24e20 to 2e39614
* Add labels when adding/replacing objects.
* Add labels to the query language
Force-pushed from e1c53ba to b032b9d
…escriptive named constants. Also: Move all the label operations from store to listoption_indexer.
…bles. - tx.Exec takes only one argument.
In particular, don't clear the count query values if no count query needs to be made -- just leave the default struct values, and the query executor won't run a count-query.
Force-pushed from b032b9d to 8501428
Force-pushed from 2f0f76a to 15adcaa
Conceptually it's simpler to do a '<key> NOT IN (SELECT o1.key...)' to find the rows that don't have the target label.
For example, if the test is to find all rows where `metadata.labels.animal = "cows"`, we can ignore any rows that don't have associated labels, because they will never match.
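A minimal sketch of the two cases, run against an in-memory SQLite database. The `objects` and `labels` tables are simplified stand-ins invented for this example and do not reflect the real cache schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (key TEXT PRIMARY KEY);
    CREATE TABLE labels (key TEXT, label TEXT, value TEXT);
    INSERT INTO objects VALUES ('a'), ('b'), ('c');
    -- 'c' deliberately has no labels at all
    INSERT INTO labels VALUES ('a', 'animal', 'cows'), ('b', 'animal', 'pigs');
""")

# Positive test: rows without labels can never match, so an inner join is enough.
positive = conn.execute("""
    SELECT o.key FROM objects o
    JOIN labels lt ON o.key = lt.key
    WHERE lt.label = 'animal' AND lt.value = 'cows'
    ORDER BY o.key
""").fetchall()

# Negative test: exclude the keys that do carry the target label.
negative = conn.execute("""
    SELECT o.key FROM objects o
    WHERE o.key NOT IN (
        SELECT o1.key FROM labels o1
        WHERE o1.label = 'animal' AND o1.value = 'cows')
    ORDER BY o.key
""").fetchall()

print(positive, negative)
```

The negative query keeps `c`, the object with no labels, which an inner join on the labels table would drop.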
Two comments on the
As for performance on large sets, I assume that sqlite is smart enough to convert the inner query into a stream of rows that the outer query processes in a separate coroutine or thread. I've never read the source, but I've done tests (with Python code) and it was much faster to give sqlite a 12-line query to process than to give it a conceptually simpler query and have the Python code do further filtering on it.
Biking home (in the rain) I realized the current solution still falls short for negative tests on labels in (at least) three ways:
There are negative tests on non-label fields -- the goal here is to verify that we never do an OUTER JOIN when we don't need one.
Nah, no worries. That's what reviews are for - that is, when they do work! 😇 glad it's fixed now
Ack!
Hmm, I see duplicates though. Taking this discussion to the review to keep it in one place.
Absolutely, we should default to defer complexity to the SQLite query planner as much as possible. We should complicate queries only if absolutely needed, and I have no reasons to believe this is a concern for now. Thanks!
- Always LEFT-OUTER-JOIN to retain fields that have no associated labels (for negative operators)
- Always SELECT-DISTINCT in case some results return duplicate entries. And always SELECT DISTINCT when
Tentative 👍 from my side - let's hear @richard-cox's comments testing this. I'd also suggest a second look from a Frameworks colleague - @tomleb maybe?
Had a chance to look at the image / feature today. Comment WIP, updating.
As mentioned on Slack, there's an issue with AND'ing multiple label filters together:
I created the following configmaps in the namespace foo:
$ k -n foo get configmap --show-labels
NAME   DATA   AGE    LABELS
a      0      3m2s   aaa=bb,bbb=dd
b      0      3m1s   bbb=dd
c      0      3m     aaa=bb
Works fine with one filter:
$ curl -s -g -k -H Accept:application/json 'https://localhost:9443/v1/configmaps/foo?filter=metadata.labels[aaa]+IN+(bb)' | jq '.count'
2
But not with two:
$ curl -s -g -k -H Accept:application/json 'https://localhost:9443/v1/configmaps/foo?filter=metadata.labels[aaa]+IN+(bb)&filter=metadata.labels[bbb]+IN+(dd)' | jq
{
  "type": "collection",
  "links": {
    "self": "https://localhost:9443/v1/configmaps/foo"
  },
  "createTypes": {
    "configmap": "https://localhost:9443/v1/configmaps"
  },
  "actions": {},
  "resourceType": "configmap",
  "data": []
}
Here's the SQL query for this filter (with params):
SELECT DISTINCT o.object, o.objectnonce, o.dekid FROM "_v1_ConfigMap" o
JOIN "_v1_ConfigMap_fields" f ON o.key = f.key
LEFT OUTER JOIN "_v1_ConfigMap_labels" lt ON o.key = lt.key
WHERE
(lt.label = 'aaa' AND lt.value IN ('bb')) AND
(lt.label = 'bbb' AND lt.value IN ('dd')) AND
(f."metadata.namespace" = 'foo')
ORDER BY f."metadata.namespace" ASC, f."metadata.name" ASC
LIMIT 100000
`lt.label` cannot be equal to both `aaa` and `bbb` at the same time. Instead, I think we should do a new JOIN for every label filter:
SELECT DISTINCT o.object, o.objectnonce, o.dekid FROM "_v1_ConfigMap" o
JOIN "_v1_ConfigMap_fields" f ON o.key = f.key
LEFT OUTER JOIN "_v1_ConfigMap_labels" lt1 ON o.key = lt1.key
LEFT OUTER JOIN "_v1_ConfigMap_labels" lt2 ON o.key = lt2.key
WHERE
(lt1.label = 'aaa' AND lt1.value IN ('bb')) AND
(lt2.label = 'bbb' AND lt2.value IN ('dd')) AND
(f."metadata.namespace" = 'foo')
ORDER BY f."metadata.namespace" ASC, f."metadata.name" ASC
LIMIT 100000
When I do that, I get the expected result of `foo/a`.
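The difference between the two queries can be reproduced with a small in-memory SQLite database. The tables below are simplified stand-ins for the real `_v1_ConfigMap_fields` and `_v1_ConfigMap_labels` tables (the names `fields` and `labels` and their columns are assumptions for this sketch), loaded with the three configmaps from this comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fields (key TEXT PRIMARY KEY, namespace TEXT, name TEXT);
    CREATE TABLE labels (key TEXT, label TEXT, value TEXT);
    INSERT INTO fields VALUES
        ('foo/a', 'foo', 'a'), ('foo/b', 'foo', 'b'), ('foo/c', 'foo', 'c');
    INSERT INTO labels VALUES
        ('foo/a', 'aaa', 'bb'), ('foo/a', 'bbb', 'dd'),
        ('foo/b', 'bbb', 'dd'),
        ('foo/c', 'aaa', 'bb');
""")

# One shared alias: a single label row would have to be both 'aaa' and 'bbb'.
single_join = conn.execute("""
    SELECT DISTINCT f.key FROM fields f
    LEFT OUTER JOIN labels lt ON f.key = lt.key
    WHERE (lt.label = 'aaa' AND lt.value IN ('bb'))
      AND (lt.label = 'bbb' AND lt.value IN ('dd'))
      AND f.namespace = 'foo'
    ORDER BY f.key
""").fetchall()

# One alias per label filter: each filter is checked against its own label row.
self_join = conn.execute("""
    SELECT DISTINCT f.key FROM fields f
    LEFT OUTER JOIN labels lt1 ON f.key = lt1.key
    LEFT OUTER JOIN labels lt2 ON f.key = lt2.key
    WHERE (lt1.label = 'aaa' AND lt1.value IN ('bb'))
      AND (lt2.label = 'bbb' AND lt2.value IN ('dd'))
      AND f.namespace = 'foo'
    ORDER BY f.key
""").fetchall()

print(single_join, self_join)
```

The shared-alias query returns no rows, while the self-join version returns only `foo/a`.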
Before I do that I created three slightly different configmaps (I like to use readable words that are different enough):
Referring to kubernetes:
And with no changes to the parser and sql generator:
The generated sql was:
Now I'm confused... Did I get the semantics of I think so. Here's how the upstream code parses a group of comma-separated expressions:
And I'm mapping them to an OR-group. End of day here, but this will need correcting tomorrow.
When we have multiple filter sub-queries, these get ANDed together. But we need to do a self-join for each instance of a label-type filter. Added unit tests to verify that this is what we're generating.
Tom was right (as always :). I'm now doing self-joins on label tables, even if there's only one instance of a label in the compound query (which simplifies the code generator), and live probing makes sense.
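A rough sketch of what such a generator could look like: one join alias per label filter, even when there is only one filter. This is not the Go code from the PR; `build_label_query`, its arguments, and the reduced two-table schema are hypothetical, and the table name is assumed to come from trusted code rather than user input.

```python
import sqlite3

def build_label_query(table: str, label_filters: list) -> tuple:
    """Build (sql, params) with one self-join alias per (label, values) filter."""
    joins, clauses, params = [], [], []
    for i, (label, values) in enumerate(label_filters, start=1):
        alias = f"lt{i}"
        joins.append(
            f'LEFT OUTER JOIN "{table}_labels" {alias} ON o.key = {alias}.key')
        placeholders = ", ".join("?" for _ in values)
        clauses.append(
            f"({alias}.label = ? AND {alias}.value IN ({placeholders}))")
        params.append(label)
        params.extend(values)
    sql = f'SELECT DISTINCT o.key FROM "{table}" o ' + " ".join(joins)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)  # filters are ANDed together
    return sql + " ORDER BY o.key", params

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "_v1_ConfigMap" (key TEXT PRIMARY KEY);
    CREATE TABLE "_v1_ConfigMap_labels" (key TEXT, label TEXT, value TEXT);
    INSERT INTO "_v1_ConfigMap" VALUES ('foo/a'), ('foo/b'), ('foo/c');
    INSERT INTO "_v1_ConfigMap_labels" VALUES
        ('foo/a', 'aaa', 'bb'), ('foo/a', 'bbb', 'dd'),
        ('foo/b', 'bbb', 'dd'), ('foo/c', 'aaa', 'bb');
""")

sql, params = build_label_query("_v1_ConfigMap", [("aaa", ["bb"]), ("bbb", ["dd"])])
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Always emitting an alias per filter keeps the generator free of special cases, at the cost of one extra join for single-label queries.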
Changes LGTM.
I'd suggest to merge this unless @tomleb finds major objections so that @richard-cox has a chance to hammer it further in a build from `main`, and treat anything else as a follow-up PR, as this has reached 100 comments already!
LGTM. I've done some light testing on things that were failing before and they now work correctly.
Related to #46333
This PR needs to get merged first before I can submit the PR for Steve, which caches the labels. If you prefer, I'll submit a PR with steve that temporarily pulls in `../lasso` in the go.mod file.
Note that this is the search syntax I've implemented:
If `LABELVALUE` is quoted, an exact match is made. Otherwise partial matching is done on the value, like for other `A=B` queries.
Note that the exact `LABELNAME` must be specified after `metadata.labels`, between the escaped square brackets. This is similar to how you always have to provide the full name of a field to the left of `=`.
This PR supersedes the experimental PR #110