-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support querying nested fields in KQL #44554
Comments
Pinging @elastic/kibana-app |
IMHO - You should infer nested automatically based on the known properties of the fields, and use scoping in the query to define when and how you combine nested queries. Using your examples above: Now in terms of defining standard syntax, I would not use curly braces and instead use standard parens to define scope and infer from how the expression is defined within the parens to combine nested fields with common paths. Continuing with the example from above: To complete the example in my mind One other point, you cannot ever assume that the field path defined in an expression can be used to define the nested path(s) for the nested query. I could easily have a non nested object on level1 that resolves into a longer dotted path whose nested path is still just level1. Therefore the underlying parser must already have full knowledge about the properties and paths of the fields in question and can leverage that during parsing.
|
Thanks for the input @ppadovani, I think it's highly valuable since you've worked on this problem in your own plugin. I have mixed feelings about inferring nested automatically. It's certainly doable and it would save the user a little typing, but I'm afraid it could cause confusion. Say you start with the following query:
We're automatically adding the nested query for the user. They may not even know this is a nested field. Now they expand their query, wanting to match multiple parts of individual nested documents. The natural thing to do is this:
We're automatically wrapping each side of the I liken this to ==================================================== On the topic of using regular parens, I considered it before but I think it adds some troublesome ambiguity to the language. Take the following query:
Today, the parens override precedence which creates an ES query like this:
If we make parens indicate nesting groups as well, it won't be possible to override precedence without creating a nested group. In other words, it won't be possible to create the following query:
If we use the curly brace syntax I've proposed, we can generate the previous ES query with a KQL expression like this:
|
I was reviewing my notes and code last night in order to refresh my memory on this. You are correct that just using parens when they are already used for precedence would not allow for the flexibility that might be needed for more complex queries. I got around this problem by introducing the unary keyword EXISTS that must precede the precedence parens. The idea behind the keyword was that everything within the expression that follows the keyword must exist for it be evaluated to true. This would be the equivalent of using your curly braces. I do not believe the EXISTS unary would be the correct symbol to use, but the basic idea of using a preceding unary to indicate a 'grouping' or 'scope' might be an alternative way to think about this problem. The key would be to find a unary symbol that is somewhat intuitive for the problem at hand. Bottom line, I feel that introducing the curly braces may induce weariness around the use of braces and parens. I liken it to the language LISP which, if you ever tried to use it or took a class using it in college, quickly became synonymous with "Lots of Idiotic Stupid Parentheses". :-) |
haha, I like Lisp 😅 I take your meaning though. Something like this could be an option:
I found another problem with the current syntax I've proposed, which I'm trying to think through now. Say you have the following document, where
With the curly brace syntax there's no way to specify at which level you want to do the grouping. In other words I can't create the following ES query where I want to match a single document at level1 but I'm happy to match any document at level2:
Maybe the syntax should require you to specify the level at which to do the grouping. For example, to mimic the above ES query I could do something like
If I wanted to group at level2 I would do this:
I kind of like this because it makes the scope of the grouping really clear. It also sort of mirrors the shape of the actual nested objects which is nice. I thought I might be able to use regular parens in this case, but that would conflict with our existing |
This use case is why I like the unary approach. In the implementation of my plugin, the query would look like:
Let's assume for the moment the unary approach using a keyword is taken. In my mind what the query is doing here is a form of grouping/collecting query expressions by nested path. So what about COLLATE?
Let's say that level1 also had another field called 'drink', how would the unary approach look?
You grouping approach might look like:
I do like the second approach in terms of the feel of the presentation to the user, however it would require the user to know what the nested paths were defined as. If level2 had another non-nested object within which foo existed say 'fly', then the user would have to know to write the query like:
Whereas a unary approach the user doesn't have to know/remember the nested path. This would look like:
Your second approach is more compact but requires the user to understand the underlying structure and storage method of the data, where the more verbose unary approach requires them to just know the basic document structure. Additionally, I believe your second approach will require the parser to fully validate the outer 'level1' (path) against the internal terms adding complexity. How would type ahead work with your second approach? Here is another query to chew on:
This would be a valid query, as the outer nested 'level1' would be combined, but the inner 'level2' would still have its own inner nested query. I think the second approach might look like:
Finally, what if you wanted to query for documents where you didn't want to group the second level? Unary approach:
I think this might be the other approach?
I think there are pros and cons to both approaches, the key is choosing an approach with the fewest cons. ;-) |
I think the unary approach does still require the user to know which fields are nested though.
In this example the user has to know that
I'm not 100% sure I understand what you mean here. If you mean that the outer field will need to be combined with the inner field to get the full field name, that's correct but I believe we've already established a pattern in our parser that can be used to solve this. The typeahead code will definitely get more complex since it will need to know about the outer field if there is one, but it's certainly doable. The unary approach (or even the curly braces I originally proposed, which differ only in the special symbol being used) would probably be less complicated to implement and I do like that we'd continue to have the full field names everywhere. But the biggest thing I'm struggling with right now is that double |
Oops... I looked back at the double COLLATE, you only need one and I corrected my comment above. The reason you only need one, is that due to the internal knowledge of the nested field structure we can imply that there are two nested queries an outer one for path=level1 and an inner for path=level2. The user doesn't have to know that. What I meant about the path validation, was that the parser can't just blindly combine the paths, it has to actually have logic around validating that the outer path given is correct based on the inner query parameters. What will the parser do if the inner query contains A and B and only A is valid for the level1 path? |
@ppadovani hmmm, I assumed the double
The first one requires both query clauses to match a single level2 document. The second query only requires each query clause to match a single level1 document. The matches could come from multiple level2 documents. |
We'll have to throw an error in that situation. But there's a similar issue with the unary approach. Say A and B are both nested fields, and the user tries to put them both in the same |
Hmm... this is a good one to think through. So, the first example would generate something like:
whereas the second might look like:
So, thinking through the unary approach, I think you are correct and I am mistaken. My original assertion of: To satisfy the fist query a double COLLATE must be applied and used to infer depth and therefore requires the user to understand the underlying structure. I guess nothing comes for free? ;-) But even this isn't as intuitive as I would like: I guess you could possibly combine approaches? If we did combine approaches, wouldn't this still be valid, and one could infer that we are collating at level2 but would still generate the first nested query above? |
What if there are two
I don't think we can avoid requiring the full path |
PR #47070 |
… logic (#72490) ## Summary This PR is meant to update the exception builder logic to handle nested fields. If you're unfamiliar with nested fields, you can read up more on it [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) and [here](#44554). It also does a bit of cleanup, so though it may look like a lot of changes, parts of it were just moving some things around.
… logic (elastic#72490) ## Summary This PR is meant to update the exception builder logic to handle nested fields. If you're unfamiliar with nested fields, you can read up more on it [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) and [here](elastic#44554). It also does a bit of cleanup, so though it may look like a lot of changes, parts of it were just moving some things around.
… logic (elastic#72490) ## Summary This PR is meant to update the exception builder logic to handle nested fields. If you're unfamiliar with nested fields, you can read up more on it [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) and [here](elastic#44554). It also does a bit of cleanup, so though it may look like a lot of changes, parts of it were just moving some things around.
… logic (#72490) (#72982) ## Summary This PR is meant to update the exception builder logic to handle nested fields. If you're unfamiliar with nested fields, you can read up more on it [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) and [here](#44554). It also does a bit of cleanup, so though it may look like a lot of changes, parts of it were just moving some things around.
… logic (#72490) (#72983) ## Summary This PR is meant to update the exception builder logic to handle nested fields. If you're unfamiliar with nested fields, you can read up more on it [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) and [here](#44554). It also does a bit of cleanup, so though it may look like a lot of changes, parts of it were just moving some things around.
KQL doesn't currently support querying on nested fields. To support them properly we'll need to introduce some new syntax to the language. Nested fields can be queried in two different ways:
object
field works but nested fields can be queried in the same way. Although generally less useful, there are occasions where one might want to query a nested field in this way.The text was updated successfully, but these errors were encountered: