feat: Ability to explain an executed request #1188
Conversation
Force-pushed from d67024c to 8cd844e.
Codecov Report

```diff
@@            Coverage Diff            @@
##           develop    #1188    +/-  ##
===========================================
- Coverage    70.17%   70.13%   -0.05%
===========================================
  Files          184      184
  Lines        17392    17700    +308
===========================================
+ Hits         12205    12414    +209
- Misses        4251     4340     +89
- Partials       936      946     +10
```
IMO we can add that/those in later if we want, no need to complicate things straight off.
Force-pushed from 02eaf3c to b5f8ed1.
Going into this task I had a different structure in mind, but after looking at the concrete implementation, much like Andy had mentioned, there are some unknown issues lurking in other designs that would've been trying too hard to be smart and fancy about collecting runtime metrics.

I will say that the goal of doing #970 (metrics) first was to use them here, instead of just the raw "counters" you're using now. You mentioned in another comment that you didn't want the otel stuff leaking around, but the goal of the metrics lib is to act as the abstracted interface between the underlying metrics provider/collector and the metrics types (gauges, counters, histograms, etc.). That would be a much more powerful primitive to use, as you could do more interesting collection of metrics beyond a simple increment.

However, going through the various metrics packages, there are some issues with trying to track metrics on a "per request" basis rather than the traditional app-wide aggregates, so we will likely need to look further into that. There is likely some combination of tracing and metrics that will be ideal, as tracing is great for per-request tracking of info.

There's probably more to be said, but a lot of it is useless until we come up with a more long-term design that protects the prod code a bit more. But I recognize the difficulties here.

At the moment, I think this is a good enough solution, short of the noted todos and suggestions.
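To illustrate what "the metrics lib as an abstracted interface" could mean here, a minimal sketch follows. All names are hypothetical and are not the actual #970 package:

```go
// Minimal sketch of an abstracted metrics layer. All names are illustrative;
// the real metrics package may look quite different.
package metrics

// Counter is a monotonically increasing value (e.g. iterations of a node).
type Counter interface {
	Inc()
}

// Histogram records individual observations (e.g. per-fetch latency),
// allowing richer collection than a simple increment.
type Histogram interface {
	Record(value float64)
}

// Provider hides the concrete backend (otel, prometheus, a no-op for tests)
// so that planner nodes depend only on these small interfaces.
type Provider interface {
	Counter(name string) Counter
	Histogram(name string) Histogram
}
```

A planner node would then hold a `Counter` or `Histogram` instead of a raw integer field, and the backend could be swapped without touching planner code.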
todo: The execution explain should technically also include the info from the simple explain, in addition to the runtime execution metrics, so you wouldn't need to run them both separately. This can technically be left as is, and we can just run two, but it feels nicer to have Execute include both.
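For illustration only, the merge itself could be as simple as the sketch below; the helper name and its placement are assumptions, not the actual planner code:

```go
// Illustrative sketch only; not the actual planner code.
package explainsketch

// mergeExplain folds the static "simple" attributes into the execute explain
// output, so a single execute explain request can return both.
func mergeExplain(simple, execute map[string]any) map[string]any {
	merged := make(map[string]any, len(simple)+len(execute))
	for k, v := range simple {
		merged[k] = v
	}
	// Runtime datapoints overwrite static attributes on any key collision.
	for k, v := range execute {
		merged[k] = v
	}
	return merged
}
```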
I thought about that, and I was thinking to have it as an option: either under `verbose: false` to hide it if someone doesn't want to see the simple attributes, or a separate flag, maybe something like `showSimpleAttributes: true`, to show/hide them. I did not want to group the somewhat "static" simple attributes with the execution datapoints. Whichever approach we take, it will be outside the scope of this PR.
Force-pushed from 36af49c to ad1f99c.
praise: I appreciated the clean commits! I found they made it easier to review.

suggestion: You do seem to be doing what I used to do when I first started using git like this though - some of the commits are too small for reviewers. For example, by splitting the new tests from the commit that introduces the feature you create a disconnect that reduces the context the reviewer has. It can make life easier for sure whilst developing, but in the future I would suggest squashing them before opening the PR so that any individual commit contains both the production code changes and the tests for that change.

Overall the change looks good - my brain is getting a bit foggy though, so I'll have another look in the morning (I covered the core code, but mostly skimmed over the node-specific metrics and the tests today).
```diff
@@ -100,6 +109,17 @@ func (n *averageNode) SetPlan(p planNode) { n.plan = p }

 // Explain method returns a map containing all attributes of this node that
 // are to be explained, subscribes / opts-in this node to be an explainablePlanNode.
-func (n *averageNode) Explain() (map[string]any, error) {
-	return map[string]any{}, nil
+func (n *averageNode) Explain(explainType request.ExplainType) (map[string]any, error) {
```
suggestion: It feels a bit wasteful, both in terms of execution and maintenance, to have Explain take `explainType` as a param and then perform the same switch in each node. It might be nicer to instead have two parameterless functions on planNode, `ExplainSimple()` and `ExplainExecute()`, and just do the switch once in explain.go.
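For clarity, a rough sketch of the suggested shape, using stand-in types rather than the real planner/request packages:

```go
// Purely illustrative: each node exposes two parameterless explain methods,
// and the switch on explain type happens once at the explain.go level.
package explainsketch

import "fmt"

type explainType string

const (
	simpleExplain  explainType = "simple"
	executeExplain explainType = "execute"
)

// planNode is a stand-in for the real planNode interface.
type planNode interface{}

type explainablePlanNode interface {
	planNode
	ExplainSimple() (map[string]any, error)
	ExplainExecute() (map[string]any, error)
}

// explainNode performs the type switch once, so individual nodes no longer
// branch on the explain type themselves.
func explainNode(n explainablePlanNode, t explainType) (map[string]any, error) {
	switch t {
	case simpleExplain:
		return n.ExplainSimple()
	case executeExplain:
		return n.ExplainExecute()
	default:
		return nil, fmt.Errorf("unknown explain type: %s", t)
	}
}
```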
It's a good point. I think I went this way because I didn't want to add another function to this interface:

```go
type explainablePlanNode interface {
	planNode
	Explain(explainType request.ExplainType) (map[string]any, error)
}
```
Which would then result in another public function on all the explainable nodes:
averageNode
countNode
createNode
dagScanNode
deleteNode
groupNode
limitNode
orderNode
scanNode
selectNode
selectTopNode
sumNode
topLevelNode
typeIndexJoin
updateNode
How strong of a preference do you have for this?
Nothing too strong, as it doesn't really affect anything besides the explain code - I just think it would be slightly nicer. If you prefer it as is, for sure keep it as is - you'll almost certainly be working with it more than me :)
Actually, the potential upside of making it two explicit methods instead of one is controlling which nodes actually need a "custom" explain implementation, and which can just skate by with either no explain, or a basic `wrappedExplain` that only tracks some basic info common across all nodes (like number of invocations, results, etc.).
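As a loose sketch of that idea (the type, fields, and method names below are hypothetical, just to illustrate the "common info" wrapper):

```go
// Hypothetical sketch of a shared execute-explain helper that tracks info
// common to all nodes, so only some nodes need a custom implementation.
package explainsketch

type wrappedExplain struct {
	iterations uint64 // number of Next() calls made on the node
	results    uint64 // number of results the node produced
}

// onNext would be called from the node's Next() to record one invocation.
func (w *wrappedExplain) onNext(hasResult bool) {
	w.iterations++
	if hasResult {
		w.results++
	}
}

// explainExecute returns the basic datapoints shared by every node.
func (w *wrappedExplain) explainExecute() map[string]any {
	return map[string]any{
		"iterations": w.iterations,
		"results":    w.results,
	}
}
```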
All good points; I will keep in mind to split these outside this PR when implementing `debug` or `predict` explain. Keeping as is for now, unless someone has a really strong preference.

EDIT: I had a merge conflict that was due to the rename of `plan` to `planNode`. The extra line diff you see is due to that resolution.
Force-pushed from 24d25ae to 491c86a.
I'm late to reviewing this PR as I had my own struggles this week with the document delete stuff, but I just took the time to go through it and it's nice work Shahzad! Andy has already approved, and John will certainly do so shortly, so I don't feel like I need to add mine.

I like that the testing covers a wide range of possibilities and that it includes a more complex set of schemas. I also appreciate the effort to get this through. 🤘
Looks amazing! 2 minuscule suggestions. Approving!
Really great job on all the tests, and going through my various refactors 👍
Force-pushed from 491c86a to 5d8eb56.
Relevant issue(s)

Resolves #326

Description

Adds the ability to return datapoints / information gathered at every planner step. The information is stored during execution, and gathered post execution.

Usage

Add `@explain(type: execute)` after the `query` or `mutation` operation.

Execute explain request for a `query` operation - example:

```graphql
query @explain(type: execute) {
  Address(groupBy: [country]) {
    country
    _group {
      city
    }
  }
}
```

Execute explain request for a `mutation` operation - example:

```graphql
mutation @explain(type: execute) {
  update_address(
    ids: ["bae-c8448e47-6cd1-571f-90bd-364acb80da7b"],
    data: "{\"country\": \"USA\"}"
  ) {
    country
    city
  }
}
```

For Reviewers

Note:
- `TotalElapsedTime` datapoint for the request, however, it will need to wait until the explain testing framework is integrated properly (Integrate the new explain test setup into the new test action system #1243) and we can control how we want to test the varying time datapoint.
- `typeIndexJoin`, as most of the join execution happens under `typeJoinMany` and `typeJoinOne`. In the future perhaps make them explainable nodes (to avoid the hacky explaining like we did for `simple` explain).
- Perhaps split the `Explain()` interface function into separate `SimpleExplain()` and `ExecuteExplain()` functions.

Need Feedback:
- Wondering if we should have a `verbose = false/true` option to add the ability to hide some results like the actual document results? Resolved: if needed, can be added in a later PR.

Tasks

How has this been tested?

CI running the integration tests.

Specify the platform(s) on which this was tested: