-
-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use DESCRIBE ANALYZE
/ EXPLAIN ANALYZE
to display stats data about joins.
#2248
Conversation
…rinted in an `Explain` output.
…e DescribeQuery node.
…node is iterated over.
… populate actual row counts.
…lan descriptions.
…s guarentees that the options are passed to their child nodes.
@@ -49,6 +49,62 @@ SELECT c_discount, c_last, c_credit, w_tax FROM customer2, warehouse2 WHERE w_id | |||
" ├─ name: customer2\n" + | |||
" └─ columns: [c_id c_d_id c_w_id c_last c_credit c_discount]\n" + | |||
"", | |||
ExpectedEstimates: "Project\n" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main thing I have opinions on is how much info EXPLAIN ANALYZE should include. Most of the time, fine grained details can be provided by a regular explain, and we want the information that helps us understand where queries are expensive/making expensive mistakes.
So for this query (picked this one just because it's been a pain in my ass), other DBs might do something like discard the project node, remove most relation and filter details, and probably give cardinality estimates for filters and table relations:
LookupJoin (estimated rows=0.000 rows=22500)
├─ IndexedTableAccess(warehouse2) (estimated rows=0.000 rows=1)
│ └─ index: [warehouse2.w_id]
└─ Filter (estimated rows=0.000 rows=22500)
├─ (non-debug expression string one-liner)
└─ IndexedTableAccess(customer2) (estimated rows=0.000 rows=100000)
└─ index: [customer2.c_w_id,customer2.c_d_id,customer2.c_id]
If the EXPLAIN ANALYZE flags a bad index or weird join order, we can always dig into the verbose EXPLAIN to understand better what's going on.
Do you think we should include row count estimates and the incremental processing cost? We don't really have a notion of total processing cost for a query yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the majority of our plan tests could be replaced with the simpler format imo. I get annoyed editing join_op_tests.go
as often as I do, but that suite has been enough for me to not make mistakes refactoring join costing so far. So all of the TPC-X plans, and any join plans in the regular suite as a simpler format would be better. I like the duplication for integration plans, but I'd maybe put them in a different file to make them easier to review in GitHub? The differences just get so long that they are hard to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I previously had it so that these tests used the full debug output, but that's not really necessary for the EXPLAIN tests, so I fixed it.
I'll look into paring down info that's in the non-debug plans but might not be needed her, like Projects and columns and ranges and whatnot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a few comments
enginetest/enginetests.go
Outdated
func ExecuteNode(ctx *sql.Context, engine QueryEngine, node sql.Node) error { | ||
iter, err := engine.EngineAnalyzer().ExecBuilder.Build(ctx, node, nil) | ||
if err != nil { | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return err?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not, comment why we're ignoring the error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
enginetest/enginetests.go
Outdated
@@ -52,6 +52,24 @@ import ( | |||
"github.com/dolthub/go-mysql-server/test" | |||
) | |||
|
|||
// ExecuteNode iterates over a node's iterator until it's exhausted. | |||
// This is useful for populating actual row counts for `DESCRIBE ANALYZE`. | |||
func ExecuteNode(ctx *sql.Context, engine QueryEngine, node sql.Node) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We call this DrainIter in other places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored this to call DrainIter. It's not quite the same because this method also builds the iter, and it's named after it's purpose: to resolve all side effects from the query. The fact that we do that by building and draining an iter is an implementation detail.
Describe(options DescribeOptions) string | ||
} | ||
|
||
func Describe(n fmt.Stringer, options DescribeOptions) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method doc here, describe what the method is for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
(In MySQL,
DESCRIBE
andEXPLAIN
are aliases. GMS pretty consistently usesDESCRIBE
internally, so I prefer that here.)This PR has a couple different pieces:
sql.DescribeStats
The most important part of this PR is the mixin struct
sql.DescribeStats
, defined insql/describe.go
. By embedding this struct within aNode
orExpression
, the node will store information about the cost estimate and row count estimate of the plan that was used to generate the node. This information can then be included in the Node/Expressions's string representation by calling the methodDescribeStats.GetDescribeStatsString
.Currently, only
Join
nodes embed this mixin.There's two ways to surface this information during a query.
DESCRIBE format=estimates SELECT ...
will include the row estimates and cost estimates of every node that supports this.DESCRIBE ANALYZE SELECT ...
will execute the plan, and then display the actual row counts alongside the estimates.Note that
DESCRIBE ANALYZE
is only permitted for queries that don't have side effects. For instance,DESCRIBE ANALYZE INSERT ...
will fail with an error.sql.Describable
Previously, each Node or Expression had two different string representations, implemented as
String()
andDebugString()
.DESCRIBE
could emit either by usingformat=tree
(the default) orformat=debug
. This isn't really a scalable solution, and the number of levers we want to control plan formatting may increase further over time.So this PR also adds a new interface for controlling how plans trees are displayed:
sql.Describable
. It has one method:Describe(options sql.DescribeOptions) string
.DescribeOptions
is a struct that contains all options that control how plans are displayed, and more options can be added as needed.These options are set via the value of the
format=
parameter in the describe query, which now accepts an underscore-separated list of options. Eg,format=debug_estimates
.When writing the string representation of a node with children, you should call
sql.Describe(node, options)
instead of callingnode.String()
orsql.DebugString(node)
. This function callsDescribe
on the node if it exists, and otherwise falls back onDebugString
orString
. This allows for incremental support of thesql.Describable
interface as needed instead of needing to add support for every node type right out of the gate.How to add support for additional node types
While only
Join
nodes are currently supported, it's very easy to add support for other types. Literally all you have to do is embed thesql.DescribeStats
struct in the node, and implement theDescribe
method, like so:Testing
In order to test this functionality, every plan test generated by
plangen.go
now has three different expected plans: One for debug strings, one for debug + estimates, and one for debug + estimates + analyze.This may turn out to be too noisy when we make changes to the coster, but the plan tests are already noisy wrt to costing changes. If this turns out to be too much of a hassle we can limit this to just
query_plans.go
instead of all the plan tests.