-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: introduce placeholder fast path #63807
Conversation
CC @andy-kimball @jordanlewis, thought this might put a smile on your face. |
b9388f1
to
92ac9e5
Compare
❤❤❤ amazing! |
Really happy to see this happen. I know @nvanbenschoten has also been wanting this for a while. 10%+ boost on KV95 is a very nice result. |
This is fantastic! Thanks for the work here @RaduBerinde. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really cool!
I think it'd be nice to have some execbuilder and/or logictests to ensure the new code in execbuilder/relational.go
works as expected.
Reviewed 8 of 8 files at r1, 14 of 14 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/bench/bench_test.go, line 470 at r2 (raw file):
var execMemo *memo.Memo if h.prepMemo.IsOptimized() { // No placeholders, we already did the exploration at prepare time.
nit: No placeholders or the placeholder fast path succeeded
pkg/sql/opt/xform/placeholder_fast_path.go, line 101 at r2 (raw file):
scan, ok := sel.Input.(*memo.ScanExpr) if !ok {
nit: Might be faster in a some cases to check that the Select's input is a Scan and return ok=false
early before looping through the Select filters.
pkg/sql/opt/xform/placeholder_fast_path.go, line 107 at r2 (raw file):
// Check if there is exactly one covering, non-partial index with a prefix // matching constrainedCols. In that case, using this index is always the // optimal plan.
If there are multiple indexes, but at least one is unique, can't we assume that picking any of the UNIQUE indexes is optimal? This might allow for this optimization to apply in quite a few more cases. I suppose it'd be pointless in practice for a user to prefix any non-unique secondary indexes with columns in another unique index, though, so maybe not worth the added complexity...
pkg/sql/opt/xform/placeholder_fast_path.go, line 133 at r2 (raw file):
if prefixCols.Equals(constrainedCols) { if foundIndex != nil { foundMultiple = true
Perhaps it'd be helpful to have a halting version ofiter.ForEach
that stops iterating based on the callback's return value. Don't worry about it for this PR thought. You can leave a TODO(mgartner): Stop the iterator early if a second index is found.
if you'd like.
pkg/sql/opt/xform/testdata/placeholder-fast-path/scan, line 14 at r2 (raw file):
INDEX (c,b,a) STORING (d), INDEX (d,c,a), INDEX (d,c,b)
nit: add a partial index and an inverted index and verify the fast path isn't generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Unfortunately, we can't do much stuff with placeholders in execbuilder/logictests. I will think about it. Maybe we can add some special support for EXPLAIN EXECUTE.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @RaduBerinde, and @rytaft)
pkg/sql/opt/xform/placeholder_fast_path.go, line 107 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
If there are multiple indexes, but at least one is unique, can't we assume that picking any of the UNIQUE indexes is optimal? This might allow for this optimization to apply in quite a few more cases. I suppose it'd be pointless in practice for a user to prefix any non-unique secondary indexes with columns in another unique index, though, so maybe not worth the added complexity...
We'd want to scan the index which stores less data (eg fewer columns), which gets into coster territory.
pkg/sql/opt/xform/placeholder_fast_path.go, line 133 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Perhaps it'd be helpful to have a halting version of
iter.ForEach
that stops iterating based on the callback's return value. Don't worry about it for this PR thought. You can leave aTODO(mgartner): Stop the iterator early if a second index is found.
if you'd like.
It might be nice if there are other usecases; I don't think this would be an important one though because this condition should be pretty rare, and all this happens only at prepare time.
92ac9e5
to
6bf6e77
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added support for EXPLAIN ANALYZE EXECUTE and added a few tests like that. Also updated handling of partial indexes.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
pkg/sql/opt/xform/testdata/placeholder-fast-path/scan, line 14 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: add a partial index and an inverted index and verify the fast path isn't generated.
Hm.. Good point. This brings up a problem with partial indexes - we can't simply ignore them, or we may miss out on using them.
Updated to fail the fast path if there are any partial indexes that we can't exclude because the filters don't constrain any partial predicate columns. PTAL.
We currently cannot EXPLAIN an EXECUTE statement (the syntax doesn't even allow it). EXECUTE is handled with special code at the top-level, so it's tricky to make it work with EXPLAIN. However, EXPLAIN ANALYZE Is also handled at the top level, before the EXECUTE handling, so that should just work. This change allows the parsing of such statements so we can use EXPLAIN ANALYZE EXECUTE. If EXPLAIN EXECUTE is attempted, we get a "not supported" error (this is less cryptic than the parsing error anyway). Release note: None
This commit adds a rule that normalizes `a IN (b)` to `a = b` (and similarly `NOT IN` to `!=`). This query confirms the equivalency holds even when NULLs are involved: ``` WITH vals (v) AS (VALUES (1), (2), (NULL)) SELECT v1.v, v2.v, v1.v NOT IN (v2.v,) AS in, v1.v != v2.v AS eq FROM vals AS v1, vals AS v2 v | v | in | eq -------+------+-------+-------- 1 | 1 | false | false 1 | 2 | true | true 1 | NULL | NULL | NULL 2 | 1 | true | true 2 | 2 | false | false 2 | NULL | NULL | NULL NULL | 1 | NULL | NULL NULL | 2 | NULL | NULL NULL | NULL | NULL | NULL ``` Release note: None
6bf6e77
to
34d57a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added support for EXPLAIN ANALYZE EXECUTE and added a few tests like that.
Thanks for doing this!
Reviewed 21 of 28 files at r3, 9 of 9 files at r4, 15 of 15 files at r5.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/xform/placeholder_fast_path.go, line 107 at r2 (raw file):
Previously, RaduBerinde wrote…
We'd want to scan the index which stores less data (eg fewer columns), which gets into coster territory.
👍
pkg/sql/opt/xform/placeholder_fast_path.go, line 133 at r2 (raw file):
Previously, RaduBerinde wrote…
It might be nice if there are other usecases; I don't think this would be an important one though because this condition should be pretty rare, and all this happens only at prepare time.
Ahh... makes sense.
pkg/sql/opt/xform/placeholder_fast_path.go, line 110 at r5 (raw file):
var iter scanIndexIter iter.Init(
nit: you don't need iter
now
pkg/sql/opt/xform/testdata/placeholder-fast-path/scan, line 208 at r5 (raw file):
b INT, c INT,
nit: remove blank line
pkg/sql/opt/xform/testdata/placeholder-fast-path/scan, line 227 at r5 (raw file):
SELECT a, b FROM partial1 WHERE a = 0 ---- placeholder-scan partial1@secondary,partial
nit: if you name the (a, b) partial index it'll be easier for a human to understand/verify the test output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really great to see this!! Should we add a docs-todo for the EXPLAIN EXECUTE v. EXPLAIN ANALYZE EXECUTE?
nit: in the commit/PR message: "at perparation time" -> "at preparation time"
Reviewed 28 of 28 files at r3, 9 of 9 files at r4, 15 of 15 files at r5.
Reviewable status:complete! 2 of 0 LGTMs obtained (waiting on @RaduBerinde)
pkg/sql/opt/exec/execbuilder/relational.go, line 475 at r5 (raw file):
// if successful (ok=true), the scan is guaranteed never to return more results // than maxRows. func (b *Builder) indexConstraintMaxResults(
nit: explain why you need expr in addition to scan
(or you could just use my suggestion of tagging both expression types with Scan
so you can just pass the expr)
pkg/sql/opt/exec/execbuilder/testdata/scalar, line 1204 at r4 (raw file):
│ columns: (c0) │ estimated row count: 333 (missing stats) │ filter: CASE WHEN (c0 IS DISTINCT FROM CAST(NULL AS DECIMAL)) OR CAST(NULL AS BOOL) THEN ARRAY[NULL] ELSE ARRAY[] END IS NULL
Any idea why the OR CAST(NULL AS BOOL)
is just by itself? (not super important... just thought it was weird)
pkg/sql/opt/ops/relational.opt, line 103 at r5 (raw file):
# # PlaceholderScan cannot have a Constraint or InvertedConstraint. [Relational]
you could add a Scan
tag here and in the Scan
definition -- that would allow you to test IsScan()
and then cast the private to a ScanPrivate
pkg/sql/opt/testutils/opttester/opt_tester.go, line 1112 at r5 (raw file):
ot.appliedRules.Add(int(ruleName)) } })
These two functions seem like they are probably used in a lot of places. Does it make sense to put this code in a helper function?
Added docs-todo label for documenting that we support EXPLAIN ANALYZE EXECUTE, and we (still) don't support EXPLAIN EXECUTE. |
Currently running a query with placeholders happens as follows: - at preparation time, we build a memo with the normalized expression *with* placeholders. - at execution time, we copy the expression into a new memo, replacing placeholders with their value (AssignPlaceholders). - then we run exploration which performs the actual optimization. For trivial queries like KV reads, we do too much work during execution (profiles show it is 10-20% of the runtime for KV workloads with high read rates). This commit introduces the concept of a "placeholder fast path". The idea is that we can make specific checks for simple expressions and produce (at preparation time) a fully optimized expression (which still depends on placeholders). For this, we introduce a new operator, PlaceholderScan which is similar to a Scan except that it always scans one span with the same start and end key, and the key values are child scalar expressions (either constants or placeholders). We use this new operator only for simple SELECTs where the filters constrain a prefix of an index to constant values; in addition, the index must be covering and there must be only one such index. With these conditions, we know the optimal plan upfront. For now, the placeholder fast path transformation is Go code; if it gets more complicated, it should be switched to use optgen rules. A benchmark on my machine shows a kv95 workload going from 34kiops to 38kiops with this change. Release note: None
34d57a0
to
df67aa9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner and @rytaft)
pkg/sql/opt/exec/execbuilder/relational.go, line 475 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: explain why you need expr in addition to scan
(or you could just use my suggestion of tagging both expression types with
Scan
so you can just pass the expr)
The problem with passing the expression is that we're not passing the private in the PlaceholderScan, we are passing our manufactured private (with the constraint). I have changed it to pass the relational and required properties separately.
pkg/sql/opt/exec/execbuilder/testdata/scalar, line 1204 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Any idea why the
OR CAST(NULL AS BOOL)
is just by itself? (not super important... just thought it was weird)
x=x
gets normalized to x is not null or null
(SimplifySameVarEqualities
rule). The or null
is needed so the result is null
as opposed to false
when x is null (of course, the distinction doesn't matter inside WHEN, but the rule runs independently)
pkg/sql/opt/testutils/opttester/opt_tester.go, line 1112 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
These two functions seem like they are probably used in a lot of places. Does it make sense to put this code in a helper function?
Done (moved into makeOptimizer
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r6.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r6.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @RaduBerinde)
pkg/sql/opt/exec/execbuilder/testdata/scalar, line 1204 at r4 (raw file):
Previously, RaduBerinde wrote…
x=x
gets normalized tox is not null or null
(SimplifySameVarEqualities
rule). Theor null
is needed so the result isnull
as opposed tofalse
when x is null (of course, the distinction doesn't matter inside WHEN, but the rule runs independently)
got it thanks!
bors r+ |
Build failed (retrying...): |
Build succeeded: |
🚨 This needs to be backported together with follow-up fix #64225. 🚨 |
sql: support EXPLAIN ANALYZE EXECUTE
We currently cannot EXPLAIN an EXECUTE statement (the syntax doesn't
even allow it). EXECUTE is handled with special code at the top-level,
so it's tricky to make it work with EXPLAIN. However, EXPLAIN ANALYZE
Is also handled at the top level, before the EXECUTE handling, so that
should just work.
This change allows the parsing of such statements so we can use
EXPLAIN ANALYZE EXECUTE. If EXPLAIN EXECUTE is attempted, we get a
"not supported" error (this is less cryptic than the parsing error
anyway).
Release note: None
opt: normalize In with single-element list to Eq
This commit adds a rule that normalizes
a IN (b)
toa = b
(andsimilarly
NOT IN
to!=
).This query confirms the equivalency holds even when NULLs are
involved:
Release note: None
opt: introduce placeholder fast path
Currently running a query with placeholders happens as follows:
with placeholders.
replacing placeholders with their value (AssignPlaceholders).
For trivial queries like KV reads, we do too much work during
execution (profiles show it is 10-20% of the runtime for KV workloads
with high read rates).
This commit introduces the concept of a "placeholder fast path". The
idea is that we can make specific checks for simple expressions and
produce (at preparation time) a fully optimized expression (which
still depends on placeholders). For this, we introduce a new operator,
PlaceholderScan which is similar to a Scan except that it always scans
one span with the same start and end key, and the key values are child
scalar expressions (either constants or placeholders).
We use this new operator only for simple SELECTs where the filters
constrain a prefix of an index to constant values; in addition, the
index must be covering and there must be only one such index. With
these conditions, we know the optimal plan upfront.
For now, the placeholder fast path transformation is Go code; if it
gets more complicated, it should be switched to use optgen rules.
A benchmark on my machine shows a kv95 workload going from 34kiops to
38kiops with this change.
Release note: None
Informs #57223.