plan: propagate constant over outer join #7794
- extract `outerCol = const` from join conditions and filter conditions, substitute `outerCol` in join conditions with `const`;
- extract `outerCol = innerCol` from join conditions, derive new join conditions based on this column equal condition and `outerCol` related expressions in join conditions and filter conditions;
do not propagate filter with aux column
/run-all-tests
/run-sqllogic-test
/run-all-tests
/run-all-tests
planner/core/join_explain_test.go
Outdated
	testleak.AfterTest(c)()
}

func (s *testSuite) TestOuterJoinPropConst(c *C) {
Can we put it in the explain test?
I would also prefer putting them into the explain test, but I learned that the explain test does not contribute to code coverage...
But it doesn't contribute to the coverage test either, since most of the changes are in `expression`, but this test is for `planner/core`. It is better to add a unit test for `PropConstOverOuterJoin`.
Makes sense, patch updated.
expression/constant_propagation.go
Outdated
value, err := EvalBool(s.ctx, []Expression{con}, chunk.Row{})
terror.Log(errors.Trace(err))
if !value {
	if fConds {
Seems like it can be simplified to `s.setConds2ConstFalse(true, fConds)`, and the first parameter is always true.
Yes, I used 2 parameters here just because it looks clearer logically IMHO. If we just pass in a false `fConds` and the function then sets `jConds` to constant false, it may be a little confusing.
668c5bc to 383b2ab
/run-all-tests
/run-integration-ddl-test
expression/constant_propagation.go
Outdated
}

func (s *basePropConstSolver) getColID(col *Column) int {
	code := col.HashCode(nil)
`HashCode` actually just hashes the `UniqueID` into a byte slice. You can use `UniqueID` directly.
Makes sense, will do.
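As a rough illustration of the suggestion in this thread, a column-to-index mapper can be keyed by the integer ID directly rather than by a hashed byte slice. The sketch below uses toy `Column` and `solver` types and a `colMapper` field that are assumptions made for illustration; they are not the definitions in `expression/constant_propagation.go`.

```go
package main

import "fmt"

// Column is a toy stand-in for an expression column; only UniqueID matters here.
type Column struct {
	UniqueID int64
}

// solver keeps a dense index for every column it has seen (hypothetical type).
type solver struct {
	colMapper map[int64]int // UniqueID -> dense index into columns
	columns   []*Column
}

func newSolver() *solver {
	return &solver{colMapper: make(map[int64]int)}
}

// insertCol registers col once; repeated inserts keep the first index.
func (s *solver) insertCol(col *Column) {
	if _, ok := s.colMapper[col.UniqueID]; !ok {
		s.colMapper[col.UniqueID] = len(s.columns)
		s.columns = append(s.columns, col)
	}
}

// getColID looks the index up by UniqueID, with no hashing to a byte slice.
func (s *solver) getColID(col *Column) int {
	return s.colMapper[col.UniqueID]
}

func main() {
	s := newSolver()
	s.insertCol(&Column{UniqueID: 42})
	s.insertCol(&Column{UniqueID: 7})
	fmt.Println(s.getColID(&Column{UniqueID: 7}))
}
```

Keying on the integer avoids allocating and comparing a byte slice (or its string form) for every lookup, which is the point the reviewer appears to be making.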
expression/constant_propagation.go
Outdated
}

// deriveConds given `outerCol = innerCol`, derive new expression for specified conditions.
func (s *propOuterJoinConstSolver) deriveConds(outerCol, innerCol *Column, schema *Schema, fCondsOffset int, visited []bool, fConds bool) []bool {
`visited = s.deriveConds(outerCol, innerCol, mergedSchema, lenJoinConds, visited, false)`
Here `lenJoinConds` is passed. Why is the parameter named `fCondsOffset`?
Hmm, because it is indeed an offset inside the `deriveConds` definition, while it is computed from `lenJoinConds` in the caller...
expression/constant_propagation.go
Outdated
type propOuterJoinConstSolver struct {
	basePropConstSolver
	jConds []Expression
	fConds []Expression
how about:
- s/jConds/joinConds/
- s/fConds/filterConds/
expression/constant_propagation.go
Outdated
func (s *propOuterJoinConstSolver) solve(joinConds, filterConds []Expression) ([]Expression, []Expression) {
	cols := make([]*Column, 0, len(joinConds)+len(filterConds))
	for _, cond := range joinConds {
		s.jConds = append(s.jConds, SplitCNFItems(cond)...)
maybe this can be removed?
If we remove this `SplitCNFItems`, we have to make sure all the join conditions have been split before we get here. For most cases this assumption should hold, because `planBuilder::buildJoin` would split the `on clause`, but there may be exceptions, e.g., in `handleCompareSubquery -> buildSemiApply`, I am not sure whether the `condition` has been split already. Also, this assumption may break for newly added code in the future, so I prefer to keep this `SplitCNFItems`.
ok
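To illustrate why keeping the split is cheap insurance, the sketch below flattens nested conjunctions in a toy expression tree and shows that splitting an already-split list changes nothing. `Expr`, `splitCNF`, and `splitAll` are hypothetical stand-ins written for this example; they only mimic the general idea of CNF splitting and are not TiDB's `SplitCNFItems`.

```go
package main

import "fmt"

// Expr is a toy expression node: either a leaf predicate or an AND of children.
type Expr struct {
	Pred string  // set for leaf predicates
	And  []*Expr // non-nil for conjunctions
}

// splitCNF flattens nested ANDs into a list of leaf predicates.
func splitCNF(e *Expr) []*Expr {
	if e.And == nil {
		return []*Expr{e}
	}
	var items []*Expr
	for _, child := range e.And {
		items = append(items, splitCNF(child)...)
	}
	return items
}

// splitAll applies splitCNF to every condition in the list.
func splitAll(conds []*Expr) []*Expr {
	var out []*Expr
	for _, c := range conds {
		out = append(out, splitCNF(c)...)
	}
	return out
}

func main() {
	a := &Expr{Pred: "t1.a = t2.a"}
	b := &Expr{Pred: "t1.b > 1"}
	c := &Expr{Pred: "t2.c < 5"}
	nested := &Expr{And: []*Expr{a, &Expr{And: []*Expr{b, c}}}}

	once := splitAll([]*Expr{nested}) // caller forgot to split
	twice := splitAll(once)           // caller had already split
	fmt.Println(len(once), len(twice))
}
```

In this toy model the second split returns the same leaves in the same order, so the defensive call costs one pass over the list and removes the dependency on every caller having split its conditions.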
expression/constant_propagation.go
Outdated
	s.insertCol(col)
}
if len(s.columns) > MaxPropagateColsCnt {
	log.Warnf("[const_propagation] Too many columns: column count is %d, max count is %d.", len(s.columns), MaxPropagateColsCnt)
s/const_propagation/const_propagation_over_outerjoin/?
@zz-jason comments addressed, PTAL
expression/constant_propagation.go
Outdated
	return nil, nil
}

// validColEqualCond checks if expression is column equql condition that we can use for constant
`equql` -> `equal`.
LGTM

// outerJoinPropConst propagates constant equal and column equal conditions over outer join.
func (p *LogicalJoin) outerJoinPropConst(predicates []expression.Expression) []expression.Expression {
	if p.JoinType == InnerJoin || p.JoinType == SemiJoin || p.JoinType == AntiSemiJoin {
this check can be removed?
expression/constant_propagation.go
Outdated
}

// pickNewEQCondsFunc picks constant equal expression from specified conditions.
func (s *propOuterJoinConstSolver) pickNewEQCondsFunc(retMapper map[int]*Constant, visited []bool, filterConds bool) map[int]*Constant {
how about s/pickNewEQCondsFunc/pickEQCondsOnOuterCol/?
expression/constant_propagation.go
Outdated
if col == nil {
	if con, ok = cond.(*Constant); ok {
		value, err := EvalBool(s.ctx, []Expression{con}, chunk.Row{})
		terror.Log(errors.Trace(err))
If `err` is not nil, should we stop the propagation process?
expression/constant_propagation.go
Outdated
value, err := EvalBool(s.ctx, []Expression{con}, chunk.Row{})
terror.Log(errors.Trace(err))
if !value {
	if filterConds {
The following `if` statement can be simplified to:
`s.setConds2ConstFalse(true, filterConds)`
expression/constant_propagation.go
Outdated
visited[i+condsOffset] = true
updated, foreverFalse := s.tryToUpdateEQList(col, con)
if foreverFalse {
	if filterConds {
ditto
expression/constant_propagation.go
Outdated
	innerSchema *Schema
}

func (s *propOuterJoinConstSolver) setConds2ConstFalse(joinConds, filterConds bool) {
Maybe the first parameter can be removed, because it's always `true`:
expression/constant_propagation.go|367 col 9| s.setConds2ConstFalse(true, true)
expression/constant_propagation.go|369 col 9| s.setConds2ConstFalse(true, false)
expression/constant_propagation.go|384 col 7| s.setConds2ConstFalse(true, true)
expression/constant_propagation.go|386 col 7| s.setConds2ConstFalse(true, false)
LGTM
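For reference, a single-parameter version along the lines suggested above might look like the sketch below. It assumes, based only on the four call sites listed in this thread, that the join conditions are always set to constant false and the filter conditions only sometimes. `toySolver` and its string-based conditions are hypothetical simplifications, not the PR's actual types.

```go
package main

import "fmt"

// toySolver holds conditions as strings; "false" stands in for a
// constant-false expression (hypothetical simplification).
type toySolver struct {
	joinConds   []string
	filterConds []string
}

// setConds2ConstFalse always collapses the join conditions to constant false,
// and collapses the filter conditions too when filterConds is true.
func (s *toySolver) setConds2ConstFalse(filterConds bool) {
	s.joinConds = []string{"false"}
	if filterConds {
		s.filterConds = []string{"false"}
	}
}

func main() {
	s := &toySolver{
		joinConds:   []string{"t1.a = t2.a", "t1.a = 1"},
		filterConds: []string{"t1.b > 0"},
	}
	s.setConds2ConstFalse(false)
	fmt.Println(s.joinConds, s.filterConds)
}
```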
/run-all-tests
What problem does this PR solve?
Fix #7098. The inner join scenario is handled by #7276; this PR handles the outer join case.
What is changed and how it works?
The purpose is to generate more conditions that can be pushed down to child plan nodes through constant propagation.
Below is a good example to illustrate this purpose.
Before this PR:
After this PR:
More filters are pushed down to the scan nodes, hence fewer rows reach the join operator.
Steps to propagate constant over outer join:
- extract `outerCol = const` from join conditions and filter conditions, substitute `outerCol` in join conditions with `const`;
- extract `outerCol = innerCol` from join conditions, derive new join conditions based on this column equal condition and `outerCol` related expressions in join conditions and filter conditions;
As we have enhanced outer join simplification in #7696, this PR should be a complete solution to cover all cases in the current constant propagation framework IMHO.
Check List
Tests
Code changes
Side effects