Support query block level `no_decorrelate()` hint #37789

time-and-fate · 2022-09-13T15:14:22Z

Enhancement

Currently, the decorrelation in tidb is rule-based, not cost-based. But decorrelation doesn't guarantee to generate a better plan. So there are some cases that we need tidb not decorrelate a subquery but tidb directly decorrelates it, and generates a bad plan.

A typical case is like this: The outer query has a very small row count, and the subquery has an index on the correlated column. But there are some other operators (like aggregate) above the table in the subquery, which makes the IndexJoin inapplicable. If tidb decorrelates it, there would be a full scan in the subquery. However, we could have used the index to access only several rows from the table in the subquery.
Example:

create table t1(a int, b int);
create table t2(a int, b int, index ia(a));
explain select a > (select sum(b) from t2 where a = t1.b) from t1;

+------------------------------------+----------+-----------+---------------+-------------------------------------------------------------------------------------------+
| id                                 | estRows  | task      | access object | operator info                                                                             |
+------------------------------------+----------+-----------+---------------+-------------------------------------------------------------------------------------------+
| Projection_10                      | 10000.00 | root      |               | gt(cast(test.t1.a, decimal(20,0) BINARY), Column#10)->Column#11                           |
| └─HashJoin_11                      | 10000.00 | root      |               | left outer join, equal:[eq(test.t1.b, test.t2.a)]                                         |
|   ├─HashAgg_22(Build)              | 7992.00  | root      |               | group by:test.t2.a, funcs:sum(Column#13)->Column#10, funcs:firstrow(test.t2.a)->test.t2.a |
|   │ └─TableReader_23               | 7992.00  | root      |               | data:HashAgg_15                                                                           |
|   │   └─HashAgg_15                 | 7992.00  | cop[tikv] |               | group by:test.t2.a, funcs:sum(test.t2.b)->Column#13                                       |
|   │     └─Selection_21             | 9990.00  | cop[tikv] |               | not(isnull(test.t2.a))                                                                    |
|   │       └─TableFullScan_20       | 10000.00 | cop[tikv] | table:t2      | keep order:false, stats:pseudo                                                            |
|   └─TableReader_14(Probe)          | 10000.00 | root      |               | data:TableFullScan_13                                                                     |
|     └─TableFullScan_13             | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                                                            |
+------------------------------------+----------+-----------+---------------+-------------------------------------------------------------------------------------------+

If the row count of t1 is very small and a = t1.b could filter most rows of data, row-by-row executed Apply will be much faster than the full scan + hash join strategy.

Though cost-based decorrelation is not easy for current tidb, we can provide a method to control this behavior manually.
Currently, there is only a global optimization rule blocklist to control this behavior:
https://docs.pingcap.com/tidb/dev/blocklist-control-plan#the-blocklist-of-optimization-rules-and-expression-pushdown
It's very hard to use and not friendly.
To make it easier to control, we can implement a hint like other cost-based choices. To provide more flexibility, we can make this hint a query block level hint so that we can specifically choose which Apply we don't want to decorrelate.

The text was updated successfully, but these errors were encountered:

time-and-fate · 2022-09-14T02:32:03Z

Design

Make the parser support no_decorrelate() hint.
Add 2 new fields to PlanBuilder to pass information between handleXXXSubquery() and buildSelect()
One field tells buildSelect() whether we are handling a subquery. Another field tells handleXXXSubquery() whether there are valid hints in the subquery.
Check the validity: The hint is invalid if we are not handling a subquery, or there are no correlated columns for this subquery.
If the hint is invalid, report a warning.
Add a new field to LogicalApply. Mark this field in handleXXXSubquery() if there is a valid hint.
During decorrelation, skip marked LogicalApply.

close #37789

…#37633) close pingcap#37789

time-and-fate added type/enhancement The issue or PR belongs to an enhancement. sig/planner SIG: Planner labels Sep 13, 2022

time-and-fate mentioned this issue Sep 13, 2022

parser, planner: add query block level no_decorrelate hint #37633

Merged

12 tasks

ti-chi-bot closed this as completed in #37633 Sep 20, 2022

ti-chi-bot pushed a commit that referenced this issue Sep 20, 2022

parser, planner: add query block level no_decorrelate hint (#37633)

df82263

close #37789

time-and-fate added a commit to time-and-fate/tidb that referenced this issue Sep 22, 2022

parser, planner: add query block level no_decorrelate hint (pingcap…

2d735c1

…#37633) close pingcap#37789

time-and-fate mentioned this issue Sep 22, 2022

parser, planner: add query block level no_decorrelate hint (#37633) #38101

Open

12 tasks

VelocityLight added affects-5.0 This bug affects 5.0.x versions. and removed affects-5.0 This bug affects 5.0.x versions. labels Dec 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support query block level `no_decorrelate()` hint #37789

Support query block level `no_decorrelate()` hint #37789

time-and-fate commented Sep 13, 2022

time-and-fate commented Sep 14, 2022

Support query block level no_decorrelate() hint #37789

Support query block level no_decorrelate() hint #37789

Comments

time-and-fate commented Sep 13, 2022

Enhancement

time-and-fate commented Sep 14, 2022

Design

Support query block level `no_decorrelate()` hint #37789

Support query block level `no_decorrelate()` hint #37789