Currently, decorrelation in TiDB is rule-based, not cost-based. However, decorrelation is not guaranteed to produce a better plan, so there are cases where a subquery should stay correlated, yet TiDB decorrelates it anyway and generates a bad plan.
A typical case looks like this: the outer query has a very small row count, and the subquery has an index on the correlated column, but other operators (such as an aggregate) sit above the table in the subquery and make IndexJoin inapplicable. If TiDB decorrelates the subquery, the table in the subquery is fully scanned, even though the index could have been used to read only a few rows from it.
Example:
create table t1(a int, b int);
create table t2(a int, b int, index ia(a));
explain select a > (select sum(b) from t2 where a = t1.b) from t1;
If t1 has very few rows and a = t1.b filters out most rows of t2, an Apply executed row by row will be much faster than the full scan + hash join strategy.
Though cost-based decorrelation is not easy to implement in TiDB today, we can provide a way to control this behavior manually.
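To make the difference concrete, here is a toy sketch in Go, not TiDB code. It models t2 as an in-memory slice with a hypothetical map-based index on a, and counts how many t2 rows each strategy reads. All names (Row, NullInt, applyWithIndex, decorrelated, buildIndex) are made up for illustration, and the model ignores the costs of building the hash table and of index lookups.

```go
package main

import "fmt"

// Row is a toy two-column row (a, b), standing in for rows of t1 and t2.
type Row struct{ A, B int }

// NullInt models the nullable result of SUM: Valid is false when no row matched.
type NullInt struct {
	V     int
	Valid bool
}

// buildIndex maps each value of t2.a to the positions of the matching rows,
// a stand-in for index ia(a).
func buildIndex(t2 []Row) map[int][]int {
	idx := make(map[int][]int)
	for i, r := range t2 {
		idx[r.A] = append(idx[r.A], i)
	}
	return idx
}

// applyWithIndex evaluates the subquery once per outer row through the index.
// It returns one sum per t1 row and the number of t2 rows read.
func applyWithIndex(t1, t2 []Row, idx map[int][]int) ([]NullInt, int) {
	out := make([]NullInt, 0, len(t1))
	read := 0
	for _, o := range t1 {
		var s NullInt
		for _, pos := range idx[o.B] {
			s.V += t2[pos].B
			s.Valid = true
			read++
		}
		out = append(out, s)
	}
	return out, read
}

// decorrelated aggregates all of t2 grouped by a (a full scan), then joins
// the aggregated result with t1 on t1.b.
func decorrelated(t1, t2 []Row) ([]NullInt, int) {
	sums := make(map[int]int)
	read := 0
	for _, r := range t2 {
		sums[r.A] += r.B
		read++
	}
	out := make([]NullInt, 0, len(t1))
	for _, o := range t1 {
		v, ok := sums[o.B]
		out = append(out, NullInt{V: v, Valid: ok})
	}
	return out, read
}

func main() {
	t1 := []Row{{A: 1, B: 7}, {A: 2, B: 42}}
	t2 := make([]Row, 0, 10000)
	for i := 0; i < 10000; i++ {
		t2 = append(t2, Row{A: i, B: i * 2})
	}
	_, applyRead := applyWithIndex(t1, t2, buildIndex(t2))
	_, scanRead := decorrelated(t1, t2)
	fmt.Println("apply rows read:", applyRead, "decorrelated rows read:", scanRead)
}
```

Both strategies return the same sums, but in this toy model the Apply reads only the rows that match each outer row, while the decorrelated plan reads every row of t2 regardless of how small t1 is.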
Currently, there is only a global optimization rule blocklist to control this behavior: https://docs.pingcap.com/tidb/dev/blocklist-control-plan#the-blocklist-of-optimization-rules-and-expression-pushdown
It is hard to use and unfriendly: because it is global, it cannot target a single query.
To make this easier to control, we can implement a hint, as is done for other cost-based choices. For more flexibility, we can make it a query-block-level hint, so that we can choose exactly which Apply should not be decorrelated.
Add two new fields to PlanBuilder to pass information between handleXXXSubquery() and buildSelect().
One field tells buildSelect() whether we are handling a subquery. Another field tells handleXXXSubquery() whether there are valid hints in the subquery.
Check the validity: The hint is invalid if we are not handling a subquery, or there are no correlated columns for this subquery.
If the hint is invalid, report a warning.
Add a new field to LogicalApply. Mark this field in handleXXXSubquery() if there is a valid hint.
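The steps above could be sketched roughly as follows. This is a simplified, standalone Go model rather than actual TiDB code: planBuilder, logicalApply and selectStmt are stand-ins for the real types, and every field name, the warning text, and the place where the validity check runs are assumptions made for illustration.

```go
package main

import "fmt"

// selectStmt is a hypothetical, heavily reduced SELECT node.
type selectStmt struct {
	hasNoDecorrelateHint bool     // the statement carries the new hint
	corCols              []string // correlated columns referenced by the statement
}

// logicalApply stands in for LogicalApply with the proposed new field.
type logicalApply struct {
	noDecorrelate bool // hypothetical name: set when a valid hint was found
}

// planBuilder stands in for PlanBuilder with the two proposed fields.
type planBuilder struct {
	buildingSubquery bool // tells buildSelect that a subquery is being handled
	hasValidHint     bool // tells handleSubquery that the subquery has a valid hint
	warnings         []string
}

// buildSelect checks the hint: it is invalid outside a subquery or when the
// subquery has no correlated columns. Invalid hints only produce a warning.
func (b *planBuilder) buildSelect(sel selectStmt) {
	if !sel.hasNoDecorrelateHint {
		return
	}
	switch {
	case !b.buildingSubquery:
		b.warnings = append(b.warnings, "hint ignored: not inside a subquery")
	case len(sel.corCols) == 0:
		b.warnings = append(b.warnings, "hint ignored: subquery has no correlated columns")
	default:
		b.hasValidHint = true
	}
}

// handleSubquery models handleXXXSubquery(): it sets the flag for buildSelect,
// builds the subquery, and marks the Apply if a valid hint was reported back.
func (b *planBuilder) handleSubquery(sel selectStmt) *logicalApply {
	// Save and restore both fields so nested subqueries do not leak state.
	prevSub, prevHint := b.buildingSubquery, b.hasValidHint
	b.buildingSubquery, b.hasValidHint = true, false
	b.buildSelect(sel)
	ap := &logicalApply{noDecorrelate: b.hasValidHint}
	b.buildingSubquery, b.hasValidHint = prevSub, prevHint
	return ap
}

func main() {
	b := &planBuilder{}
	ap := b.handleSubquery(selectStmt{hasNoDecorrelateHint: true, corCols: []string{"t1.b"}})
	fmt.Println("keep Apply:", ap.noDecorrelate, "warnings:", len(b.warnings))
}
```

In this sketch the decorrelation rule would then only need to skip any Apply whose noDecorrelate field is set.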