diff --git a/optimizer-hints.md b/optimizer-hints.md
index 6b42daa777d07..633c5a155ed67 100644
--- a/optimizer-hints.md
+++ b/optimizer-hints.md
@@ -133,6 +133,58 @@ select /*+ HASH_JOIN(t1, t2) */ * from t1, t2 where t1.id = t2.id;
 >
 > `TIDB_HJ` is the alias for `HASH_JOIN` in TiDB 3.0.x and earlier versions. If you are using any of these versions, you must apply the `TIDB_HJ(t1_name [, tl_name ...])` syntax for the hint. For the later versions of TiDB, `TIDB_HJ` and `HASH_JOIN` are both valid names for the hint, but `HASH_JOIN` is recommended.
 
+### SEMI_JOIN_REWRITE()
+
+The `SEMI_JOIN_REWRITE()` hint tells the optimizer to rewrite the semi-join query to an ordinary join query. Currently, this hint only works for `EXISTS` subqueries.
+
+If this hint is not used to rewrite the query, when the hash join is selected in the execution plan, the semi-join query can only use the subquery to build a hash table. In this case, when the result of the subquery is bigger than that of the outer query, the execution speed might be slower than expected.
+
+Similarly, when the index join is selected in the execution plan, the semi-join query can only use the outer query as the driving table. In this case, when the result of the subquery is smaller than that of the outer query, the execution speed might be slower than expected.
+
+When `SEMI_JOIN_REWRITE()` is used to rewrite the query, the optimizer can extend the selection range to select a better execution plan.
+
+{{< copyable "sql" >}}
+
+```sql
+-- Does not use SEMI_JOIN_REWRITE() to rewrite the query.
+EXPLAIN SELECT * FROM t WHERE EXISTS (SELECT 1 FROM t1 WHERE t1.a = t.a);
+```
+
+```sql
++-----------------------------+---------+-----------+------------------------+---------------------------------------------------+
+| id                          | estRows | task      | access object          | operator info                                     |
++-----------------------------+---------+-----------+------------------------+---------------------------------------------------+
+| MergeJoin_9                 | 7992.00 | root      |                        | semi join, left key:test.t.a, right key:test.t1.a |
+| ├─IndexReader_25(Build)     | 9990.00 | root      |                        | index:IndexFullScan_24                            |
+| │ └─IndexFullScan_24        | 9990.00 | cop[tikv] | table:t1, index:idx(a) | keep order:true, stats:pseudo                     |
+| └─IndexReader_23(Probe)     | 9990.00 | root      |                        | index:IndexFullScan_22                            |
+|   └─IndexFullScan_22        | 9990.00 | cop[tikv] | table:t, index:idx(a)  | keep order:true, stats:pseudo                     |
++-----------------------------+---------+-----------+------------------------+---------------------------------------------------+
+```
+
+{{< copyable "sql" >}}
+
+```sql
+-- Uses SEMI_JOIN_REWRITE() to rewrite the query.
+EXPLAIN SELECT * FROM t WHERE EXISTS (SELECT /*+ SEMI_JOIN_REWRITE() */ 1 FROM t1 WHERE t1.a = t.a);
+```
+
+```sql
++------------------------------+---------+-----------+------------------------+---------------------------------------------------------------------------------------------------------------+
+| id                           | estRows | task      | access object          | operator info                                                                                                 |
++------------------------------+---------+-----------+------------------------+---------------------------------------------------------------------------------------------------------------+
+| IndexJoin_16                 | 1.25    | root      |                        | inner join, inner:IndexReader_15, outer key:test.t1.a, inner key:test.t.a, equal cond:eq(test.t1.a, test.t.a) |
+| ├─StreamAgg_39(Build)        | 1.00    | root      |                        | group by:test.t1.a, funcs:firstrow(test.t1.a)->test.t1.a                                                      |
+| │ └─IndexReader_34           | 1.00    | root      |                        | index:IndexFullScan_33                                                                                        |
+| │   └─IndexFullScan_33       | 1.00    | cop[tikv] | table:t1, index:idx(a) | keep order:true                                                                                               |
+| └─IndexReader_15(Probe)      | 1.25    | root      |                        | index:Selection_14                                                                                            |
+|   └─Selection_14             | 1.25    | cop[tikv] |                        | not(isnull(test.t.a))                                                                                         |
+|     └─IndexRangeScan_13      | 1.25    | cop[tikv] | table:t, index:idx(a)  | range: decided by [eq(test.t.a, test.t1.a)], keep order:false, stats:pseudo                                   |
++------------------------------+---------+-----------+------------------------+---------------------------------------------------------------------------------------------------------------+
+```
+
+From the preceding example, you can see that when using the `SEMI_JOIN_REWRITE()` hint, TiDB can select the execution method of IndexJoin based on the driving table `t1`.
+
 ### HASH_AGG()
 
 The `HASH_AGG()` hint tells the optimizer to use the hash aggregation algorithm in all the aggregate functions in the specified query block. This algorithm allows the query to be executed concurrently with multiple threads, which achieves a higher processing speed but consumes more memory. For example:
diff --git a/subquery-optimization.md b/subquery-optimization.md
index a03e26dbaa74e..6391c832eeb80 100644
--- a/subquery-optimization.md
+++ b/subquery-optimization.md
@@ -86,3 +86,11 @@ explain select * from t where exists (select * from t2);
 | └─TableFullScan_11     | 10000.00 | cop[tikv] | table:t       | keep order:false, stats:pseudo |
 +------------------------+----------+-----------+---------------+--------------------------------+
 ```
+
+In the preceding optimization, the optimizer automatically optimizes the statement execution. In addition, you can also add the [`SEMI_JOIN_REWRITE`](/optimizer-hints.md#semi_join_rewrite) hint to further rewrite the statement.
+
+If this hint is not used to rewrite the query, when the hash join is selected in the execution plan, the semi-join query can only use the subquery to build a hash table. In this case, when the result of the subquery is bigger than that of the outer query, the execution speed might be slower than expected.
+
+Similarly, when the index join is selected in the execution plan, the semi-join query can only use the outer query as the driving table. In this case, when the result of the subquery is smaller than that of the outer query, the execution speed might be slower than expected.
+
+When `SEMI_JOIN_REWRITE()` is used to rewrite the query, the optimizer can extend the selection range to select a better execution plan.
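
To make "rewrite the semi-join query to an ordinary join query" more concrete, the following sketch places the hinted `EXISTS` query next to a hand-written join that returns the same rows. The table definitions are assumptions (the original shows only the plans, which reference a column `a` and an index `idx(a)` on both `t` and `t1`), and the derived-table alias `d` is hypothetical. The second query only approximates the shape suggested by the hinted plan, which deduplicates `t1.a` and then performs an inner join; the optimizer's internal rewrite might differ.

```sql
-- Assumed schema, not shown in the original: both tables have an indexed column `a`.
CREATE TABLE t  (a INT, KEY idx (a));
CREATE TABLE t1 (a INT, KEY idx (a));

-- Semi join written with EXISTS; the hint goes inside the subquery.
EXPLAIN SELECT * FROM t
WHERE EXISTS (SELECT /*+ SEMI_JOIN_REWRITE() */ 1 FROM t1 WHERE t1.a = t.a);

-- Hand-written query that returns the same rows: deduplicate t1.a first,
-- then use an ordinary inner join, so that either side can drive the join.
EXPLAIN SELECT t.*
FROM t
JOIN (SELECT DISTINCT a FROM t1) AS d ON d.a = t.a;
```

Deduplicating `t1.a` before the join is what preserves the semi-join semantics: each row of `t` matches at most one row of `d`, so no row of `t` is returned more than once.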