Skip to content

Commit

Permalink
planner: add the doc for the SEMI_JOIN_REWRITE hint (pingcap#9734)
Browse files Browse the repository at this point in the history
  • Loading branch information
TomShawn authored Aug 5, 2022
1 parent 20b375f commit 93c7c3a
Show file tree
Hide file tree
Showing 2 changed files with 60 additions and 0 deletions.
52 changes: 52 additions & 0 deletions optimizer-hints.md
Original file line number Diff line number Diff line change
Expand Up @@ -133,6 +133,58 @@ select /*+ HASH_JOIN(t1, t2) */ * from t1, t2 where t1.id = t2.id;
>
> `TIDB_HJ` is the alias for `HASH_JOIN` in TiDB 3.0.x and earlier versions. If you are using any of these versions, you must apply the `TIDB_HJ(t1_name [, tl_name ...])` syntax for the hint. For the later versions of TiDB, `TIDB_HJ` and `HASH_JOIN` are both valid names for the hint, but `HASH_JOIN` is recommended.
### SEMI_JOIN_REWRITE()

The `SEMI_JOIN_REWRITE()` hint tells the optimizer to rewrite the semi-join query to an ordinary join query. Currently, this hint only works for `EXISTS` subqueries.

If this hint is not used to rewrite the query, when the hash join is selected in the execution plan, the semi-join query can only use the subquery to build a hash table. In this case, when the result of the subquery is bigger than that of the outer query, the execution speed might be slower than expected.

Similarly, when the index join is selected in the execution plan, the semi-join query can only use the outer query as the driving table. In this case, when the result of the subquery is smaller than that of the outer query, the execution speed might be slower than expected.

When `SEMI_JOIN_REWRITE()` is used to rewrite the query, the optimizer can extend the selection range to select a better execution plan.

{{< copyable "sql" >}}

```sql
-- Does not use SEMI_JOIN_REWRITE() to rewrite the query.
EXPLAIN SELECT * FROM t WHERE EXISTS (SELECT 1 FROM t1 WHERE t1.a = t.a);
```

```sql
+-----------------------------+---------+-----------+------------------------+---------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+---------+-----------+------------------------+---------------------------------------------------+
| MergeJoin_9 | 7992.00 | root | | semi join, left key:test.t.a, right key:test.t1.a |
| ├─IndexReader_25(Build) | 9990.00 | root | | index:IndexFullScan_24 |
| │ └─IndexFullScan_24 | 9990.00 | cop[tikv] | table:t1, index:idx(a) | keep order:true, stats:pseudo |
| └─IndexReader_23(Probe) | 9990.00 | root | | index:IndexFullScan_22 |
| └─IndexFullScan_22 | 9990.00 | cop[tikv] | table:t, index:idx(a) | keep order:true, stats:pseudo |
+-----------------------------+---------+-----------+------------------------+---------------------------------------------------+
```

{{< copyable "sql" >}}

```sql
-- Uses SEMI_JOIN_REWRITE() to rewrite the query.
EXPLAIN SELECT * FROM t WHERE EXISTS (SELECT /*+ SEMI_JOIN_REWRITE() */ 1 FROM t1 WHERE t1.a = t.a);
```

```sql
+------------------------------+---------+-----------+------------------------+---------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------+---------+-----------+------------------------+---------------------------------------------------------------------------------------------------------------+
| IndexJoin_16 | 1.25 | root | | inner join, inner:IndexReader_15, outer key:test.t1.a, inner key:test.t.a, equal cond:eq(test.t1.a, test.t.a) |
| ├─StreamAgg_39(Build) | 1.00 | root | | group by:test.t1.a, funcs:firstrow(test.t1.a)->test.t1.a |
| │ └─IndexReader_34 | 1.00 | root | | index:IndexFullScan_33 |
| │ └─IndexFullScan_33 | 1.00 | cop[tikv] | table:t1, index:idx(a) | keep order:true |
| └─IndexReader_15(Probe) | 1.25 | root | | index:Selection_14 |
| └─Selection_14 | 1.25 | cop[tikv] | | not(isnull(test.t.a)) |
| └─IndexRangeScan_13 | 1.25 | cop[tikv] | table:t, index:idx(a) | range: decided by [eq(test.t.a, test.t1.a)], keep order:false, stats:pseudo |
+------------------------------+---------+-----------+------------------------+---------------------------------------------------------------------------------------------------------------+
```

From the preceding example, you can see that when using the `SEMI_JOIN_REWRITE()` hint, TiDB can select the execution method of IndexJoin based on the driving table `t1`.

### HASH_AGG()

The `HASH_AGG()` hint tells the optimizer to use the hash aggregation algorithm in all the aggregate functions in the specified query block. This algorithm allows the query to be executed concurrently with multiple threads, which achieves a higher processing speed but consumes more memory. For example:
Expand Down
8 changes: 8 additions & 0 deletions subquery-optimization.md
Original file line number Diff line number Diff line change
Expand Up @@ -86,3 +86,11 @@ explain select * from t where exists (select * from t2);
| └─TableFullScan_11 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+------------------------+----------+-----------+---------------+--------------------------------+
```

In the preceding optimization, the optimizer automatically optimizes the statement execution. In addition, you can also add the [`SEMI_JOIN_REWRITE`](/optimizer-hints.md#semi_join_rewrite) hint to further rewrite the statement.

If this hint is not used to rewrite the query, when the hash join is selected in the execution plan, the semi-join query can only use the subquery to build a hash table. In this case, when the result of the subquery is bigger than that of the outer query, the execution speed might be slower than expected.

Similarly, when the index join is selected in the execution plan, the semi-join query can only use the outer query as the driving table. In this case, when the result of the subquery is smaller than that of the outer query, the execution speed might be slower than expected.

When `SEMI_JOIN_REWRITE()` is used to rewrite the query, the optimizer can extend the selection range to select a better execution plan.

0 comments on commit 93c7c3a

Please sign in to comment.