planner: add the doc for the SEMI_JOIN_REWRITE hint #10607

Merged
merged 7 commits into from
Jul 28, 2022
Changes from 3 commits
36 changes: 36 additions & 0 deletions optimizer-hints.md
@@ -125,6 +125,42 @@ SELECT /*+ HASH_JOIN(t1, t2) */ * FROM t1,t2 WHERE t1.id = t2.id;
>
> The alias of `HASH_JOIN` is `TIDB_HJ`. Versions 3.0.x and earlier support only this alias; later versions support both names, and `HASH_JOIN` is recommended.

### SEMI_JOIN_REWRITE()

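`SEMI_JOIN_REWRITE()` tells the optimizer that it may rewrite a semi join as an ordinary inner join. Currently, this hint applies only to `EXISTS` subqueries. Without the rewrite, a semi join executed as a Hash Join can only build the hash table from the subquery, so execution may be slower than expected when the subquery's result set is larger than the outer query's. When a semi join is executed as an Index Join, only the outer query can serve as the driving table, so execution may be slower than expected when the subquery's result set is smaller than the outer query's. With the rewrite, the optimizer has a wider range of options and can choose a better execution method.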

{{< copyable "sql" >}}

```sql
EXPLAIN SELECT * FROM t WHERE EXISTS (SELECT 1 from t1 where t1.a=t.a);
EXPLAIN SELECT * FROM t WHERE EXISTS (SELECT /*+ SEMI_JOIN_REWRITE() */ 1 from t1 where t1.a=t.a);
```

```sql
+-----------------------------+---------+-----------+------------------------+---------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+---------+-----------+------------------------+---------------------------------------------------+
| MergeJoin_9 | 7992.00 | root | | semi join, left key:test.t.a, right key:test.t1.a |
| ├─IndexReader_25(Build) | 9990.00 | root | | index:IndexFullScan_24 |
| │ └─IndexFullScan_24 | 9990.00 | cop[tikv] | table:t1, index:idx(a) | keep order:true, stats:pseudo |
| └─IndexReader_23(Probe) | 9990.00 | root | | index:IndexFullScan_22 |
| └─IndexFullScan_22 | 9990.00 | cop[tikv] | table:t, index:idx(a) | keep order:true, stats:pseudo |
+-----------------------------+---------+-----------+------------------------+---------------------------------------------------+
+------------------------------+---------+-----------+------------------------+---------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------+---------+-----------+------------------------+---------------------------------------------------------------------------------------------------------------+
| IndexJoin_16 | 1.25 | root | | inner join, inner:IndexReader_15, outer key:test.t1.a, inner key:test.t.a, equal cond:eq(test.t1.a, test.t.a) |
| ├─StreamAgg_39(Build) | 1.00 | root | | group by:test.t1.a, funcs:firstrow(test.t1.a)->test.t1.a |
| │ └─IndexReader_34 | 1.00 | root | | index:IndexFullScan_33 |
| │ └─IndexFullScan_33 | 1.00 | cop[tikv] | table:t1, index:idx(a) | keep order:true |
| └─IndexReader_15(Probe) | 1.25 | root | | index:Selection_14 |
| └─Selection_14 | 1.25 | cop[tikv] | | not(isnull(test.t.a)) |
| └─IndexRangeScan_13 | 1.25 | cop[tikv] | table:t, index:idx(a) | range: decided by [eq(test.t.a, test.t1.a)], keep order:false, stats:pseudo |
+------------------------------+---------+-----------+------------------------+---------------------------------------------------------------------------------------------------------------+
```

In the example above, you can see that with the hint, TiDB can choose an IndexJoin that uses table `t1` as the driving table.
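
To make the effect of the rewrite more concrete, the following is a minimal sketch of a hand-written query that is logically equivalent to the `EXISTS` form. It deduplicates the subquery's join key first and then performs an ordinary inner join, which mirrors the `StreamAgg` (group by `test.t1.a`) plus inner join shape in the second plan. The table definitions are an assumption inferred from the plans (a single column `a` with an index `idx(a)`), and the derived-table alias `d` is made up for illustration. This is not necessarily the exact form the optimizer produces internally.

{{< copyable "sql" >}}

```sql
-- Assumed schema, inferred from the plans above (not part of the original text).
CREATE TABLE t  (a INT, KEY idx(a));
CREATE TABLE t1 (a INT, KEY idx(a));

-- Semi join form: each row of t is returned at most once,
-- no matter how many rows of t1 match it.
SELECT * FROM t WHERE EXISTS (SELECT 1 FROM t1 WHERE t1.a = t.a);

-- Hand-written equivalent: deduplicate the join key of t1 first, so that
-- the inner join cannot multiply the rows of t.
-- The alias d is hypothetical.
SELECT t.* FROM t JOIN (SELECT DISTINCT a FROM t1) AS d ON t.a = d.a;
```

Because the second form is a plain inner join, either side can be the build side of a Hash Join or the driving table of an Index Join, which is the extra freedom the hint gives the optimizer.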

### HASH_AGG()

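`HASH_AGG()` tells the optimizer to use the Hash Aggregation algorithm for all aggregate functions in the specified query block. This algorithm runs concurrently with multiple threads and is fast, but it consumes more memory. For example: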
2 changes: 2 additions & 0 deletions subquery-optimization.md
@@ -85,3 +85,5 @@ explain select * from t1 where exists (select * from t2);
| └─TableFullScan_11 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+------------------------+----------+-----------+---------------+--------------------------------+
```

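In addition to the optimizations above, which the optimizer performs automatically, you can use the [`SEMI_JOIN_REWRITE`](/optimizer-hints.md#semi_join_rewrite) hint to rewrite the query further. Without the rewrite, a semi join executed as a Hash Join can only build the hash table from the subquery, so execution may be slower than expected when the subquery's result set is larger than the outer query's. When a semi join is executed as an Index Join, only the outer query can serve as the driving table, so execution may be slower than expected when the subquery's result set is smaller than the outer query's. With the rewrite, the optimizer has a wider range of options and can choose a better execution method.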
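
One rough way to judge whether the rewrite helps a particular query is to compare the plans with and without the hint. The sketch below assumes two hypothetical tables that do not appear in the original text: `orders` (assumed large) and `vip` (assumed small). The plan that TiDB actually chooses depends on the statistics and the TiDB version, so no expected output is shown.

{{< copyable "sql" >}}

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE orders (id INT PRIMARY KEY, customer_id INT, KEY idx_customer(customer_id));
CREATE TABLE vip (customer_id INT, KEY idx_customer(customer_id));

-- Without the hint: the subquery stays on the inner (build) side of the semi join.
EXPLAIN SELECT * FROM orders o
WHERE EXISTS (SELECT 1 FROM vip v WHERE v.customer_id = o.customer_id);

-- With the hint: the optimizer may additionally consider plans in which
-- the small subquery drives an Index Join into orders.
EXPLAIN SELECT * FROM orders o
WHERE EXISTS (SELECT /*+ SEMI_JOIN_REWRITE() */ 1 FROM vip v WHERE v.customer_id = o.customer_id);

-- Assumption: if the hint cannot be applied, TiDB reports a warning
-- instead of an error, so it is worth checking the warnings afterwards.
SHOW WARNINGS;
```

Note that the hint is placed inside the `EXISTS` subquery, as in the example in `optimizer-hints.md`, not in the outer `SELECT`.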