Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert unique subquery directly to inner join #7395

Closed
morgo opened this issue Aug 14, 2018 · 3 comments
Closed

Convert unique subquery directly to inner join #7395

morgo opened this issue Aug 14, 2018 · 3 comments
Assignees
Labels
sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.

Comments

@morgo
Copy link
Contributor

morgo commented Aug 14, 2018

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?

Using the world sample database from MySQL, I tried this sample query:

EXPLAIN SELECT * FROM City WHERE CountryCode IN (SELECT code FROM Country WHERE code='CAN');
+------------------------+---------+------+-----------------------------------------------------------------------------------------+
| id           | count  | task | operator info                                      |
+------------------------+---------+------+-----------------------------------------------------------------------------------------+
| HashLeftJoin_10    | 3263.20 | root | semi join, inner:IndexReader_20, equal:[eq(world.city.countrycode, world.country.code)] |
| ├─TableReader_18    | 4079.00 | root | data:TableScan_17                                    |
| │ └─TableScan_17    | 4079.00 | cop | table:city, range:[-inf,+inf], keep order:false                     |
| └─IndexReader_20    | 1.00  | root | index:IndexScan_19                                   |
|  └─IndexScan_19    | 1.00  | cop | table:country, index:Code, range:["CAN","CAN"], keep order:false            |
+------------------------+---------+------+-----------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)
  1. What did you expect to see?

Because Country.code is the primary key, I was expecting that TiDB would detect that the result set is guaranteed unique. Thus, the semi-join step is not required and the query could be converted directly to an INNER JOIN. I expect this type of query is common for developers who prefer subqueries over joins.

  1. What did you see instead?

Semi-join.

  1. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?
MySQL [world]> select tidb_version()\G
*************************** 1. row ***************************
tidb_version(): Release Version: v2.1.0-beta-156-g5e7aa1d
Git Commit Hash: 5e7aa1d97d6459843949ae3587c9b5ac6661bb37
Git Branch: master
UTC Build Time: 2018-08-01 02:49:37
GoVersion: go version go1.10.2 linux/amd64
Race Enabled: false
TiKV Min Version: 2.1.0-alpha.1-ff3dd160846b7d1aed9079c389fc188f7f5ea13e
1 row in set (0.00 sec)
@zz-jason zz-jason added sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement. labels Aug 14, 2018
@winoros
Copy link
Member

winoros commented Aug 15, 2018

This is a special case of #7205 , we need implement it first. Then use unique and not null information to remove the aggregation.

@winoros winoros self-assigned this Nov 6, 2018
@winoros
Copy link
Member

winoros commented Jan 8, 2019

Closed since #7205 is done.


There may be some case that the aggregate doesn't be resolved successfully when the inner query is a little complex.
Will leave it as a work in the cascades-like planner.

@winoros winoros closed this as completed Jan 8, 2019
@morgo
Copy link
Contributor Author

morgo commented Jan 8, 2019

Confirming the original case looks good:

mysql> mysql> EXPLAIN SELECT * FROM City WHERE CountryCode IN (SELECT code FROM Country WHERE code='CAN');
+--------------------------+-------+------+-------------------------------------------------------------------------------------------------------+
| id                       | count | task | operator info                                                                                         |
+--------------------------+-------+------+-------------------------------------------------------------------------------------------------------+
| Projection_11            | 1.25  | root | world.city.id, world.city.name, world.city.countrycode, world.city.district, world.city.population    |
| └─IndexJoin_16           | 1.25  | root | inner join, inner:IndexLookUp_15, outer key:world.country.code, inner key:world.city.countrycode      |
|   ├─IndexReader_30       | 10.00 | root | index:IndexScan_29                                                                                    |
|   │ └─IndexScan_29       | 10.00 | cop  | table:country, index:Code, range:["CAN","CAN"], keep order:false, stats:pseudo                        |
|   └─IndexLookUp_15       | 10.00 | root |                                                                                                       |
|     ├─IndexScan_13       | 10.00 | cop  | table:city, index:CountryCode, range: decided by [world.country.code], keep order:false, stats:pseudo |
|     └─TableScan_14       | 10.00 | cop  | table:city, keep order:false, stats:pseudo                                                            |
+--------------------------+-------+------+-------------------------------------------------------------------------------------------------------+
7 rows in set (0.00 sec)

mysql> analyze table City, Country;
Query OK, 0 rows affected (0.87 sec)

mysql> EXPLAIN SELECT * FROM City WHERE CountryCode IN (SELECT code FROM Country WHERE code='CAN');
+--------------------------+-------+------+----------------------------------------------------------------------------------------------------+
| id                       | count | task | operator info                                                                                      |
+--------------------------+-------+------+----------------------------------------------------------------------------------------------------+
| Projection_11            | 1.25  | root | world.city.id, world.city.name, world.city.countrycode, world.city.district, world.city.population |
| └─IndexJoin_16           | 1.25  | root | inner join, inner:IndexLookUp_15, outer key:world.country.code, inner key:world.city.countrycode   |
|   ├─IndexReader_30       | 1.00  | root | index:IndexScan_29                                                                                 |
|   │ └─IndexScan_29       | 1.00  | cop  | table:country, index:Code, range:["CAN","CAN"], keep order:false                                   |
|   └─IndexLookUp_15       | 17.58 | root |                                                                                                    |
|     ├─IndexScan_13       | 17.58 | cop  | table:city, index:CountryCode, range: decided by [world.country.code], keep order:false            |
|     └─TableScan_14       | 17.58 | cop  | table:city, keep order:false                                                                       |
+--------------------------+-------+------+----------------------------------------------------------------------------------------------------+
7 rows in set (0.00 sec)

Country.Code is the PRIMARY KEY so I don't understand why the pseudo stats do not consider the dictionary information. I will file a separate issue on it though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

No branches or pull requests

3 participants