- Author(s): Pingyu (Ping Yu)
- Last updated: 2020-08-18
- Discussion at: pingcap#17472
This proposal proposes to support the Recursive Common Table Expression (Recursive CTE).
From MySQL User Document, 13.2.15 WITH (Common Table Expressions):
A common table expression (CTE) is a named temporary result set that exists within the scope of a single statement and that can be referred to later within that statement, possibly multiple times.
E.g.
WITH RECURSIVE cte (n) AS
(
SELECT 1
UNION ALL
SELECT n + 1 FROM cte WHERE n < 5
)
SELECT * FROM cte;
Result:
+------+
| n |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+------+
Recursive CTE has two parts, seed part and recursive part, separated by UNION ALL
or UNION [DISTINCT]
.
Seed part contains no reference of CTE(s) or reference of non-recursive CTE(s), and produces initial rows.
Recursive part contains reference of itself, and produces additional rows iteratively. When no more rows are produced, the iterations ends. Mutual-recursive is not supported.
When no more rows are produced, the iterations end. Besides[1]:
cte_max_recursion_depth
system variable limits the number of recursion levels.max_execution_time
system variable limits execution time.MAX_EXECUTION_TIME(N)
optimizer hint limits execution time too.
Recursive CTE is a "linear recursion"[1][5][6]. As a result, each iteration of the recursive part computes only on the rows produced by previous iteration. In order to meet the requirement of "linear recursion", some syntax constraints apply within recursive part[1]:
- Must not contain these constructs:
aggregate functions
,window functions
,GROUP BY
,ORDER BY
,DISTINCT
- References CTE only and only in
FROM
clause. If used inJOIN
, CTE must not be the right side of aLEFT
join.
A CTE can refer to CTEs itself or defined earlier in the same WITH
clause, but not those defined later. So mutually-recursive CTEs are not supported.
CTEs depending on other CTE(s) begins to execute after the dependencies are all met. Independent CTEs can execute parallel.
MariaDB[4] build a dependency matrix in preparatory stage to detect recursive CTEs, check seed parts, and determine dependency between CTEs.
At the same time, MariaDB[4] utilize two temporary tables to save results:
- Records for final result.
- Records for latest iteration.
Temporary table is implemented by chunk.RowContainer
, which stores tuples in memory, or spill to disk if exceeds memory quota.
- Stage 1: Build dependency graph, to detect recursive CTEs, dependency relations, and check whether there are enough seed parts.
- Stage 2: Compute all non-recursive CTEs in graph.
- Stage 3: Traverse dependency graph in width first manner. For each recursive CTE:
- 3.1 Create an empty temporary table
t
, with schema of seed part, and a additional hidden field_iteration_id
, represents the tuple is produced by which iteration. - 3.2 Compute seed parts, and save in
t
, with_iteration_id = 0
. - 3.3
set @@iteration_id = 0
- 3.4
++@@iteration_id
- 3.5 Let
t_last := select * from t where _iteration_id = @@iteration_id-1
- 3.6
If (t_last is empty or @@cte_max_recursion_depth>@@iteration_id or @@max_execution_time>MaxExecutionTime)
: break - 3.7 Compute recursive part on
t_last
, and append tot
with_iteration_id = @@iteration_id
. If recursive part is union with distinct, computet1 := select min(_iteration_id), ALL_OTHER_FIELDS from t group by ALL_OTHER_FIELDS
;t := t1
- 3.8 Goto
3.4
- 3.1 Create an empty temporary table
- TiDB#6824: support recursive common table expression (CTE)
- TiDB#17472: support common table expression
- MySQL 8.0 Reference Manual, 13.2.15 WITH (Common Table Expressions)
- MySQL 8.0 Reference Manual, 8.2.2.4 Optimizing Derived Tables, View References, and Common Table Expressions with Merging or Materialization
- Consider using a disk-based hash table for hash join avoiding OOM
- Implementing Common Table Expressions for MariaDB
- ANSI/ISO/IEC International Standard (IS) Database Language SQL — Part 2: Foundation (SQL/Foundation)
- Types of Recursions