Skip to content

SQL: Ephemeral Tables

Nikita Pettik edited this page Jan 9, 2018 · 3 revisions

Since replacing of original SQLite ephemeral tables with Tarantools' one is considered to be complicated enough, it was suggested to divide process into several stages.

The main goal of the first stage is to make prototype and estimate demanded efforts. In this regard, we will replace ephemeral tables only for one purpose. A good candidate seems to be a prepared statement for DELETE FROM tab WHERE, which uses ephemeral table to hold all entities to be deleted immediately before removing from original table.

Basically we need to support all basic operations on ephemeral tables, e.g. insertion, deletion, selection etc. This routine implies usage of IdxInsert, Rewind, Next and other opcodes, which call tarantoolSqlite3* functions. They are sort of layer between SQL and Tarantool and implemented using box API. However, emphemeral tables are incapable of using box API, mainly due to the lack of space_id. Thus, first task will be an implementation of tarantoolSqlite3Ephemeral* functions. In addition, SQL cursors should be adapted for ephemeral tables, since they need to hold struct space* and appropriate flag (BTCF_TEphemCursor), in order to call tarantoolSqlite3Ephemeral* functions. Finally, it is required to implement OP_OpenTEphemeral opcode which would create ephemeral table and set new cursor to it.


To be more precise, an approximate plan for the first stage is:

  1. Implement tarantoolSqlite3 API for ephemeral tables:

    tarantoolSqlite3EmphemeralCreate(),

    tarantoolSqlite3EphemeralInsert(),

    tarantoolSqlite3EphemeralDrop() and other basic functions.

Subtasks:

  • Explore inner parts of Tarantool. Understand how to correctly deal with ephemeral tables.
  • Attempt to create Ephemeral API in the similar way to original tarantoolSqlite3* API, except for using space_id.

Done: for each tarantoolSqlite3* function its ephemeral analogue tarantoolSqlite3Ephemeral*, which don't use box API has been implemented. In the nearest future, these two APIs will be merged for the unification sake.


  1. Expand SQL cursors in order to work with ephemeral tables: they should store struct space* and should be handled a little bit different in VDBE and in tarantoolSqlite3 interface. For example, functions such as cursor_seek(), cursor_advance() use box iterators to navigate through the space. As far as we can't manipulate cursors to emphemeral tables via box API, these function should be reimplemented.

Subtasks:

  • Patch VDBE opcodes such as IdxInsert, Rewind, Next and other in order to handle cursors to ephemeral table.
  • Implement cursor_ephemeral_* functions without relying on box iterators.

Done: implemented set of cursor_ephemeral_* functions, which do the same routine as original cursor functions, but for ephemeral tables. Currently, each btree's realisation of cursor positioning routine includes two branches: one for original SQL cursor which corresponds to BTCF_TaCursor flag and another for ephemeral cursor with BTCF_TEphemCursor flag.


  1. Implement OP_OpenTEphemeral opcode which would create ephemeral space with given number of fields and set up cursor to it.

    • This part mainly consists from combining steps 1 and 2.

    Done: OP_OpenTEphemeral opcode creates new ephemeral table with given number of columns and collation. There are no secondary indexes on ephemeral tables by definition. Primary index consists of all table columns for the simplicity sake.


  1. Expand all other opcodes which work with cursors to ephemeral tables. For instance, after cursor to ephemeral table closes, this table should be deleted with space_delete().

    Done.


Second stage is mainly focused on complete replacing of original OP_OpenEmpemeral with OP_OpenTEphemeral and removing code which is related to rowid.

It is worth mentioning, that there are two types of ephemeral tables: one holds rows as they are inserted (all rows should be distinct); another adds sort of rowid column in order to hold equal rows. In the first case it is quite simple to replace SQLite ephemeral tables since no changes to opcode generation should be added -- without rowid tables are considered to be indexes (in SQLite meaning). On the other hand, ephemeral tables of second type originally used rowid and table specific opcodes, so in this respect it is required to rewrite emitted opcodes.


Potential pitfalls

Sorting

To sort data it is enough to create ephemeral table and insert there rows to be sorted (for large data sets SQLite uses special sorter which aggregates rows and then sorts them at once using merge algorithm). Currently it is impossible to sort using more than one order for all columns (e.g. in most cases query SELECT * FROM t1 ORDER BY a ASC, b DESC will lead to wrong result). This limitation occurs due to the lack of iteration through different orders on different columns in Tarantool. It will be added in SQL as soon as it is implemented in Tarantool.

Transactions

There is no need to track transactions for ephemeral tables: ones can be created and destroyed within one transaction. Furthermore, it would lead to incorrect behaviour while rollbacking transaction. To avoid transaction routine separate vtab for ephemeral tables has been added. It is substituted with original memtx vtab while initialising ephemeral space in memtx_init_ephemeral_space(). What is more, such vtab resolves another problem -- deletion nulls from ephemeral spaces.
Consider silly but still correct query SELECT NULL EXCEPT SELECT NULL. Firstly, RHS query SELECT NULL is executed and result (i.e. NULL) is inserted into intermediate ephemeral table. Then, each row from LHS query (i.e. NULL) is looked up in ephemeral table and, in success case, is deleted from it. As it was said earlier, primary key consists of all columns in table, so NULL deletion occurs in PK. Practically, it is not correct to insert nulls to PK due to ambiguity when deleting/fetching rows. Thus, memtx deletion checks tuple on containing nulls in it. On the other hand, SQLite guarantees that no such situations (i.e. ambiguity) will ever occur with ephemeral tables, so it is safety to insert nulls to PK. Thus, ephemeral vtab features delete operation without checking on nullability in tuple.

Disabled optimisations

Autoindexes are known to be disabled on WITHOUT ROWID tables in SQLite, so now they are don't work at all (earlier, they sometimes worked on ephemeral tables since part of them were implemented as rowid tables). Another optimisation concerns ORDER BY clause. Lets look at query SELECT a,b,c FROM tab ORDER BY b. SQLite uses ephemeral table to hold ordered by b rows. It is not necessary to hold all columns since we are willing to select only a, b and c columns. However, we also need to append to ephemeral table additional columns by which they will be sorted: columns from ORDER BY clause. On the other hand, it is not needed if columns from SELECT clause already contain column(s) from ORDER BY: we are able not to add them to ephemeral table again — this is sort of optimisation. But the main problem lies in the fact that code generator creates ephemeral table without any notion about columns, it knowns only columns amount. So it creates ephemeral table with n rows (where n = columns from select + columns from order by), and later insert rows with m rows (where m = columns from SELECT except columns from ORDER BY). If columns from SELECT doesn’t intersect with columns from GROUP BY, then n == m. It seems that in SQLite it was possible to make this trick, but in Tarantool this hack is impossible. Probably, solution for this particular optimisation can be delaying table creation until the first tuple is inserted. Thus, we would know accurate number of column in table.


Finally, removing rowid doesn't seem to be sophisticated. Functions generating code for VDBE usually have two branches: one which contains opcodes (table specific) for rowid tables and another -- for ordinary ones (index specific). So, in this case it is enough to delete rowid branch. The only non-trivial situation touches VIEWs representation. Originally, they were implemented as rowid ephemeral tables, hence each operation (UPDATE, INSERT, DELETE) on VIEWs should be rewritten using index specific opcodes.

Clone this wiki locally