Investigating the benefits and applications of compile time metaprogramming.
NOTE: All further development of the project has moved to https://github.com/mkitzan/constexpr-sql
This is my most recent project, and it shows my C++ competence better than any other repo of mine. Please note this project is still a work in progress!
This stuff is so dense and complex that it's generally unrealistic for any real production code base. However, it is cool.
This is the code base for my honors project, supervised by Bill Bird. The end goal of the project is to create a constexpr
SQL parser which will serialize a string literal SQL query into a type representing the relational algebra expression tree of the computation. This type can then be instantiated and evaluated at run time with tables (represented as sql::schema
objects). Once seeded with tables, the SQL expression object will behave like a stream where the user can query the next
row of output or dump the entire output into an sql::schema
object. Unfortunately only GCC 9 is supported, because of the widespread use of the new C++20 feature "Class Types in Non-Type Template Parameters" (for cexpr::string
objects as non-type template parameters) which is only implemented by GCC 9. The project heavily uses template recursion, template metaprogramming, variadic templates, constexpr, and recursive data types. Excluding constexpr, these are all highly analogous to ML programming language features (as a point of reference).
The sql::schema
class template represents relational schemas and, when instantiated, SQL tables. The class template is parameterized on two template parameters: Index
and Col
variadic pack. The Index
template argument is used to support group by statements by using SFINAE to select the underlying column data container (std::vector
or std::multiset
). The Index
template argument, when fully specified, provides the comparator functor used by the std::multiset
container. The Cols
variadic pack is expanded into the sql::row
type for the schema. sql::schema
supports structured binding declarations which is facilitated partly through the sql::schema
API and partly through std
namespace injections from sql::row
. There is a driver program to show this class template in use. Notice the string literal as a template argument. String literals are lvalue reference types which are passed as const
pointers. Normally, pointers can not be used in constexpr
contexts. With the new C++20 feature mentioned earlier a constexpr
string constructor template can be deduced to turn the string literal into a constexpr
object.
The sql::query
class template is the user interface to the SQL query parser. The class is templated on a cexpr::string
the SQL query and a variadic number of sql::schema
types. The contructor to a fully specified sql::query
class takes a variadic pack of sql::schema
objects which it uses to seed the relational algebra expression tree with iterators to data. The sql::query
object can then be used in a range loop and structured binding declaration like in this example. The relational algebra expression tree uses static members to hold data, so only one object of a single fully specicified sql::query
class can exist at once in the program. To ensure this the object should be constructed within the range loop specification like in the example. It's worth noting that even though this class template's source file is the largest among the code base, nearly all of it is only live during compilation to parse the SQL query cexpr::string
. In fact, the runtime interface is deliberately insubstantial, merely providing an wrapper to support range loops and structured binding declarations. In compliance with range loop syntax, sql::query
has an associated iterator class sql::query_iterator
. sql::query_iterator
wraps the type representing the relational algebra expression and handles all of the idiosyncrasies of its usage in favor of the familar forward iterator
interface.
The sql::row
class template is a template recursive linked-list (similar to std::tuple
). A template recursive linked-list is a template metaprogramming pattern which expresses a type analogous to a traditional linked-list. sql::row
implements this pattern with two template parameters Col
and Next
. Col
represents the sql::column
which the node in list holds a data element from. Next
represents the type of the next node in the list, which is either another sql::row
type or sql::void_row
. Because the next node is expressed as a type the template linked-list does not incur the overhead of holding a next pointer nor the run time cost of dereferencing a pointer to iterate (also makes better use of the cache). A quirk to this pattern is that the node data type need not be homogenous across the list, instead the list may be composed of heterogenous data types. Also, template linked-list access is computed at compile time, so the run time cost is constant.
At the moment: ra::projection
, ra::rename
, ra::cross
, ra::natural
, and ra::relation
. ra::projection
and ra::rename
are unary operators which take a single sql::row
from their Input
relational algebra operator and fold their operation over the row before propagating the transformed row to their Output
. The fold
is implemented as a template recursive function. ra::cross
outputs the cross product of two relations. ra::natural
implements a natural join between two relations using a hash table buffer of the right relation for performance. ra::relation
is the only terminal node in the expression tree. These operators are composable types and are used to serialize the relational algebra expression tree. Individual objects of each type are not instantiated to compose the expression tree. Instead to ensure the expression tree is a zero overhead abstraction, the types implement a static
member function next
used to request data from its input type. The actual constexpr
template recursive recursive descent SQL parser will serialize these individual nodes together into the appropriate expression tree.
As a proof of concept for constexpr
parsing, two math expression parsers were implemented: cexpr::prefix
and cexpr::infix
. cexpr::prefix
demonstrates the fundamental method of constexpr
parsing an expression tree into a class template. cexpr::infix
extends this to perform constepxr
recursive descent parsing. cexpr::infix
is a whole magnitude more complex, because there's recursive function template instantiations to many different function templates. The explanation of constexpr
parsing is illustrated through cexpr::prefix
for simplicity. The expression tree created while parsing is a template recursive tree which shares similar properties to the template linked-list. A notable benefit to this data structure is that because the tree is composed of types rather than data values, the tree can be used to express computation models (expression trees) rather than just a tree based container. Fundamentally, the parsing is accomplished by calling a template recursive static constexpr
parsing function member parameterized on the token position which the parser's "cursor" is standing on (Pos
template parameter). At each "cursor" position the function uses the token to decide the how to proceed. If the token indicates the start of an operation, the parser recurses the immediate left subexpression ("cursor" + 1). On return, the left subexpression will report the depth in the token stream it recursed at which point the right subexpression will pick up at this position. This control flow is expressed in this line of cexpr::prefix::parse
. Once both left and right subexpressions are parsed, the present node's type is formed (decltype
left and right subexpressions) which is then propagated to the caller. Otherwise, if the token indicates a terminal, then an appropriate terminal node is constructed. It is necessary that the "cursor" position is unique across template instantiations, otherwise template memoization will lead to "infinite" recursion. The few ancillary class templates used to support this parsing and the math node struct templates can be found in the templ
namespace. This is a driver program using the parsers.
I have done some open source work and have created lab material for a C++ programming course and a computer graphics course. All of this work is publicly available on Github, but you might not know where to look. Here is a collection of links to that work:
- All my pull requests for Windows Terminal (forgive me Apple folks). Generally, the PRs with more comments are the more interesting ones.
- All the labs created so far for the computer graphics course (hopefully the first OpenGL lab will be up even if doesn't include my solution).
- All the labs created for the introduction to C++ programming course.
Unfortunately, because the two courses are teaching a very high level C++, there's no discussion in those links about reference types, temporary objects, and copy elision. Lab 2 and Lab 3 of the graphics course have good overviews of core C++ concepts though.