Skip to content
This repository has been archived by the owner on Apr 21, 2020. It is now read-only.

Investigating the benefits and applications of compile time metaprogramming

Notifications You must be signed in to change notification settings

mkitzan/metaprogramming-optimization

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Metaprogramming Optimization

Investigating the benefits and applications of compile time metaprogramming.

NOTE: All further development of the project has moved to https://github.com/mkitzan/constexpr-sql

Hi, GPU Compiler folks!

This is my most recent project, and it shows my C++ competence better than any other repo of mine. Please note this project is still a work in progress!

This stuff is so dense and complex that it's generally unrealistic for any real production code base. However, it is cool.

Constexpr SQL Expressions

This is the code base for my honors project, supervised by Bill Bird. The end goal of the project is to create a constexpr SQL parser which will serialize a string literal SQL query into a type representing the relational algebra expression tree of the computation. This type can then be instantiated and evaluated at run time with tables (represented as sql::schema objects). Once seeded with tables, the SQL expression object will behave like a stream where the user can query the next row of output or dump the entire output into an sql::schema object. Unfortunately only GCC 9 is supported, because of the widespread use of the new C++20 feature "Class Types in Non-Type Template Parameters" (for cexpr::string objects as non-type template parameters) which is only implemented by GCC 9. The project heavily uses template recursion, template metaprogramming, variadic templates, constexpr, and recursive data types. Excluding constexpr, these are all highly analogous to ML programming language features (as a point of reference).

Class Template: sql::schema

The sql::schema class template represents relational schemas and, when instantiated, SQL tables. The class template is parameterized on two template parameters: Index and Col variadic pack. The Index template argument is used to support group by statements by using SFINAE to select the underlying column data container (std::vector or std::multiset). The Index template argument, when fully specified, provides the comparator functor used by the std::multiset container. The Cols variadic pack is expanded into the sql::row type for the schema. sql::schema supports structured binding declarations which is facilitated partly through the sql::schema API and partly through std namespace injections from sql::row. There is a driver program to show this class template in use. Notice the string literal as a template argument. String literals are lvalue reference types which are passed as const pointers. Normally, pointers can not be used in constexpr contexts. With the new C++20 feature mentioned earlier a constexpr string constructor template can be deduced to turn the string literal into a constexpr object.

Class Template: sql::query

The sql::query class template is the user interface to the SQL query parser. The class is templated on a cexpr::string the SQL query and a variadic number of sql::schema types. The contructor to a fully specified sql::query class takes a variadic pack of sql::schema objects which it uses to seed the relational algebra expression tree with iterators to data. The sql::query object can then be used in a range loop and structured binding declaration like in this example. The relational algebra expression tree uses static members to hold data, so only one object of a single fully specicified sql::query class can exist at once in the program. To ensure this the object should be constructed within the range loop specification like in the example. It's worth noting that even though this class template's source file is the largest among the code base, nearly all of it is only live during compilation to parse the SQL query cexpr::string. In fact, the runtime interface is deliberately insubstantial, merely providing an wrapper to support range loops and structured binding declarations. In compliance with range loop syntax, sql::query has an associated iterator class sql::query_iterator. sql::query_iterator wraps the type representing the relational algebra expression and handles all of the idiosyncrasies of its usage in favor of the familar forward iterator interface.

Class Template: sql::row

The sql::row class template is a template recursive linked-list (similar to std::tuple). A template recursive linked-list is a template metaprogramming pattern which expresses a type analogous to a traditional linked-list. sql::row implements this pattern with two template parameters Col and Next. Col represents the sql::column which the node in list holds a data element from. Next represents the type of the next node in the list, which is either another sql::row type or sql::void_row. Because the next node is expressed as a type the template linked-list does not incur the overhead of holding a next pointer nor the run time cost of dereferencing a pointer to iterate (also makes better use of the cache). A quirk to this pattern is that the node data type need not be homogenous across the list, instead the list may be composed of heterogenous data types. Also, template linked-list access is computed at compile time, so the run time cost is constant.

Relational Algebra Expression Nodes

At the moment: ra::projection, ra::rename, ra::cross, ra::natural, and ra::relation. ra::projection and ra::rename are unary operators which take a single sql::row from their Input relational algebra operator and fold their operation over the row before propagating the transformed row to their Output. The fold is implemented as a template recursive function. ra::cross outputs the cross product of two relations. ra::natural implements a natural join between two relations using a hash table buffer of the right relation for performance. ra::relation is the only terminal node in the expression tree. These operators are composable types and are used to serialize the relational algebra expression tree. Individual objects of each type are not instantiated to compose the expression tree. Instead to ensure the expression tree is a zero overhead abstraction, the types implement a static member function next used to request data from its input type. The actual constexpr template recursive recursive descent SQL parser will serialize these individual nodes together into the appropriate expression tree.

Constexpr Parsing

As a proof of concept for constexpr parsing, two math expression parsers were implemented: cexpr::prefix and cexpr::infix. cexpr::prefix demonstrates the fundamental method of constexpr parsing an expression tree into a class template. cexpr::infix extends this to perform constepxr recursive descent parsing. cexpr::infix is a whole magnitude more complex, because there's recursive function template instantiations to many different function templates. The explanation of constexpr parsing is illustrated through cexpr::prefix for simplicity. The expression tree created while parsing is a template recursive tree which shares similar properties to the template linked-list. A notable benefit to this data structure is that because the tree is composed of types rather than data values, the tree can be used to express computation models (expression trees) rather than just a tree based container. Fundamentally, the parsing is accomplished by calling a template recursive static constexpr parsing function member parameterized on the token position which the parser's "cursor" is standing on (Pos template parameter). At each "cursor" position the function uses the token to decide the how to proceed. If the token indicates the start of an operation, the parser recurses the immediate left subexpression ("cursor" + 1). On return, the left subexpression will report the depth in the token stream it recursed at which point the right subexpression will pick up at this position. This control flow is expressed in this line of cexpr::prefix::parse. Once both left and right subexpressions are parsed, the present node's type is formed (decltype left and right subexpressions) which is then propagated to the caller. Otherwise, if the token indicates a terminal, then an appropriate terminal node is constructed. It is necessary that the "cursor" position is unique across template instantiations, otherwise template memoization will lead to "infinite" recursion. The few ancillary class templates used to support this parsing and the math node struct templates can be found in the templ namespace. This is a driver program using the parsers.

More Links to My Work

I have done some open source work and have created lab material for a C++ programming course and a computer graphics course. All of this work is publicly available on Github, but you might not know where to look. Here is a collection of links to that work:

  • All my pull requests for Windows Terminal (forgive me Apple folks). Generally, the PRs with more comments are the more interesting ones.
  • All the labs created so far for the computer graphics course (hopefully the first OpenGL lab will be up even if doesn't include my solution).
  • All the labs created for the introduction to C++ programming course.

Unfortunately, because the two courses are teaching a very high level C++, there's no discussion in those links about reference types, temporary objects, and copy elision. Lab 2 and Lab 3 of the graphics course have good overviews of core C++ concepts though.

About

Investigating the benefits and applications of compile time metaprogramming

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published