Is your feature request related to a problem? Please describe.
The AST machinery is fundamentally limited in the performance that we can achieve. We should explore using JIT compilation/linking instead.
Describe the solution you'd like
We explored JIT compilation in the past, but it was too slow at the time. A number of things have changed since then:
NVRTC has made significant improvements in runtime compilation (150ms -> 25ms fixed overhead)
JIT LTO is a thing now
NVRTC supports pre-compiled headers now
All of these things contribute to the potential for significantly faster runtime compilation.
The basic idea would be that we pre-compile the mixed join kernel and treat the equality comparator like an extern function.
Then at runtime, we JIT compile only the comparator. Then we JIT LTO the comparator into the kernel to avoid the cost of the extern function call.
That way we aren't JIT compiling the entire kernel, which should further reduce the runtime cost.
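The split described above can be sketched in plain host C++ as an analogy only. On the GPU, the declaration would presumably be an `extern __device__` function built with relocatable device code, and the definition would come from runtime compilation and link-time optimization rather than from the same file. The names `row_equal` and `count_matches` are hypothetical and are not taken from the existing code base.

```cpp
#include <cstddef>
#include <vector>

// "Pre-compiled" side: the join loop sees only a declaration of the comparator.
// (Hypothetical name; on the device this would be an extern __device__ function.)
bool row_equal(int left_value, int right_value);

// Stand-in for the pre-compiled mixed join kernel: counts matching row pairs.
// It is written once and never recompiled when the comparator changes.
std::size_t count_matches(std::vector<int> const& left, std::vector<int> const& right)
{
  std::size_t matches = 0;
  for (int l : left) {
    for (int r : right) {
      if (row_equal(l, r)) { ++matches; }
    }
  }
  return matches;
}

// "Runtime" side: in the proposal, this definition would be generated and JIT
// compiled on demand, then linked (with LTO) into the kernel above so the call
// can be inlined. Here it is an ordinary definition, for illustration.
bool row_equal(int left_value, int right_value) { return left_value == right_value; }
```

The point of the sketch is that only the small comparator definition varies per query, while the loop around it stays fixed.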
We could even do this without forcing any user-facing changes. We could take the expression tree that a user gives us today and translate that into a string of C++ code that does the operation expressed by the AST.
Furthermore, by pre-compiling the relevant headers, we can further reduce the runtime costs.
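As a rough illustration of translating an expression tree into a string of C++ code, here is a minimal sketch. The `Expr` type, the helper functions, the emitted function name `row_predicate`, and the `left`/`right` array-of-doubles row representation are all assumptions made for this example; they are not the library's actual AST classes or calling convention.

```cpp
#include <memory>
#include <string>

// Hypothetical, minimal expression tree (not the library's real AST types).
struct Expr {
  enum class Kind { LeftColumn, RightColumn, Literal, Binary };
  Kind kind = Kind::Literal;
  int column_index = 0;       // used by LeftColumn / RightColumn
  std::string text;           // literal text, or operator token for Binary
  std::shared_ptr<Expr> lhs;  // used by Binary
  std::shared_ptr<Expr> rhs;  // used by Binary
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr left_col(int i)
{
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Kind::LeftColumn;
  e->column_index = i;
  return e;
}

ExprPtr right_col(int i)
{
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Kind::RightColumn;
  e->column_index = i;
  return e;
}

ExprPtr literal(std::string value)
{
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Kind::Literal;
  e->text = std::move(value);
  return e;
}

ExprPtr binary(std::string op, ExprPtr lhs, ExprPtr rhs)
{
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Kind::Binary;
  e->text = std::move(op);
  e->lhs  = std::move(lhs);
  e->rhs  = std::move(rhs);
  return e;
}

// Recursively emit a fully parenthesized C++ expression for the tree.
std::string emit(Expr const& e)
{
  switch (e.kind) {
    case Expr::Kind::LeftColumn: return "left[" + std::to_string(e.column_index) + "]";
    case Expr::Kind::RightColumn: return "right[" + std::to_string(e.column_index) + "]";
    case Expr::Kind::Literal: return e.text;
    case Expr::Kind::Binary: return "(" + emit(*e.lhs) + " " + e.text + " " + emit(*e.rhs) + ")";
  }
  return {};
}

// Wrap the expression in a device function definition (assumed signature).
// The resulting string is what would be handed to the runtime compiler.
std::string emit_comparator(Expr const& root)
{
  return "extern \"C\" __device__ bool row_predicate(double const* left, double const* right)\n"
         "{\n  return " + emit(root) + ";\n}\n";
}
```

For example, building `binary("&&", binary("==", left_col(0), right_col(1)), binary(">", left_col(2), literal("10")))` and passing it to `emit` yields the string `((left[0] == right[1]) && (left[2] > 10))`. A real translation would also need to handle column types, nulls, and the full operator set.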
Additional context
There's potential for extending this idea beyond AST stuff.
Any of the places where we're currently dispatching to different comparator instantiations based on the presence of nested types would also be a prime target for JIT compilation/LTO.
The primary benefit there would mostly be reduced compile time and binary size, since we would no longer statically instantiate as many independent code paths.
There could be opportunity for performance benefits as well by JIT compiling for only the exact types needed and eliminating the type dispatcher from the critical path in the row-based operators.