Is your feature request related to a problem? Please describe.
There are some areas where JIT-compiled kernels can provide performance improvements over existing libcudf functions.
Please note that this issue is focused on CUDA C++ features in libcudf that use JITIFY and nvrtc, rather than cuDF-python features using Numba to generate PTX from user-defined Python functions.
JIT transforms, JIT projection expressions
JIT transforms, or UDF (user defined function) transforms, can be used to fuse together multiple binary ops or function calls within a single kernel. This eliminates the materialization of intermediates and for complex expressions can lead to significant speedup. We've written a custom "polynomials" benchmark in #17695 that shows >10x speedup for JIT-compiled kernels versus binary ops and AST (abstract syntax tree) implementations.
- compare `imbalanced_tree` benchmarks for JIT vs binary ops vs AST
- collect data on NDS and NDS-H runtime impact of JIT-compiled expressions
- support operators with string input and fixed-width output
- support operators with string input and string output
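To make the fusion idea concrete, here is a minimal CPU-side C++ sketch (the function names and the example polynomial are hypothetical, not libcudf APIs): the unfused path materializes one buffer per binary op, while the fused path computes the whole expression in a single pass, which is the shape a JIT-compiled transform kernel takes per thread on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: each binary op materializes an intermediate column.
// Evaluates p(x) = 3*x*x + 2*x + 1 element-wise.
std::vector<double> eval_unfused(std::vector<double> const& x) {
  std::vector<double> x2(x.size()), t3(x.size()), t2(x.size()), out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) x2[i] = x[i] * x[i];  // intermediate 1
  for (std::size_t i = 0; i < x.size(); ++i) t3[i] = 3.0 * x2[i];  // intermediate 2
  for (std::size_t i = 0; i < x.size(); ++i) t2[i] = 2.0 * x[i];   // intermediate 3
  for (std::size_t i = 0; i < x.size(); ++i) out[i] = t3[i] + t2[i] + 1.0;
  return out;
}

// Fused: one pass, no intermediate buffers -- analogous to a single
// JIT-compiled kernel evaluating the full expression thread-per-row.
std::vector<double> eval_fused(std::vector<double> const& x) {
  std::vector<double> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    double v = x[i];
    out[i] = 3.0 * v * v + 2.0 * v + 1.0;
  }
  return out;
}
```

The fused form reads and writes each element once; on the GPU this saves the global-memory traffic and allocations of the three intermediate columns.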
JIT aggregation
JIT aggregations, or UDAFs (user-defined aggregation functions), can be used to perform complex transformations on the groups of a groupby aggregation. libcudf supports both CUDA and PTX aggregation kinds. Some examples of UDAFs could include "compute score" functions with additional flexibility for feature engineering. Here are some "compute score" examples from the archived TorchArrow project (ref: https://pytorch.org/torcharrow/beta/functional.html). To support some of these functions, the user might create a struct column that contains a list of ids, a list of targets, and a score per target.
| Function | Description |
| --- | --- |
| get_score_sum | Return the sum of all the scores in matching_id_scores whose corresponding id in matching_ids is also in input_ids. |
| get_score_min | Return the minimum of all the scores in matching_id_scores whose corresponding id in matching_ids is also in input_ids. |
| get_score_max | Return the maximum of all the scores in matching_id_scores whose corresponding id in matching_ids is also in input_ids. |
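As a reference for the intended semantics, here is a plain C++ sketch of get_score_sum (a hypothetical host-side reference, not the UDAF kernel itself): it sums each score whose matching id appears in input_ids.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Reference semantics for get_score_sum: matching_ids[i] pairs with
// matching_id_scores[i]; a score counts only if its id is in input_ids.
double get_score_sum(std::vector<int> const& input_ids,
                     std::vector<int> const& matching_ids,
                     std::vector<double> const& matching_id_scores) {
  std::unordered_set<int> wanted(input_ids.begin(), input_ids.end());
  double sum = 0.0;
  for (std::size_t i = 0; i < matching_ids.size(); ++i) {
    if (wanted.count(matching_ids[i]) != 0) sum += matching_id_scores[i];
  }
  return sum;
}
```

get_score_min and get_score_max follow the same pattern with the accumulator swapped for a running minimum or maximum.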
JIT join
Currently libcudf uses `mixed_join` to fuse a hash join with a post-filter. Mixed joins accept an AST predicate that is applied thread-per-row when the probe table's equality keys are found in the build table. Mixed joins have poor warp occupancy due to heavy register pressure, a result of combining hash join and AST expression evaluation in a single kernel.
One alternative would be to use code gen to check the post-equality predicate and JIT-compile the resulting kernel.
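A minimal sketch of that code-gen idea, with hypothetical names (`make_predicate_source`, `row_view` are illustrations, not libcudf APIs): instead of interpreting an AST per row, emit CUDA C++ source for the post-equality predicate and hand it to NVRTC/Jitify for compilation into the join kernel.

```cpp
#include <cassert>
#include <string>

// Hypothetical code generator: turn a post-equality filter such as
// "probe.col1 > build.col2" into device source text. A real
// implementation would walk the expression tree and then compile the
// result with NVRTC/Jitify; here we just produce the source string.
std::string make_predicate_source(std::string const& probe_col,
                                  std::string const& op,
                                  std::string const& build_col) {
  return "__device__ bool predicate(row_view probe, row_view build) {\n"
         "  return probe." + probe_col + " " + op + " build." + build_col + ";\n"
         "}\n";
}
```

Because the predicate is compiled to straight-line device code, the kernel avoids the interpreter state and register pressure of per-row AST evaluation.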
Improving JIT infrastructure
As part of expanding JIT functionality in libcudf, we will need better tools for tracking JIT-compilation time (NVIDIA/jitify#137). We will also need better tools for JIT cache management such as clearing and pre-populating. Collaboration with Spark-RAPIDS and other partners will be critical for success.
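The cache-management piece can be sketched as follows (a hypothetical interface, not the existing Jitify cache): a program cache keyed by a hash of the kernel source, with the clearing and pre-populating hooks described above. A string stands in for the compiled PTX/cubin payload.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical JIT program cache. Keys would be hashes of the kernel
// source plus compile options; values would be compiled PTX/cubin blobs.
class jit_cache {
 public:
  bool contains(std::size_t key) const { return entries_.count(key) != 0; }
  // insert() is also the pre-population hook: load blobs compiled offline.
  void insert(std::size_t key, std::string blob) { entries_[key] = std::move(blob); }
  void clear() { entries_.clear(); }  // drop all cached programs
  std::size_t size() const { return entries_.size(); }

 private:
  std::unordered_map<std::size_t, std::string> entries_;
};
```

Exposing clear/insert like this would let Spark-RAPIDS and other consumers warm the cache at startup and reset it between workloads.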