[FEA] JIT Compilation of CUDF Kernels #17399

lamarrr · 2024-11-21T13:10:08Z

Is your feature request related to a problem? Please describe.
As described in #15366, we intend to adopt JIT compilation for some of our kernels using JITify/NVRTC.
JITify currently supports compiling device code only, so we can't mix host code with it as we would with NVCC. Some important headers are not supported, e.g.:

<stdexcept>
<atomic> (replace with cuda/std/atomic)

Describe the solution you'd like
We'd need to:

Separate headers required in kernel code into device-only and host-only code headers
Patch Thrust and RMM headers to separate their headers into device-only code headers

Describe alternatives you've considered

Using macros to separate the device-only code from host code. This doesn't work and would have been very brittle.
Depending on NVCC for PTX and loading it into the driver at runtime


jrhemstad · 2024-11-21T21:10:22Z

> Patch Thrust and RMM headers to separate their headers into device-only code headers

Please no. RAPIDS is banned from applying any more patches to Thrust headers 🙂

You should only include RMM or Thrust headers in the host-only headers, so there shouldn't be any need to modify those.

lamarrr · 2024-11-25T16:25:18Z

> Please no. RAPIDS is banned from applying any more patches to Thrust headers 🙂

Even locally within cudf?

> You should only include RMM or Thrust headers in the host-only headers, so there shouldn't be any need to modify those.

Alright, I'll investigate that

jrhemstad · 2024-11-25T17:28:57Z

> > Please no. RAPIDS is banned from applying any more patches to Thrust headers 🙂
>
> Even locally within cudf?

Yes. Local patches to CCCL code cause all sorts of problems, as the patch files need to be updated any time any line of code in the patched file changes. CCCL runs RAPIDS in its CI and patches become a nightmare.

vyasr · 2024-12-02T18:12:40Z

For CCCL in particular, we are aiming to get all the patches we have historically needed upstreamed so that we can rely on CCCL's CI, as Jake mentioned. More generally, we do not want to ship patched libraries anymore. It causes loads of unrelated potential packaging problems down the line.

lamarrr · 2025-01-13T15:03:24Z

After meeting with @jrhemstad and @robertmaynard, we've agreed on the following next steps:

Immediate Exploration: Driver PTX-JIT

We'll first explore driver PTX-JIT compilation (per-module lazy loading) and evaluate its performance overhead and startup cost. This isn't optimal, and we've previously avoided it because we'd be compiling for the lowest-common-denominator architecture (e.g. sm_60) and thus leaving some performance on the table. It would be quick to implement and would be done at the CMake and/or preprocessor level.
If the compilation and runtime overhead for mixed joins is unsatisfactory, we could also try separating the template instantiations into different translation units, then JIT-compiling and lazy-loading each instantiation.
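
The idea of giving each template instantiation its own translation unit can be sketched in plain C++ as below. This is a host-only illustration, not libcudf code: `count_equal` and its `HasNulls` parameter are hypothetical stand-ins for a templated kernel, and the file names in the comments are assumptions. In a real build, each explicit instantiation would presumably sit in its own `.cu` file so that it becomes a separate module that can be JIT-compiled and loaded on demand.

```cpp
#include <cstddef>

// --- count_equal.hpp (hypothetical): template definition shared by every TU ---
// Counts positions where left[i] == right[i]. When HasNulls is true, a
// negative value is treated as null and never matches.
template <bool HasNulls>
std::size_t count_equal(const int* left, const int* right, std::size_t n)
{
  std::size_t matches = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (HasNulls && (left[i] < 0 || right[i] < 0)) { continue; }
    if (left[i] == right[i]) { ++matches; }
  }
  return matches;
}

// Explicit instantiation declarations: code that includes the header sees
// these and does not instantiate the template implicitly.
extern template std::size_t count_equal<true>(const int*, const int*, std::size_t);
extern template std::size_t count_equal<false>(const int*, const int*, std::size_t);

// --- count_equal_nulls.cpp (hypothetical): one instantiation per TU ---
template std::size_t count_equal<true>(const int*, const int*, std::size_t);

// --- count_equal_no_nulls.cpp (hypothetical) ---
template std::size_t count_equal<false>(const int*, const int*, std::size_t);
```

The sketch keeps everything in one file only so that it compiles on its own; the point is that each `template ...;` definition line is the only thing its translation unit would need to contain beyond the header.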

Metrics to measure:

Driver PTX-JIT time
Throughput difference
Binary size difference
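
One way to capture the first metric is to wrap the call that triggers the JIT in a wall-clock timer. The helper below is a minimal sketch under stated assumptions: `time_ms` is a hypothetical name, and the callable passed to it stands in for whatever call makes the driver compile the PTX (for example, a driver-API module load). Nothing here is taken from libcudf.

```cpp
#include <chrono>
#include <utility>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
// `work` is a placeholder for the call that triggers the driver PTX-JIT.
template <typename F>
double time_ms(F&& work)
{
  auto const start = std::chrono::steady_clock::now();
  std::forward<F>(work)();
  auto const stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Timing a first (cold) load and a repeated (warm) load separately would likely help tell actual JIT cost apart from hits in the driver's compute cache.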

Future Exploration: LTO-IR

LTO-IR is described in the CUDA parallel developer overview and the NVIDIA Developer Blog. Using LTO-IR has the advantage that we can use the full instruction set of the target architecture while also avoiding the costs associated with driver PTX-JIT.

lamarrr added the "feature request" (New feature or request) label on Nov 21, 2024

lamarrr mentioned this issue on Jan 17, 2025: PTX-JIT compilation for mixed-join kernels #17763 (Draft)

vyasr mentioned this issue on Feb 18, 2025: [FEA] Expand JIT functionality in libcudf #18023 (Open)
