[FEA] JIT Compilation of CUDF Kernels #17399
Comments
Please no. RAPIDS is banned from applying any more patches to Thrust headers 🙂 You should only include RMM or Thrust headers in the host-only headers, so there shouldn't be any need to modify those.
Even locally within cudf?
Alright, I'll investigate that.
Yes. Local patches to CCCL code cause all sorts of problems, as the patch files need to be updated any time a line of code in the patched file changes. CCCL runs RAPIDS in its CI, and patches become a nightmare.
For CCCL in particular, we are aiming to get all the patches we have historically needed upstreamed so that we can rely on CCCL's CI, as Jake mentioned. More generally, we do not want to ship patched libraries any more; doing so causes loads of unrelated potential packaging problems down the line.
After meeting with @jrhemstad and @robertmaynard, we have the following next steps:
Immediate Exploration: Driver PTX-JIT
We'll first explore driver PTX-JIT compilation (per-module lazy loading) and evaluate the performance overhead and startup cost. This isn't optimal, and we've previously avoided it because we'd be compiling for the lowest-common-denominator architecture (i.e. sm_60), leaving some performance on the table. It would be quick to implement and would be done at the CMake and/or preprocessor level.
Metrics to measure:
Future Exploration: LTO-IR
LTO-IR, as described in the CUDA parallel developer overview and the NVIDIA Developer Blog. Using LTO-IR has the advantage that we can use the full instruction set of the target architecture while also avoiding the costs associated with driver PTX-JIT.
Is your feature request related to a problem? Please describe.
As described in #15366, we intend to adopt JIT compilation for some of our kernels using JITify/NVRTC.
JITify presently only supports compilation of device code, which means we can't mix host code with it as we would with NVCC. Some important headers are not supported, e.g.
<stdexcept>
<atomic> (replace with cuda/std/atomic)
Describe the solution you'd like
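For the unsupported headers listed above, one possible way to keep a shared header usable under both NVCC and NVRTC/JITify is to guard host-only includes such as <stdexcept> behind the `__CUDACC_RTC__` macro, which NVRTC defines during compilation. The following is a minimal sketch under that assumption, not necessarily how cuDF does or will do it; the `example` namespace, `EXAMPLE_HOST_DEVICE`, `clamp_index`, and `checked_index` are hypothetical names introduced only for illustration.

```cpp
// Sketch: a header that compiles for host builds (NVCC or a plain C++ compiler)
// and for device-only NVRTC builds. All example::* names are hypothetical.

#ifndef __CUDACC_RTC__
// Host-only standard headers are hidden from NVRTC, which cannot compile them.
#include <stdexcept>
#endif

// Annotate functions for both host and device only when a CUDA compiler is in use.
#ifdef __CUDACC__
#define EXAMPLE_HOST_DEVICE __host__ __device__
#else
#define EXAMPLE_HOST_DEVICE
#endif

namespace example {

// Device-safe helper: uses no host-only facilities, so it is visible to NVRTC.
EXAMPLE_HOST_DEVICE inline int clamp_index(int i, int size)
{
  return i < 0 ? 0 : (i >= size ? size - 1 : i);
}

#ifndef __CUDACC_RTC__
// Host-only helper: throws on bad input, so it is excluded from NVRTC builds.
inline int checked_index(int i, int size)
{
  if (i < 0 || i >= size) { throw std::out_of_range("index out of range"); }
  return i;
}
#endif

}  // namespace example
```

With this layout, a JIT-compiled kernel would see only `clamp_index`, while host code compiled ahead of time can still call `checked_index` and rely on exceptions.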
We'd need to:
Describe alternatives you've considered