-
Notifications
You must be signed in to change notification settings - Fork 551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Improvements in feature sampling #4278
[REVIEW] Improvements in feature sampling #4278
Conversation
Any ideas we could also add support for feature subsampling with weights in the same PR? Or better to keep it separate in another PR? |
Weighted sampling should be possible. I'll see if I can manage to add it to same PR. |
This PR has been labeled |
rerun tests |
The above benchmarks have been rerun after the latest commit and modified in-place |
rerun tests |
@teju85 could you give a final review? ✌🏻 |
I already had a look at the code. I am okay with the changes. I am not approving as some part of code is written by me and I have been part of PR from the start. @tfeher Let us know if we need to explain the changes for you to review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @venkywonka and @vinaydes for the PR! In general it looks good, there are only a few smaller issues.
cpp/src/decisiontree/batched-levelalgo/kernels/builder_kernels.cuh
Outdated
Show resolved
Hide resolved
@tfeher I think you need to re-review or accept the changes, cause the merging is blocked for that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vinaydes for addressing the issues!
I think my request for improving the docstring was not clear, I have added a suggestion to illustrate what did I mean.
On one hand this is a nitpick, and it should not hold up this PR, therfore I am approving the PR.
On the other hand, if someone picks up rapidsai/raft#767, then such information could be very useful.
cpp/src/decisiontree/batched-levelalgo/kernels/builder_kernels.cuh
Outdated
Show resolved
Hide resolved
….cuh Co-authored-by: Tamas Bela Feher <[email protected]>
I'll take a look at CI failures. |
Codecov Report
@@ Coverage Diff @@
## branch-22.08 #4278 +/- ##
===============================================
Coverage ? 78.02%
===============================================
Files ? 180
Lines ? 11385
Branches ? 0
===============================================
Hits ? 8883
Misses ? 2502
Partials ? 0
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report at Codecov.
|
@gpucibot merge |
Yaaaay ❤️🥳 |
With this PR, the feature sampling overhead is greatly reduced, especially for wide (thousands of features) datasets. The PR requires some structural changes in RAFT therefore is marked as WIP. Authors: - Vinay Deshpande (https://github.com/vinaydes) - Ray Douglass (https://github.com/raydouglass) - Andy Adinets (https://github.com/canonizer) - Jordan Jacobelli (https://github.com/Ethyling) - Jiwei Liu (https://github.com/daxiongshu) - GALI PREM SAGAR (https://github.com/galipremsagar) - Christopher Akiki (https://github.com/cakiki) - Venkat (https://github.com/venkywonka) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4278
With this PR, the feature sampling overhead is greatly reduced, especially for wide (thousands of features) datasets. The PR requires some structural changes in RAFT therefore is marked as WIP.