-
Notifications
You must be signed in to change notification settings - Fork 549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Use num internal streams instead of creating cumlHandle's inside the C++ layer #1015
[REVIEW] Use num internal streams instead of creating cumlHandle's inside the C++ layer #1015
Conversation
Folks, the set of changes so far will for sure break the python world. Will fix that soon. |
wiki/cpp/DEVELOPER_GUIDE.md
Outdated
@@ -259,6 +259,8 @@ void foo(const ML::cumlHandle_impl& h, ...) | |||
} | |||
``` | |||
|
|||
An example of how to use internal streams to schedule work on a single GPU can be found in [here](https://github.com/rapidsai/cuml/pull/1015). This PR uses the internal streams inside `cumlHandle_impl` to schedule more work onto the GPU for Random Forest building. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should follow the format we've been using for the rest of the developer guide and provide the example in place. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. Done. Can you check now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great overall. Couple small comments and one nickpick about the Developer Guide link
I think I have addressed all the review comments. @vishalmehta1991 and @cjnolet please check now. Also, @vishalmehta1991 had concerns about the conflicts arising due to this PR and his PR #961 . We should discuss how to resolve this before merging either of the two. |
IMO, it is better to take the changes in #961 first, followed by me resolving conflicts arising with the current PR. |
@teju85 added the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @vishalmehta1991's PR has been merged so we should be good to go with this, once the conflicts and the dask-cuda issues are resolved.
…ea-ext-expose-num-internal-streams
having the same issue as the other PR #823. |
rerun tests |
…ea-ext-expose-num-internal-streams
…com/teju85/cuml into fea-ext-expose-num-internal-streams
JFYI, @vishalmehta1991 has requested to stop merging this PR until the PR #1087 gets through. |
@dantegd any ideas why I get the following error in CI? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a more thorough review of the changes to the developer guide and have a few notes.
wiki/cpp/DEVELOPER_GUIDE.md
Outdated
@@ -6,6 +6,7 @@ Please start by reading [CONTRIBUTING.md](../../CONTRIBUTING.md). | |||
|
|||
## Performance | |||
1. In performance critical sections of the code, favor `cudaDeviceGetAttribute` over `cudaDeviceGetProperties`. See PR [#973](https://github.com/rapidsai/cuml/pull/973) for more details. | |||
2. If an algo requires you to launch GPU work in multiple cuda streams, do not create multiple `cumlHandle` objects, one for each such work stream. Instead, expose a `n_streams` parameter in that algo's cuML C++ interface and then rely on `cumlHandle_impl::getInternalStream()` to pick up the right cuda stream. See PR [#1015](https://github.com/rapidsai/cuml/pull/1015) and also the section on [CUDA Resources](#cuda-resources) for more details. TIP: use `cumlHandle_impl::getNumInternalStreams()` to know how many such streams are available at your disposal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how I missed this. I'd prefer not to point users to pull requests in the developer guide as it's not straightforward and can quickly get out of date as code is updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you recommend instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, the link to CUDA Resources and the TIP are good enough. Maybe we could also link to the example in the threading
section. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. How about now?
…ea-ext-expose-num-internal-streams
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I'll add an in-place example to the use of the new internal streams API in the updates to the threading section of the developer guide.
…ea-ext-expose-num-internal-streams
…com/teju85/cuml into fea-ext-expose-num-internal-streams
rerun tests |
1 similar comment
rerun tests |
@teju85, I see what's going on here. The problem is not because you have a non-default Here's the code that's giving the problem:
The fix is to pass |
If we want to enable the sharing of a cumlHandle on the workers across different runs of algorithms (which will get tricky using NCCL in the comms), we will want to cache the handle on the workers (look into |
Awesome. Thanks @cjnolet for getting this PR finally across the border! |
…nal-streams [REVIEW] Use num internal streams instead of creating cumlHandle's inside the C++ layer
This PR is to showcase a possible solution for the issue #931.
However, for this to happen, the constructor for
cumlHandle_impl
had to be updated to expose num-streams parameter.Tagging @cjnolet @JohnZed and @vishalmehta1991 for review.