add cuStreamSync for async cusolver functs #215
Signed-off-by: JackAKirk <[email protected]>
One solution would be to only sync the stream used by the interop
Would this achieve asynchronous submissions?
If we go this route it may be roll the wait call into
Interesting. Can you provide a reproducer?
Actually I think just from the observation that it fixes the test failures it must be effectively blocking future submissions (at least those that are touching the same memory as the cusolver function) until the native stream used in the cusolver functions is finished working (It may be that we observe this blocking behaviour because the context could be being created with the
The reproducer is the getrf tests, which call the getrf function that uses depends_on here: https://github.com/oneapi-src/oneMKL/blob/61312ed98b8208999f99474778d46919c30ef15b/src/lapack/backends/cusolver/cusolver_lapack.cpp#L1350 If depends_on were syncing the stream, the corresponding tests wouldn't fail.
I've now added blocking waits (using
I've found out that these cusolver functions are apparently asynchronous, even though the NVIDIA documentation implies that they are synchronous: therefore I think that Therefore I will update the changes made to synchronize the native stream within the host_task and then use
I've done this now.
@AidanBeltonS could you check this is all OK? Thanks
LGTM.
@ericlars what do you think of this solution? This means the
Apologies for the delayed response, I've been on an extended vacation. This looks like a really elegant solution, thanks for working on it. I have a better appreciation for the difficulties of asynchronicity and cuda now.
attaching log: log_llvm_cusolver_.txt
Signed-off-by: JackAKirk [email protected]
Description
This is a bug fix for test failures that first appeared after the multi-stream implementation of the CUDA backend in intel/llvm (failures identified here: #209 (comment)).
The failed tests are due to the lack of a stream synchronisation after some cusolver interop functions, such as cusolverDnSgetrf, are called from lapack::cusolver::getrf. Before the multi-stream implementation, all queues were effectively in-order when using the CUDA backend of intel/llvm, so syncing streams returned from a queue that did not have the in_order queue property was not necessary. The fix is to call:
cudaStream_t currentStreamId;
CUSOLVER_ERROR_FUNC(cusolverDnGetStream, err, handle, &currentStreamId);
cuStreamSynchronize(currentStreamId);
after the cusolver functions. Since some cusolver functions are apparently asynchronous (and we can't know for sure from the docs which if any are not asynchronous), we have to synchronize the stream after it is used in cusolver calls.
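To illustrate why the synchronisation matters, here is a toy model in plain C++. It does not use CUDA or cusolver: `ToyStream`, `toy_async_scale` and `first_element_after_call` are hypothetical names invented for this sketch. The "stream" simply defers enqueued work until `synchronize()` is called, which deterministically mimics a call that returns before the device has finished.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy stand-in for a CUDA stream (NOT the CUDA API): enqueued work does not
// run until synchronize() is called, modelling "the call has returned but
// the device has not finished yet".
class ToyStream {
public:
    void enqueue(std::function<void()> work) { pending_.push(std::move(work)); }

    // Run all pending work in submission order, then return.
    void synchronize() {
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop();
        }
    }

    bool idle() const { return pending_.empty(); }

private:
    std::queue<std::function<void()>> pending_;
};

// Hypothetical stand-in for an asynchronous solver call: it only schedules
// the in-place update of `data` and returns immediately.
void toy_async_scale(ToyStream& stream, std::vector<double>& data, double factor) {
    stream.enqueue([&data, factor] {
        for (double& x : data) x *= factor;
    });
}

// Returns the first element as the caller sees it right after the call,
// optionally synchronizing the stream first (the pattern added in this PR).
double first_element_after_call(bool sync_after_call) {
    std::vector<double> data{1.0, 2.0, 3.0};
    ToyStream stream;
    toy_async_scale(stream, data, 10.0);
    if (sync_after_call) {
        stream.synchronize();
        assert(stream.idle());
    }
    double observed = data[0];
    stream.synchronize();  // drain remaining work before `data` is destroyed
    return observed;
}
```

In this model, `first_element_after_call(false)` observes the stale value `1.0`, while `first_element_after_call(true)` observes `10.0`. On a real device the unsynchronized case is a race rather than a guaranteed stale read, which is consistent with the failures showing up only once queues stopped being effectively in-order.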