-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spmv cusparse tpl #618
Spmv cusparse tpl #618
Conversation
@lucbv Kokkos support is for cuda >= Edit: Mistake in Cuda version support |
Nathan: I think we are at >= 9.0 not 9.2 |
Edited my comment, thanks for clarification. |
@ndellingwood |
/* initialize cusparse library */ | ||
status = cusparseCreate(&handle); | ||
if (status != CUSPARSE_STATUS_SUCCESS) { | ||
throw("KokkosSparse::spmv[TPL_CUSPARSE,double]: cusparse was not initialized correctly"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Please don't throw
const char[]
. Use a subclass ofstd::exception
. - Kokkos should have a function for reporting errors; please use that instead of throwing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once things are actually working (right now the spmv does not give the right answer), I'll look into the proper way of throwing exception or reporting errors. We must have some mechanism for that in other tpl interfaces, maybe in cublas_gemm interface for instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mhoemmen
I looked into Kokkos_Cuda_Error.hpp
and the main blocker to use CUDA_SAFE_CALL
is that it uses the type cudaError
when the cusparse calls are retunring cusparseStatus_t
instead. At this point I added better throws that use std::runtime_exception
.
I will add to the todo list the creation of a CUSPARSE_SAFE_CALL
macro that replicates mechanism demonstrated in Kokkos but for cusparse return type.
|
||
/* create matrix */ | ||
cusparseSpMatDescr_t A_cusparse; | ||
status = cusparseCreateCsr(&A_cusparse, A.numRows(), A.numCols(), A.nnz(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're exposing cuSPARSE functions in the header file :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am actually thinking of creating a small utility function that takes in a Kokkos::CrsMatrix
and returns a cusparseSpMatDescr_t
so part of this code will disappear down the road...
I will also look into other TPL specialization we have in kokkos-kernels and see how things are implemented. Depending on that I might move things around accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lucbv Check out how I hid the TPL header file includes for cuSOLVER in tpetra/tsqr/src
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mhoemmen as discussed I will outline a strategy to reduce the size of the macros in the KokkosSparse_spmv_tpl_spec_decl.hpp
and hide the header in an implementation file.
fcac9ff
to
0160cb5
Compare
Alright, this is now passing Whitecuda-10.1.105-Cuda_Serial-release build_time=288 run_time=270 Bowmanintel-16.4.258-Pthread-release build_time=523 run_time=296 Kokkos-dev####################################################### |
@srajama1 @ndellingwood any thoughts on this PR? |
@lucbv I haven't looked carefully, but wanted to check that using the deprecated cusparse matrix-vector interface ( |
@ndellingwood the tpl is guarded to make sure it only gets enabled for cuda 9 and up. |
@mhoemmen if you don't see any inconvenience with this I will update issue #618 to explain how I plan to address further your comments in a subsequent PR that will refactor this code. But I would like for the current version to be merged. I will re-run the spot-check since I did one more push for the exception handling. |
Supporting both cuda 9 interface and cuda 10.2 interface Support for float_int_int and double_int_int Could potentially support int64_t with cuda 10.2 interface. Modifying the spmv_struct_tunning test to make it compile appropriately.
@mhoemmen @ndellingwood @srajama1 White --spot-check####################################################### White --spot-check-tpls####################################################### Kokkos-dev --spot-check####################################################### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lucbv I added a couple comments, thanks for working on this!
@@ -30,7 +30,7 @@ | |||
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, | |||
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, | |||
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR | |||
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF | |||
// PROFIS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo here, maybe accidental bump on the keyboard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, likely an inadvertent keystroke... easy to fix though
} \ | ||
}; | ||
|
||
KOKKOSSPARSE_SPMV_CUSPARSE(double, int, Kokkos::LayoutLeft, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is complex support to be added later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially but I am a bit worried about the alignment of cuComplex
and that of Kokkos::complex
for that reason I would like to defer this to allow careful testing when that feature goes in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll need to look a bit more carefully at the ETI TPL layer you added, but if complex is not enabled then does this mean kokkos-kernels spmv will use the existing fallback implementation? If not, that'll need to be addressed or else this will cause issues in Trilinos if anyone enables cusparse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at KokkosSparse_spmv_tpl_spec_avail.hpp
you will see that the tpl is not made available for complex type so it will fall back to kokkos-kernel implementation of spmv in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ndellingwood Let me second Luc's concern about alignment of complex. I remember y'all added an option to control alignment of Kokkos::complex
; we could use that to decide whether to enable complex support.
@lucbv approved, shall I go ahead and merge? |
@ndellingwood I just pushed minor changes to address your comments, let me know what you think of them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Luc!
cusparse_Acolind(cusparse_Acolind_) {}; | ||
|
||
void doCopy() { | ||
Kokkos::RangePolicy<execution_space, rowPtrTag> rowPtrPolicy(0, Arowptr.extent(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See Tpetra::Details::copyOffsets
. That would be a useful kokkos-kernels utility.
@@ -101,7 +143,7 @@ void struct_matvec(const int stencil_type, | |||
const AMatrix& A, | |||
const XVector& x, | |||
typename YVector::const_value_type& beta, | |||
const YVector& y, | |||
YVector& y, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not const? It's a Kokkos::View.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could bring back the const, the intent is to show that this view could be modified, although its data is really what is being modified.
Acolind_cusparse[idx] = Acolind[idx]; | ||
}); | ||
cusparse_int_type Arowptr_cusparse("Arowptr", Arowptr.extent(0)); | ||
cusparse_int_type Acolind_cusparse("Acolind", Acolind.extent(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need to allocate column indices unless they don't have type int
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct but since this is a test I did not want to write too much logic around the type of the ordinal.
It would make the test a bit faster but this part is not timed and we already know that the cusparse interface is changing in future versions. Ideally this test should be rewritten down the road for the new interface that will make this a bit cleaner...
// using ordinal_type = typename AMatrix::non_const_ordinal_type; | ||
using value_type = typename AMatrix::non_const_value_type; | ||
|
||
#if defined(CUSPARSE_VERSION) && (10300 <= CUSPARSE_VERSION) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was CUDA 10.1 that added the new interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a newer "patch" update to 10.1 that added the new interface, 10.1.patch, but is not included in all versions of 10.1 (in particular not the version we have a sems module for nightly testing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit complicated, but the bottom line is that cuda and cusparse do not necessarily have the same version numbering.
Release number | CUDA_VERSION | CUSPARSE_VERSION |
---|---|---|
10.1.168 | 10100 | undef |
10.1.243 | 10100 | 10300 |
Note that in release 10.1.168, cusparse does define the following macros:
#define CUSPARSE_VER_MAJOR 10
#define CUSPARSE_VER_MINOR 2
#define CUSPARSE_VER_PATCH 0
#define CUSPARSE_VER_BUILD 0
where as in 10.1.243 it defines
#define CUSPARSE_VER_MAJOR 10
#define CUSPARSE_VER_MINOR 3
#define CUSPARSE_VER_PATCH 0
#define CUSPARSE_VER_BUILD 243
#define CUSPARSE_VERSION (CUSPARSE_VER_MAJOR * 1000 + \
CUSPARSE_VER_MINOR * 100 + \
CUSPARSE_VER_PATCH)
/* Initialize cusparse */ | ||
cusparseStatus_t cusparseStatus; | ||
cusparseHandle_t cusparseHandle=0; | ||
cusparseStatus = cusparseCreate(&cusparseHandle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should time this -- it could be expensive to create this handle.
if (std::is_same<value_type,float>::value) { | ||
cusparseStatus = cusparseScsrmv(cusparseHandle, CUSPARSE_OPERATION_NON_TRANSPOSE, | ||
A.numRows(), A.numCols(), A.nnz(), | ||
(const float *) &alpha, descrA, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer reinterpret_cast
to C-style casts.
@mhoemmen do you request your comments be addressed prior to merge or as a follow up PR? |
@ndellingwood I'm good with this -- Luc and I talked earlier this week. Sorry to delay you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lucbv : One comment. It would be nice to see how expensive handle creation is. As you pointed out it is good to make small PRs and keep incremental improvement. Can we also update the documentation ? I assume we have an issue that tracks this so @ndellingwood can document this in 3.1.
@mhoemmen just wanted to check, always valuable feedback and didn't want to bypass anything requiring immediate address that was overlooked
@srajama1 good call this would be useful, I'm going to go ahead with the merge (I think this is safe based on your follow up comment), this will get it through a round of nightlies (we have tpls in the nightly tests now :) |
Adding cusparse backend for SpMV to allow users to benefit from latest improvement in nvidia code.
This will potentially provide better performance across different GPU architectures as it is complicated to tune the SpMV kernel for all graphic cards and compute versions.
The support will be provided for cuda 9.2 and 10 to begin with.