-
Notifications
You must be signed in to change notification settings - Fork 578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MueLu: Add CuSparse support to matvec driver #4672
MueLu: Add CuSparse support to matvec driver #4672
Conversation
The ExplicitInstantiation header was not being included, which made all --node commands fail only the default could work, because it didn't rely on MueLu's ETI macros
This provides HAVE_MUELU_CUSPARSE if the TPL is enabled.
Provide output for error norms and commandline flag to control
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Using Repos:
Pull Request Author: jjellio |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
// const double *beta, | ||
// double *y) | ||
cusparseStatus_t rc; | ||
if(Kokkos::Impl::is_same<Scalar,double>::value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trilinos is only tested with C++11 now, so please use std::is_same
-- thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of stuff actually goes away. I made this into a class, which allows things like specializations (as seen below). Doing that removes the need to do this all together.
|
||
auto X_lcl = X.template getLocalView<device_type> (); | ||
auto Y_lcl = Y.template getLocalView<device_type> (); | ||
x = (Scalar*) X_lcl.data(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use reinterpret_cast
instead of C-style casts -- thanks!
m = (int) A.getNodeNumRows(); | ||
n = (int) A.getNodeNumCols(); | ||
nnz = (int) Acolind_cusparse.extent(0); | ||
vals = (Scalar*) Avals.data(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use reinterpret_cast
instead of C-style casts -- thanks!
Arowptr_cusparse = cusparse_int_type("Arowptr", Arowptr.extent(0)); | ||
Acolind_cusparse = cusparse_int_type("Acolind", Acolind.extent(0)); | ||
// copy the ordinals into the local view (type conversion) | ||
copy_view(Arowptr,Arowptr_cusparse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Please use the same convention as
Kokkos::deep_copy
for which argument is the target of the deep copy. - See
Tpetra_Details_copyOffsets.hpp
for an implementation of this functionality that checks for overflow. This code assumes Tpetra anyway, so you can just include that header file and use the function. - You're copying, so you could allocate without fill.
Acolind
is always a View ofint
anyway, so consider just using it directly.
vector_type& Y) | ||
{} | ||
|
||
~CuSparse_SpmV_Pack() {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please write ~CuSparse_SpmV_Pack() = default;
instead.
cublasStatus_t cublasStatus; | ||
cublasStatus = cublasCreate(&cublasHandle); | ||
|
||
//checkCudaErrors(cublasStatus); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just use the usual CUDA macro here that checks the return value of these functions for you.
cusparseStatus_t spmv(const Scalar alpha, const Scalar beta) { return CUSPARSE_STATUS_SUCCESS; } | ||
}; | ||
|
||
template<typename LocalOrdinal, typename GlobalOrdinal> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're going to use Scalar instead of double throughout this class, why not just template the class on Scalar and do a partial specialization for Scalar=double
?
I wonder if I broke pre-merge inspection, when I accidentally clicked 'project' rather than 'label' and put this into to the MueLu 'project'. I realized the folly of my ways and deleted that. Doing this marked 'MueLu' as a reviewer, which wasn't my intent. #4679 |
@jjellio Not your fault. As per @jwillenbring the per-merge inspection, is done via the same queueing system as the PR tester. Basically, we have to wait until our "lane" clears. If this hasn't inspected by tomorrow AM, @jwillenbring will take a look and see what broke in the inspection. |
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ csiefer2 ]! |
Status Flag 'Pull Request AutoTester' - Pull Request will be Automerged |
Merge on Pull Request# 4672: IS A SUCCESS - Pull Request successfully merged |
@jwillenbring All good, thanks! |
@trilinos/muelu
Description
Provides CUSPARSE (TPL) access in the MatvecDriver
Also corrects a bug in MueLu's ETI helper ... which surely needs to be put through the PR tester.
The MueLu Helper was not including the MueLu ExplicitInstantiation header (generated), which caused the logic that creates --node=cuda|openmp|... to fail
Motivation and Context
Allow side by side comparisons against CUSPARSE
How Has This Been Tested?
Tested on Waterman
Checklist
@cgcgcg @csiefer2