
Ifpack2: Plug in HTS for thread-parallel sparse triangular solve #583

Closed
mhoemmen opened this issue Aug 24, 2016 · 16 comments
@mhoemmen

@trilinos/ifpack2 @trilinos/tpetra

@mhoemmen

This depended on #514. Ifpack2::LocalSparseTriangularSolver is the right place to put this, NOT Tpetra::CrsMatrix::localSolve. See #581.

egboman commented Aug 29, 2016

I suppose the triangular solve codes might live in KokkosKernels or ShyLU, but be used by Ifpack2?

@mhoemmen

@egboman The main interface to local sparse triangular solves is in KokkosKernels. I'm wrapping MKL's sparse TRSM right now. Ifpack2::LocalSparseTriangularSolver just wraps the KokkosKernels interface.

@ambrad ambrad self-assigned this Oct 2, 2016
ambrad commented Dec 5, 2016

@mhoemmen, in RILUK::apply(), you wrote (Jan 2014): "C must have the same Map as D. We have to use a temp multivector, since our implementation of triangular solves does not allow its input and output to alias one another." However, I believe that is no longer true. CrsMatrix::localSolve's doc says: "...unlike multiply(), it is permissible for &X == &Y." Also, the tests pass without the temporary. Anything I'm missing? Thanks.

mhoemmen commented Dec 5, 2016

@ambrad wrote:

... CrsMatrix::localSolve's doc says: "...unlike multiply(), it is permissible for &X == &Y."

Would that be true for a general parallel triangular solve, such as what HTS does?

I haven't had time to audit all the comments for correctness, alas ;-)

ambrad commented Dec 5, 2016

@mhoemmen, my view is that if a particular trisolver can't handle it, that should be handled in the impl details of LocalSparseTriangularSolver, not in RILUK. That's what I was about to start doing, but then I thought to check whether it's still true for CrsMatrix::localSolve. I concluded that, at present, neither option (CrsMatrix::localSolve or HTS) requires &X != &Y.

ambrad commented Dec 5, 2016

BTW, work for this issue is in this branch of my fork: https://github.com/ambrad/Trilinos/commits/hts1. (I'll squash it to one commit when I'm done.)

mhoemmen commented Dec 5, 2016

@ambrad I don't particularly like the promise that aliasing works. &X == &Y does not suffice to check all possible aliasing cases (e.g., Tpetra::MultiVector that views a subset of columns of another Tpetra::MultiVector). Also, even if aliasing works for the local kernel, we might like to overlap communication and computation at the MPI level. That could affect whether aliasing is correct.

ambrad commented Dec 5, 2016

@mhoemmen, that's fine, but do you agree that LocalSparseTriangularSolver is the right place to manage aliasing? I.e., RILUK should not have to know whether a temp MV needs to be made; it should be able to call the Trisolver and let it figure it out.

mhoemmen commented Dec 5, 2016

@ambrad wrote:

... do you agree that LocalSparseTriangularSolver is the right place to manage aliasing?

Yes, I agree. I don't think we should ask KokkosKernels or third-party implementations to do that. The Ifpack2 interface should handle it. (It already does, for some other solvers.)

ambrad commented Dec 7, 2016

@trilinos/ifpack2, @trilinos/shylu, I'm proposing to push the commit linked above. It is the first of what I anticipate will be at least three commits focusing on Ifpack2::LocalSparseTriangularSolver. @mhoemmen and I might still discuss some interface issues re: aliasing in LSTS, but I propose that I push this commit anyway, since it gets a fair bit done, and include further decisions on aliasing in a future commit. Thoughts? Objections to my pushing this commit?

mhoemmen commented Dec 7, 2016

@ambrad I think it's OK to push for now. The implementation eventually should tie the sparse triangular solve implementation to MatrixType::execution_space (e.g., cuSPARSE for Cuda, HTS for OpenMP, "Internal" for Serial). Thanks!

ambrad added a commit that referenced this issue Dec 7, 2016
This commit is the first of a planned long-term sequence to integrate NGP
triangular solvers (trisolvers) into Ifpack2 through
Ifpack2::LocalSparseTriangularSolver (LSTS).

In this commit, ShyLU/HTS is integrated for forward solves. Since RILUK already
uses LSTS, it can use HTS for its apply() phase.

As part of this commit, RILUK was cleaned up to:
  * use LSTS more efficiently (retain L and U solver objects rather than
    repeatedly allocate them);
  * no longer use a temporary multivector in apply(). LSTS will handle aliasing
    of X and Y.
In addition, HTS was modified to handle an implicit unit diagonal.

Unit tests were modified for HTS and for LSTS to cover the new features. The
Belos-RILUK tests were duplicated to run with HTS.

Build/Test Cases Summary
Enabled Packages: Ifpack2, ShyLUHTS
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=32,notpassed=0 (6.02 min)
1) MPI_RELEASE_DEBUG_SHARED_PT_COMPLEX => passed: passed=32,notpassed=0 (23.85 min)
@srajama1

@ambrad: Can we close this? The issue is quite broad, but I assume the primary target is integrating HTS. Integrating cuSPARSE or an internal one is planned later this year and can be addressed in a separate issue.

ambrad commented Sep 27, 2017

I agree, @srajama1. I will close this now, with GPU deliverables to go into a new issue.

@ambrad ambrad closed this as completed Sep 27, 2017
@ambrad ambrad reopened this Sep 27, 2017
ambrad commented Sep 27, 2017

Sorry, we should ask @mhoemmen first. @mhoemmen, close this issue if you agree with putting GPU trisolver work into a new issue.

@mhoemmen mhoemmen changed the title Ifpack2: Plug in HTS, cuSPARSE, etc. for thread-parallel sparse triangular solve Ifpack2: Plug in HTS for thread-parallel sparse triangular solve Sep 28, 2017
@mhoemmen

@ambrad I agree; thanks! The new issue (for GPU sparse triangular solve) is #1790.
