Ifpack2: Plug in HTS for thread-parallel sparse triangular solve #583
I suppose the triangular solve codes might live in KokkosKernels or ShyLU, but be used by Ifpack2?
@egboman The main interface to local sparse triangular solves is in KokkosKernels. I'm wrapping MKL's sparse TRSM right now. Ifpack2::LocalSparseTriangularSolver just wraps the KokkosKernels interface.
@mhoemmen, in RILUK::apply(), you wrote (Jan 2014): "C must have the same Map as D. We have to use a temp multivector, since our implementation of triangular solves does not allow its input and output to alias one another." However, I believe that is no longer true. CrsMatrix::localSolve's doc says: "...unlike multiply(), it is permissible for &X == &Y." Also, the tests pass without the temporary. Anything I'm missing? Thanks.
@ambrad wrote:
Would that be true for a general parallel triangular solve, such as what HTS does? I haven't had time to audit all the comments for correctness, alas ;-)
@mhoemmen, my view is that if a particular trisolver can't handle it, that should be handled in the implementation details of LocalSparseTriangularSolver, not in RILUK. That's what I was about to start doing, but then I thought to check whether it's still true for CrsMatrix::localSolve. I concluded that, at present, neither option (CrsMatrix::localSolve or HTS) requires &X != &Y.
BTW, work for this issue is in this branch of my fork: https://github.com/ambrad/Trilinos/commits/hts1. (I'll squash it to one commit when I'm done.)
@ambrad I don't particularly like the promise that aliasing works.
@mhoemmen, that's fine, but do you agree that LocalSparseTriangularSolver is the right place to manage aliasing? I.e., RILUK should not have to know whether a temporary MV needs to be made; it should be able to call the trisolver and let it figure that out.
@ambrad wrote:
Yes, I agree. I don't think we should ask KokkosKernels or third-party implementations to do that. The Ifpack2 interface should handle it. (It already does, for some other solvers.)
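For readers following along, a minimal sketch of how the Ifpack2-side interface could absorb that responsibility. This is not Ifpack2's actual code; the class, member names (`solverSupportsAliasing_`, `X_copy_`, `solveLocal`), and the decision to cache the scratch copy are all assumptions for illustration.

```c++
#include <Teuchos_RCP.hpp>
#include <Tpetra_MultiVector.hpp>

// Hypothetical sketch: a trisolver wrapper whose apply() absorbs the
// aliasing check, so callers such as RILUK never allocate a temporary
// multivector themselves.
template <class MV>
class AliasSafeTrisolver {
public:
  void apply (const MV& X, MV& Y) const {
    if (&X == &Y && ! solverSupportsAliasing_) {
      // The underlying solver needs distinct input and output:
      // copy X once into a cached temporary, then solve out of place.
      if (X_copy_.is_null () ||
          X_copy_->getNumVectors () != X.getNumVectors ()) {
        X_copy_ = Teuchos::rcp (new MV (X.getMap (), X.getNumVectors ()));
      }
      Tpetra::deep_copy (*X_copy_, X);
      solveLocal (*X_copy_, Y);
    }
    else {
      solveLocal (X, Y); // solver tolerates &X == &Y, or X and Y differ
    }
  }

private:
  bool solverSupportsAliasing_ = false;        // set per implementation
  mutable Teuchos::RCP<MV> X_copy_;            // cached scratch copy
  void solveLocal (const MV& X, MV& Y) const;  // dispatch to HTS, MKL, etc.
};
```

Caching `X_copy_` rather than reallocating keeps repeated apply() calls cheap, which matters when the preconditioner is applied once per solver iteration.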
@trilinos/ifpack2, @trilinos/shylu, I'm proposing to push the commit linked above. It is the first of what I anticipate will be >=3 commits focusing on Ifpack2::LocalSparseTriangularSolver. @mhoemmen and I might still discuss some interface issues re: aliasing in LSTS, but I propose that I push this commit anyway, since it gets a fair bit done, and include further decisions on aliasing in a future commit. Thoughts? Objections to my pushing this commit?
@ambrad I think it's OK to push for now. The implementation eventually should tie the sparse triangular solve implementation to MatrixType::execution_space (e.g., cuSPARSE for Cuda, HTS for OpenMP, "Internal" for Serial). Thanks!
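A compile-time dispatch along those lines might look like the following sketch. The trait struct and its flags do not exist in Trilinos; they only illustrate selecting a backend from the matrix's execution space.

```c++
#include <type_traits>
#include <Kokkos_Core.hpp>

// Hypothetical sketch: choose a local trisolver implementation from
// MatrixType::execution_space, as proposed above.
template <class MatrixType>
struct LocalTrisolverChoice {
  using exec_space = typename MatrixType::execution_space;

#if defined(KOKKOS_ENABLE_CUDA)
  static constexpr bool use_cusparse =
    std::is_same<exec_space, Kokkos::Cuda>::value;
#else
  static constexpr bool use_cusparse = false;
#endif

#if defined(KOKKOS_ENABLE_OPENMP)
  static constexpr bool use_hts =
    std::is_same<exec_space, Kokkos::OpenMP>::value;
#else
  static constexpr bool use_hts = false;
#endif

  // Otherwise fall back to the "Internal" serial implementation.
  static constexpr bool use_internal = ! (use_cusparse || use_hts);
};
```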
This commit is the first of a planned long-term sequence to integrate NGP triangular solvers (trisolvers) into Ifpack2 through Ifpack2::LocalSparseTriangularSolver (LSTS). In this commit, ShyLU/HTS is integrated for forward solves. Since RILUK already uses LSTS, it can use HTS for its apply() phase.

As part of this commit, RILUK was cleaned up to:
* use LSTS more efficiently (retain the L and U solver objects rather than repeatedly allocating them);
* no longer use a temporary multivector in apply(); LSTS will handle aliasing of X and Y.

In addition, HTS was modified to handle an implicit unit diagonal. Unit tests were modified for HTS and for LSTS to cover the new features. The Belos-RILUK tests were duplicated to run with HTS.

Build/Test Cases Summary
Enabled Packages: Ifpack2, ShyLUHTS
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=32,notpassed=0 (6.02 min)
1) MPI_RELEASE_DEBUG_SHARED_PT_COMPLEX => passed: passed=32,notpassed=0 (23.85 min)
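The "retain the L and U solver objects" cleanup amounts to the pattern sketched below. Member names and the exact LSTS constructor/apply signatures are assumptions, and details such as alpha/beta scaling and diagonal handling are omitted.

```c++
// Hypothetical sketch of the RILUK cleanup: build the L and U trisolvers
// once in compute(), then reuse them in every apply() call, with no
// per-call allocation and no temporary multivector.
void compute () {
  // ... compute the incomplete factors L_ and U_ ...
  L_solver_ = Teuchos::rcp (
    new Ifpack2::LocalSparseTriangularSolver<row_matrix_type> (L_));
  L_solver_->initialize ();
  L_solver_->compute ();
  U_solver_ = Teuchos::rcp (
    new Ifpack2::LocalSparseTriangularSolver<row_matrix_type> (U_));
  U_solver_->initialize ();
  U_solver_->compute ();
}

void apply (const MV& X, MV& Y) const {
  // Y = U^{-1} (L^{-1} X); the second solve runs in place on Y,
  // relying on LSTS to manage any aliasing internally.
  L_solver_->apply (X, Y);
  U_solver_->apply (Y, Y);
}
```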
@ambrad: Can we close this? The issue is quite broad, but I assume the primary target is integrating HTS. Integrating cuSPARSE or an internal one is planned later this year and can be addressed in a separate issue.
I agree, @srajama1. I will close this now, with GPU deliverables to go into a new issue. |
@trilinos/ifpack2 @trilinos/tpetra