Ifpack2: Fused block jacobi #13837
Open
brian-kelley wants to merge 1 commit into trilinos:develop from brian-kelley:FusedBlockJacobiFinal
+1,132
−59
More performant path for the block Jacobi case inside BTDS (GPU only, BlockCrs only). Fuses the residual and solve into one kernel and doesn't convert vectors to SIMD-packed format. Also inverts the diagonal blocks entirely in shared memory to speed up numeric setup.
Signed-off-by: Brian Kelley <[email protected]>
1d16296 to 29fe448
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request.
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Build Information
Test Name: PR_gcc-openmpi-openmp
Test Name: PR_gcc-openmpi_debug
Test Name: PR_clang
Test Name: PR_cuda
Test Name: PR_intel
Test Name: PR_cuda-uvm
Pull Request Author: brian-kelley
Labels
client: SPARC
Issues related to or needed more specifically by the ATDM SPARC code
impacting: performance
pkg: Ifpack2
More performant paths for block Jacobi case inside BTDS (GPU only, BlockCrs only).
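To illustrate what fusing the residual and solve means for block Jacobi, here is a minimal serial C++ sketch. It is not the Ifpack2 implementation: the `BlockCrs` struct, the function name `fusedBlockJacobiSweep`, and the damping parameter `omega` are all hypothetical and chosen for illustration. It assumes row-major dense blocks and precomputed inverses of the diagonal blocks. The point it shows is that each block row computes its residual into a small local buffer and applies the inverted diagonal block right away, so no global residual vector is written and read back between two separate kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block-CRS storage; illustrative only, not an Ifpack2/Tpetra type.
struct BlockCrs {
  int bs;                      // block size
  std::vector<int> rowptr;     // size numBlockRows + 1
  std::vector<int> colind;     // block column index of each stored block
  std::vector<double> values;  // bs*bs entries per stored block, row-major
};

// One damped block Jacobi sweep: y = x + omega * Dinv * (b - A*x).
// dinv holds the already-inverted diagonal block of each block row
// (bs*bs entries per row, row-major). x and y must not alias.
inline void fusedBlockJacobiSweep(const BlockCrs& A,
                                  const std::vector<double>& dinv,
                                  const std::vector<double>& b,
                                  const std::vector<double>& x,
                                  std::vector<double>& y,
                                  double omega) {
  const int bs = A.bs;
  const int nrows = static_cast<int>(A.rowptr.size()) - 1;
  assert(&x != &y);
  assert(y.size() == x.size() && b.size() == x.size());

  std::vector<double> r(bs);  // per-row residual; never stored globally
  for (int i = 0; i < nrows; ++i) {  // a GPU version would map block rows to teams
    // Residual phase: r_i = b_i - sum_j A_ij * x_j
    for (int k = 0; k < bs; ++k) r[k] = b[i * bs + k];
    for (int p = A.rowptr[i]; p < A.rowptr[i + 1]; ++p) {
      const double* blk = &A.values[static_cast<std::size_t>(p) * bs * bs];
      const double* xj = &x[static_cast<std::size_t>(A.colind[p]) * bs];
      for (int k = 0; k < bs; ++k)
        for (int c = 0; c < bs; ++c) r[k] -= blk[k * bs + c] * xj[c];
    }
    // Solve phase, fused into the same loop body: y_i = x_i + omega * Dinv_i * r_i
    const double* di = &dinv[static_cast<std::size_t>(i) * bs * bs];
    for (int k = 0; k < bs; ++k) {
      double acc = 0.0;
      for (int c = 0; c < bs; ++c) acc += di[k * bs + c] * r[c];
      y[i * bs + k] = x[i * bs + k] + omega * acc;
    }
  }
}
```

Because Jacobi reads only the old iterate `x`, writing to a separate `y` keeps the fused sweep equivalent to the unfused residual-then-solve sequence.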
Compared to #13805, this gives a speedup of between 9% (bs = 7) and 33% (bs = 11) on the overall solve, and about a 1.9x speedup in numeric setup for both block sizes (bs). Note: this was measured on a single-GPU run, so the speedup applies only to local computation. Multi-rank runs will see a smaller speedup because of the time spent in communication.
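As a rough way to reason about the multi-rank caveat, the standard Amdahl-style estimate can be applied. The symbols here are assumptions introduced for illustration: $c$ is the fraction of baseline solve time spent in communication (which this change does not accelerate) and $s$ is the local-computation speedup. The value $c = 0.3$ below is purely hypothetical, not a measurement, and the 33% figure is read as $s = 1.33$.

```latex
S_{\text{overall}} = \frac{1}{c + \dfrac{1 - c}{s}},
\qquad
\text{e.g. } c = 0.3,\ s = 1.33:\quad
S_{\text{overall}} = \frac{1}{0.3 + 0.7/1.33} \approx 1.21
```

Under those assumed numbers, a 33% local speedup would show up as roughly a 21% end-to-end speedup; the actual figure depends on the measured communication fraction.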
@trilinos/ifpack2
Related Issues
Follows #13805
Stakeholder Feedback
Will ask SPARC team to evaluate.
Testing
Tested with Ifpack2_BlockTriDiContainerUnitAndPerfTests, run on OpenMP, Cuda and HIP. On Cuda, tested with double, float and complex_double.