-
Notifications
You must be signed in to change notification settings - Fork 578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MueLu: Add SemiCoarsenPFactory_kokkos #9107
MueLu: Add SemiCoarsenPFactory_kokkos #9107
Conversation
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jewatkins Thanks for contributing this code! It's great to see the use of batched BLAS. One question -- some of the loops in MueLu_SemiCoarsenPFactory_kokkos_def.hpp
haven't been converted. Is this just because they aren't expensive to begin with?
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
That's a good question. The short answer is, yes, they're not very expensive. The long answer is, I thought about converting all the loops and data structures but I figured it would take too long to do for not much benefit. Some loops are not thread safe. Here it helps that we're using CUDA UVM because I don't have to worry about running these loops on device. I also didn't want to spend too much time on the setup loops because I was envisioning doing a lot of this work during a "setup" phase (i.e. it would only need to be run once as long as the mesh/matrix dimensions don't change). I'm still curious about whether there's a path towards doing that in muelu but I don't think it's a high priority right now (at least for Albany). |
@jewatkins I did't look at the code yet, but please don't rely on UVM. Otherwise we'll have to fix that pretty soon. |
@cgcgcg Okay I was curious about that because I wasn't sure what the status was and how you guys were going to make the transition. Is the plan to require all kokkos code to work with DeviceType=Kokkos::Cuda? I should be able to modify this to work cuda too, I think it would just require being a bit more careful with memory space and deep copies. Is there currently anyway I can test it? |
@jewatkins Unfortunately, there might be a couple of other things that will break if you switch off UVM, so testing it is difficult. But I'd already be happy if the code doesn't obviously rely on UVM. If there is something subtle, we can fix it once our testing catches up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving, so AT can run
I removed the UVM requirement from this and added a test based on @cgcgcg recommendations. I tested it against the tpetraCrsRefactor branch: e3bf162 and there was an issue with Xpetra |
Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ cgcgcg ]! |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_serial
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_debug
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_clang_10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105_uvm_off
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
Using Repos:
Pull Request Author: jewatkins |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_serial
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_debug
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_clang_10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105_uvm_off
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_8.3.0 # 4434 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_7.2.0_serial # 1965 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_7.2.0_debug # 2446 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_intel_17.0.1 # 9781 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_cuda_10.1.105 # 1197 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_clang_10.0.0 # 2562 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_cuda_10.1.105_uvm_off # 194 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_python_3 # 5154 (click to expand)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next AT run
The uvm_off build failure is the same one I received on the tpetraCrsRefactor branch:
I think Xpetra::MultiVector::getLocalView needs to be fixed but it's not exactly clear to me what a fix would look like. Maybe someone else is planning to work on this, otherwise, I could take a closer look. |
Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ cgcgcg ]! |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_serial
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_debug
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_clang_10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105_uvm_off
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
Using Repos:
Pull Request Author: jewatkins |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_serial
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_debug
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_clang_10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105_uvm_off
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_8.3.0 # 4437 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_7.2.0_serial # 1968 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_7.2.0_debug # 2449 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_intel_17.0.1 # 9784 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_cuda_10.1.105 # 1200 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_clang_10.0.0 # 2565 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_cuda_10.1.105_uvm_off # 197 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_python_3 # 5157 (click to expand)
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_serial
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_debug
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_clang_10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105_uvm_off
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
Using Repos:
Pull Request Author: jewatkins |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_serial
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_debug
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_clang_10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105_uvm_off
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_8.3.0 # 4439 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_7.2.0_serial # 1970 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_7.2.0_debug # 2451 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_intel_17.0.1 # 9786 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_cuda_10.1.105 # 1202 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_clang_10.0.0 # 2567 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_cuda_10.1.105_uvm_off # 199 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_python_3 # 5158 (click to expand)
|
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get the AT running again
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
7 similar comments
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
@jhux2 Thanks for the review, I just noticed it (I think my filter settings were too strict). I'll work through this today. |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_serial
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_debug
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105_uvm_off
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_clang_10.0.0
Jenkins Parameters
Build InformationTest Name: python-3
Jenkins Parameters
Using Repos:
Pull Request Author: jewatkins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm. Thanks, @jewatkins!
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_serial
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_debug
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105_uvm_off
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_clang_10.0.0
Jenkins Parameters
Build InformationTest Name: python-3
Jenkins Parameters
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_8.3.0 # 5496 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_7.2.0_serial # 3055 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_gcc_7.2.0_debug # 3535 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_intel_17.0.1 # 10778 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_cuda_10.1.105 # 2189 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_cuda_10.1.105_uvm_off # 1186 (click to expand)
Console Output (last 100 lines) : Trilinos_pullrequest_clang_10.0.0 # 3567 (click to expand)
Console Output (last 100 lines) : python-3 # 225 (click to expand)
|
Status Flag 'Pull Request AutoTester' - Failure: Timed out waiting for job Trilinos_pullrequest_intel_17.0.1 to start: Total Wait = 603
|
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_serial
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_debug
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105_uvm_off
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_clang_10.0.0
Jenkins Parameters
Build InformationTest Name: python-3
Jenkins Parameters
Using Repos:
Pull Request Author: jewatkins |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_serial
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_debug
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105_uvm_off
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_clang_10.0.0
Jenkins Parameters
Build InformationTest Name: python-3
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ jhux2 ]! |
Status Flag 'Pull Request AutoTester' - Pull Request will be Automerged |
Cannot Merge Pull Request# 9107: Github Mergeability = blocked; Git Mergeable = blocked |
changes addressed, re-reviewed by Jonathan
@trilinos/muelu
Motivation
This adds the kokkos version of SemiCoarsenPFactory in MueLu. This is needed to improve the performance of Albany Land Ice when running on GPUs. I've only tested this with Albany Land Ice and it seems to be working fine (same convergence, 8.5x speedup in building P, 1.4x speedup in preconditioner construction).
A review would be great too. One potential issue:
The original algorithm serially solves a batch (equal to the number of vertical lines, NVertLines) of banded linear systems using gbtrf/gbtrs. The closest thing I found to a kokkos equivalent was the kokkos-kernels batched LU (no pivot) and Trsv routines. This required storing NVertLines of the dense equivalent of the banded matrix which is quite large. This is probably okay for now but it could probably be improved.
Testing
If SemiCoarsenPFactory has tests, I'm assuming I could reuse them for this factory. Could someone point me to those tests and explain how to run them?