-
Notifications
You must be signed in to change notification settings - Fork 578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tpetra: adding syncs of import buffers in MultiVector #9117
Conversation
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_serial
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_debug
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_clang_10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105_uvm_off
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
Using Repos:
Pull Request Author: kddevin |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_serial
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_debug
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_clang_10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105_uvm_off
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ brian-kelley ]! |
Status Flag 'Pull Request AutoTester' - Pull Request will be Automerged |
Merge on Pull Request# 9117: IS A SUCCESS - Pull Request successfully merged |
@kddevin I think the testing should be running both Cuda-aware ON and OFF. I can't navigate CDash for the life of me - if I search for an ATDM build with the Tpetra flag = 0, I get To avoid a full rebuild, the code is built once, but the testing runs twice (as I understand it). I pushed pretty hard for this functionality, because EMPIRE has used Cuda-aware = OFF in several instances... Hope that's somewhat helpful. (if it didn't help flag the issue you're having it seems it has failed to do its job though!) |
…s:develop' (956bc97). * trilinos-develop: Piecewise constant grid transfers via semi-coarsening toggletransfer Anasazi: Removes HAVE_COMPLEX from complex-valued tests tpetra: Removed unused variable Tpetra: enable the test tpetra: unit test for trilinos#9116; follows trilinos#9117
…s:develop' (956bc97). * trilinos-develop: (37 commits) Tacho develop (trilinos#9150) STK: Snapshot 05-18-21 07:34 Piecewise constant grid transfers via semi-coarsening toggletransfer Fix compiler warning van1-tx2: Add deprecation notice Automatic snapshot commit from seacas at 4cee371 Anasazi: Removes HAVE_COMPLEX from complex-valued tests tpetra: Removed unused variable ATDM: Set CTEST_BUILD_NAME by default to env<ATDM_BUILD_NAME>-exp for 'dashboard' target Automatic snapshot commit from tribits at d0513d3 Removed template keyword to fix build errors Addressed all of Karen's feedback, and made equivalent changes to D1 where relevant. Nearly done with Karen's feedback. Fixed boundary_size, and atomics. Changed procs_to_send argument of communication function to pass by reference for performance Tpetra: enable the test tpetra: unit test for trilinos#9116; follows trilinos#9117 Switched D2 and PD2 to always use NB_BIT, as our previous implementation did. Change to Kokkos size views to fix build errors Further modified the serial exec/memory spaces to solve build problems ...
…s:develop' (956bc97). * trilinos-develop: (37 commits) Tacho develop (trilinos#9150) STK: Snapshot 05-18-21 07:34 Piecewise constant grid transfers via semi-coarsening toggletransfer Fix compiler warning van1-tx2: Add deprecation notice Automatic snapshot commit from seacas at 4cee371 Anasazi: Removes HAVE_COMPLEX from complex-valued tests tpetra: Removed unused variable ATDM: Set CTEST_BUILD_NAME by default to env<ATDM_BUILD_NAME>-exp for 'dashboard' target Automatic snapshot commit from tribits at d0513d3 Removed template keyword to fix build errors Addressed all of Karen's feedback, and made equivalent changes to D1 where relevant. Nearly done with Karen's feedback. Fixed boundary_size, and atomics. Changed procs_to_send argument of communication function to pass by reference for performance Tpetra: enable the test tpetra: unit test for trilinos#9116; follows trilinos#9117 Switched D2 and PD2 to always use NB_BIT, as our previous implementation did. Change to Kokkos size views to fix build errors Further modified the serial exec/memory spaces to solve build problems ...
…develop' (4520086). * trilinos/develop: (38 commits) Adds GCRODR / CG complex-valued tests for Tpetra Tacho develop (trilinos#9150) STK: Snapshot 05-18-21 07:34 Piecewise constant grid transfers via semi-coarsening toggletransfer Fix compiler warning van1-tx2: Add deprecation notice Automatic snapshot commit from seacas at 4cee371 Anasazi: Removes HAVE_COMPLEX from complex-valued tests tpetra: Removed unused variable ATDM: Set CTEST_BUILD_NAME by default to env<ATDM_BUILD_NAME>-exp for 'dashboard' target Automatic snapshot commit from tribits at d0513d3 Removed template keyword to fix build errors Addressed all of Karen's feedback, and made equivalent changes to D1 where relevant. Nearly done with Karen's feedback. Fixed boundary_size, and atomics. Changed procs_to_send argument of communication function to pass by reference for performance Tpetra: enable the test tpetra: unit test for trilinos#9116; follows trilinos#9117 Switched D2 and PD2 to always use NB_BIT, as our previous implementation did. Change to Kokkos size views to fix build errors ...
@trilinos/tpetra
Motivation
In Exawind/nalu-wind#858, @rcknaus reported export errors when TPETRA_ASSUME_CUDA_AWARE_MPI=0 after #8821 was merged.
Similar errors were exposed by tpetra/core/test/CrsMatrix/Bug8794.cpp.
In Bug8794.cpp, many multivector elements are imported, due to the dense rows in the test matrix.
With TPETRA_ASSUME_CUDA_AWARE_MPI=0, DistObject decides to do communication on host.
With long rows, the number of imports is large, so MultiVector decides to unpack on device.
However, the sync of the import buffer between the communication and unpack was missing.
This PR adds the sync.
I still need to investigate what changed in #8821 to cause this error. We should also set up nightly tests with TPETRA_ASSUME_CUDA_AWARE_MPI=0; Trilinos PR testing has TPETRA_ASSUME_CUDA_AWARE_MPI=1 (which, for Bug8794.cpp, meant communication and unpacking were done on device, so no problems). Adding unit test reproducer (separate from Bug8794.cpp) would be nice as well.
Stakeholder Feedback
Exawind/nalu-wind#858
@rcknaus
See Exawind/nalu-wind#858 (comment)
Testing
Tested on ascicgpu with TPETRA_ASSUME_CUDA_AWARE_MPI=0 and 1.