Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tpetra: adding syncs of import buffers in MultiVector #9117

Merged
merged 1 commit into from
May 8, 2021

Conversation

kddevin
Copy link
Contributor

@kddevin kddevin commented May 7, 2021

@trilinos/tpetra

Motivation

In Exawind/nalu-wind#858, @rcknaus reported export errors when TPETRA_ASSUME_CUDA_AWARE_MPI=0 after #8821 was merged.
Similar errors were exposed by tpetra/core/test/CrsMatrix/Bug8794.cpp.

In Bug8794.cpp, many multivector elements are imported, due to the dense rows in the test matrix.
With TPETRA_ASSUME_CUDA_AWARE_MPI=0, DistObject decides to do communication on host.
With long rows, the number of imports is large, so MultiVector decides to unpack on device.
However, the sync of the import buffer between the communication and unpack was missing.
This PR adds the sync.

I still need to investigate what changed in #8821 to cause this error. We should also set up nightly tests with TPETRA_ASSUME_CUDA_AWARE_MPI=0; Trilinos PR testing has TPETRA_ASSUME_CUDA_AWARE_MPI=1 (which, for Bug8794.cpp, meant communication and unpacking were done on device, so no problems). Adding unit test reproducer (separate from Bug8794.cpp) would be nice as well.

Stakeholder Feedback

Exawind/nalu-wind#858
@rcknaus
See Exawind/nalu-wind#858 (comment)

Testing

Tested on ascicgpu with TPETRA_ASSUME_CUDA_AWARE_MPI=0 and 1.

@kddevin kddevin changed the base branch from master to develop May 7, 2021 21:22
@kddevin kddevin self-assigned this May 7, 2021
@trilinos-autotester
Copy link
Contributor

Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request.

@trilinos-autotester
Copy link
Contributor

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: Trilinos_pullrequest_gcc_8.3.0

  • Build Num: 4391
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS pkg: Tpetra
PULLREQUESTNUM 9117
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH tpetra_9116
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3bfe974
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA aad612b

Build Information

Test Name: Trilinos_pullrequest_gcc_7.2.0_serial

  • Build Num: 1923
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS pkg: Tpetra
PULLREQUESTNUM 9117
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH tpetra_9116
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3bfe974
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA aad612b

Build Information

Test Name: Trilinos_pullrequest_gcc_7.2.0_debug

  • Build Num: 2404
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS pkg: Tpetra
PULLREQUESTNUM 9117
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH tpetra_9116
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3bfe974
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA aad612b

Build Information

Test Name: Trilinos_pullrequest_intel_17.0.1

  • Build Num: 9738
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS pkg: Tpetra
PULLREQUESTNUM 9117
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH tpetra_9116
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3bfe974
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA aad612b

Build Information

Test Name: Trilinos_pullrequest_cuda_10.1.105

  • Build Num: 1153
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS pkg: Tpetra
PULLREQUESTNUM 9117
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH tpetra_9116
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3bfe974
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA aad612b

Build Information

Test Name: Trilinos_pullrequest_clang_10.0.0

  • Build Num: 2519
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS pkg: Tpetra
PULLREQUESTNUM 9117
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH tpetra_9116
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3bfe974
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA aad612b

Build Information

Test Name: Trilinos_pullrequest_cuda_10.1.105_uvm_off

  • Build Num: 150
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS pkg: Tpetra
PULLREQUESTNUM 9117
PULLREQUEST_CDASH_TRACK Experimental
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH tpetra_9116
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3bfe974
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA aad612b

Build Information

Test Name: Trilinos_pullrequest_python_3

  • Build Num: 5118
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS pkg: Tpetra
PULLREQUESTNUM 9117
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH tpetra_9116
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3bfe974
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA aad612b

Using Repos:

Repo: TRILINOS (trilinos/Trilinos)
  • Branch: tpetra_9116
  • SHA: 3bfe974
  • Mode: TEST_REPO

Pull Request Author: kddevin

kddevin added a commit that referenced this pull request May 7, 2021
@kddevin kddevin added the AT: AUTOMERGE Causes the PR autotester to automatically merge the PR branch once approvals are completed label May 7, 2021
kddevin added a commit that referenced this pull request May 7, 2021
@trilinos-autotester
Copy link
Contributor

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED (click to expand)

Build Information

Test Name: Trilinos_pullrequest_gcc_8.3.0

  • Build Num: 4391
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS pkg: Tpetra
PULLREQUESTNUM 9117
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH tpetra_9116
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3bfe974
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA aad612b

Build Information

Test Name: Trilinos_pullrequest_gcc_7.2.0_serial

  • Build Num: 1923
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS pkg: Tpetra
PULLREQUESTNUM 9117
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH tpetra_9116
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3bfe974
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA aad612b

Build Information

Test Name: Trilinos_pullrequest_gcc_7.2.0_debug

  • Build Num: 2404
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS pkg: Tpetra
PULLREQUESTNUM 9117
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH tpetra_9116
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3bfe974
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA aad612b

Build Information

Test Name: Trilinos_pullrequest_intel_17.0.1

  • Build Num: 9738
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS pkg: Tpetra
PULLREQUESTNUM 9117
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH tpetra_9116
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3bfe974
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA aad612b

Build Information

Test Name: Trilinos_pullrequest_cuda_10.1.105

  • Build Num: 1153
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS pkg: Tpetra
PULLREQUESTNUM 9117
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH tpetra_9116
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3bfe974
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA aad612b

Build Information

Test Name: Trilinos_pullrequest_clang_10.0.0

  • Build Num: 2519
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS pkg: Tpetra
PULLREQUESTNUM 9117
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH tpetra_9116
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3bfe974
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA aad612b

Build Information

Test Name: Trilinos_pullrequest_cuda_10.1.105_uvm_off

  • Build Num: 150
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS pkg: Tpetra
PULLREQUESTNUM 9117
PULLREQUEST_CDASH_TRACK Experimental
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH tpetra_9116
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3bfe974
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA aad612b

Build Information

Test Name: Trilinos_pullrequest_python_3

  • Build Num: 5118
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS pkg: Tpetra
PULLREQUESTNUM 9117
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH tpetra_9116
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3bfe974
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA aad612b


CDash Test Results for PR# 9117.

@trilinos-autotester
Copy link
Contributor

Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ brian-kelley ]!

@trilinos-autotester
Copy link
Contributor

Status Flag 'Pull Request AutoTester' - Pull Request will be Automerged

@trilinos-autotester trilinos-autotester merged commit 2bbed5a into develop May 8, 2021
@trilinos-autotester
Copy link
Contributor

Merge on Pull Request# 9117: IS A SUCCESS - Pull Request successfully merged

@trilinos-autotester trilinos-autotester removed the AT: AUTOMERGE Causes the PR autotester to automatically merge the PR branch once approvals are completed label May 8, 2021
@jjellio
Copy link
Contributor

jjellio commented May 10, 2021

@kddevin I think the testing should be running both Cuda-aware ON and OFF.
E.g.,
https://testing.sandia.gov/cdash/test/70227930 (has it off)

I can't navigate CDash for the life of me - if I search for an ATDM build with the Tpetra flag = 0, I get
https://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&filtercount=2&showfilters=1&filtercombine=and&field1=buildname&compare1=65&value1=Trilinos-atdm-&field2=testoutput&compare2=95&value2=TPETRA_ASSUME_CUDA_AWARE_MPI%3D0

To avoid a full rebuild, the code is built once, but the testing runs twice (as I understand it).

I pushed pretty hard for this functionality, because EMPIRE has used Cuda-aware = OFF in several instances...

Hope that's somewhat helpful. (if it didn't help flag the issue you're having it seems it has failed to do its job though!)

jmgate pushed a commit to tcad-charon/Trilinos that referenced this pull request May 18, 2021
…s:develop' (956bc97).

* trilinos-develop:
  Piecewise constant grid transfers via semi-coarsening toggletransfer
  Anasazi: Removes HAVE_COMPLEX from complex-valued tests
  tpetra: Removed unused variable
  Tpetra:  enable the test
  tpetra:  unit test for trilinos#9116; follows trilinos#9117
jmgate pushed a commit to tcad-charon/Trilinos that referenced this pull request May 19, 2021
…s:develop' (956bc97).

* trilinos-develop: (37 commits)
  Tacho develop (trilinos#9150)
  STK: Snapshot 05-18-21 07:34
  Piecewise constant grid transfers via semi-coarsening toggletransfer
  Fix compiler warning
  van1-tx2: Add deprecation notice
  Automatic snapshot commit from seacas at 4cee371
  Anasazi: Removes HAVE_COMPLEX from complex-valued tests
  tpetra: Removed unused variable
  ATDM: Set CTEST_BUILD_NAME by default to env<ATDM_BUILD_NAME>-exp for 'dashboard' target
  Automatic snapshot commit from tribits at d0513d3
  Removed template keyword to fix build errors
  Addressed all of Karen's feedback, and made equivalent changes to D1 where relevant.
  Nearly done with Karen's feedback.
  Fixed boundary_size, and atomics.
  Changed procs_to_send argument of communication function to pass by reference for performance
  Tpetra:  enable the test
  tpetra:  unit test for trilinos#9116; follows trilinos#9117
  Switched D2 and PD2 to always use NB_BIT, as our previous implementation did.
  Change to Kokkos size views to fix build errors
  Further modified the serial exec/memory spaces to solve build problems
  ...
jmgate pushed a commit to tcad-charon/Trilinos that referenced this pull request May 19, 2021
…s:develop' (956bc97).

* trilinos-develop: (37 commits)
  Tacho develop (trilinos#9150)
  STK: Snapshot 05-18-21 07:34
  Piecewise constant grid transfers via semi-coarsening toggletransfer
  Fix compiler warning
  van1-tx2: Add deprecation notice
  Automatic snapshot commit from seacas at 4cee371
  Anasazi: Removes HAVE_COMPLEX from complex-valued tests
  tpetra: Removed unused variable
  ATDM: Set CTEST_BUILD_NAME by default to env<ATDM_BUILD_NAME>-exp for 'dashboard' target
  Automatic snapshot commit from tribits at d0513d3
  Removed template keyword to fix build errors
  Addressed all of Karen's feedback, and made equivalent changes to D1 where relevant.
  Nearly done with Karen's feedback.
  Fixed boundary_size, and atomics.
  Changed procs_to_send argument of communication function to pass by reference for performance
  Tpetra:  enable the test
  tpetra:  unit test for trilinos#9116; follows trilinos#9117
  Switched D2 and PD2 to always use NB_BIT, as our previous implementation did.
  Change to Kokkos size views to fix build errors
  Further modified the serial exec/memory spaces to solve build problems
  ...
seamill pushed a commit to seamill/Trilinos that referenced this pull request Jul 28, 2021
…develop' (4520086).

* trilinos/develop: (38 commits)
  Adds GCRODR / CG complex-valued tests for Tpetra
  Tacho develop (trilinos#9150)
  STK: Snapshot 05-18-21 07:34
  Piecewise constant grid transfers via semi-coarsening toggletransfer
  Fix compiler warning
  van1-tx2: Add deprecation notice
  Automatic snapshot commit from seacas at 4cee371
  Anasazi: Removes HAVE_COMPLEX from complex-valued tests
  tpetra: Removed unused variable
  ATDM: Set CTEST_BUILD_NAME by default to env<ATDM_BUILD_NAME>-exp for 'dashboard' target
  Automatic snapshot commit from tribits at d0513d3
  Removed template keyword to fix build errors
  Addressed all of Karen's feedback, and made equivalent changes to D1 where relevant.
  Nearly done with Karen's feedback.
  Fixed boundary_size, and atomics.
  Changed procs_to_send argument of communication function to pass by reference for performance
  Tpetra:  enable the test
  tpetra:  unit test for trilinos#9116; follows trilinos#9117
  Switched D2 and PD2 to always use NB_BIT, as our previous implementation did.
  Change to Kokkos size views to fix build errors
  ...
@kddevin kddevin deleted the tpetra_9116 branch November 1, 2021 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants