Fix up and document handling of CUDA-aware MPI with Tpetra (CDOFA-100, #6902) #6904

bartlettroscoe
Member

Turns out that the tests in the "cuda-aware-mpi" builds on 'vortex' were not actually setting TPETRA_ASSUME_CUDA_AWARE_MPI=1 (see CDOFA-100). This commit fixes that and it also documents for users how to run the test suite with and without CUDA-aware MPI in Tpetra (#6902).

How was this tested?

On 'vortex' I ran:

$ env Trilinos_PACKAGES=Tpetra \
  ./ctest-s-local-test-driver.sh ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt

...

Tue Feb 25 19:10:27 MST 2020

Running Jenkins driver Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt.sh ...

    See log file Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt/smart-jenkins-driver.out

real    18m31.319s
user    0m3.942s
sys     0m1.288s

100% tests passed, 0 tests failed out of 234
100% tests passed, 0 tests failed out of 234

Tue Feb 25 19:28:58 MST 2020

Done running all of the builds!

That posted to CDash:

Evidence that TPETRA_ASSUME_CUDA_AWARE_MPI is now being set correctly in these builds can be seen in the test TpetraCore_Behavior_Default_MPI_4 in the build Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt-exp here, which shows:

BEFORE: jsrun  '-p' '4' '--rs_per_socket' '4' '/vscratch1/rabartl/Trilinos.base/BUILDS/VORTEX/CTEST_S/Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt/SRC_AND_BUILD/BUILD/packages/tpetra/core/test/Behavior/TpetraCore_Behavior_Default.exe'
AFTER: export TPETRA_ASSUME_CUDA_AWARE_MPI=0; jsrun  '-p' '4' '--rs_per_socket' '4' '/vscratch1/rabartl/Trilinos.base/BUILDS/VORTEX/CTEST_S/Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt/SRC_AND_BUILD/BUILD/packages/tpetra/core/test/Behavior/TpetraCore_Behavior_Default.exe'

and in the build Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt_cuda-aware-mpi here showing:

BEFORE: jsrun  '-p' '4' '--rs_per_socket' '4' '/vscratch1/rabartl/Trilinos.base/BUILDS/VORTEX/CTEST_S/Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt/SRC_AND_BUILD/BUILD/packages/tpetra/core/test/Behavior/TpetraCore_Behavior_Default.exe'
AFTER: export TPETRA_ASSUME_CUDA_AWARE_MPI=1; jsrun  '-E LD_PRELOAD=/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-2019.06.24/lib/pami_451/libpami.so' '-M -gpu' '-p' '4' '--rs_per_socket' '4' '/vscratch1/rabartl/Trilinos.base/BUILDS/VORTEX/CTEST_S/Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt/SRC_AND_BUILD/BUILD/packages/tpetra/core/test/Behavior/TpetraCore_Behavior_Default.exe'
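
The two AFTER lines above suggest how the wrapper behaves: with `TPETRA_ASSUME_CUDA_AWARE_MPI=0` the `jsrun` arguments pass through unchanged, and with `=1` the `-E LD_PRELOAD=...libpami.so` and `-M -gpu` arguments are added in front. A minimal sketch of that behavior follows. The function name `show_jsrun_cmd` and the `PAMI_LIB` variable are made up for illustration, the flags are copied from the output above, and the real `trilinos_jsrun` script may be structured differently. The sketch only prints a command line and never calls `jsrun`.

```shell
# Hypothetical sketch; NOT the real trilinos_jsrun script.
# PAMI_LIB is an illustrative name; the path is copied from the AFTER line above.
PAMI_LIB="/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-2019.06.24/lib/pami_451/libpami.so"

# Print the command line that would be run for the given jsrun arguments.
# Uses plain (global) shell variables to stay POSIX-sh compatible.
show_jsrun_cmd() {
  mode="${TPETRA_ASSUME_CUDA_AWARE_MPI:-0}"
  out="export TPETRA_ASSUME_CUDA_AWARE_MPI=${mode}; jsrun "
  if [ "$mode" = "1" ]; then
    # CUDA-aware mode: prepend the extra arguments seen in the output above.
    out="${out} '-E LD_PRELOAD=${PAMI_LIB}' '-M -gpu'"
  fi
  for arg in "$@"; do
    out="${out} '${arg}'"
  done
  printf '%s\n' "$out"
}
```

Calling `show_jsrun_cmd -p 4 ./some_test.exe` with the variable set to `1` prints a line shaped like the cuda-aware AFTER line above; with `0` the arguments are echoed unchanged.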

…trilinos#6902)

Turns out that the tests in the "cuda-aware-mpi" builds on 'vortex' were not
actually setting TPETRA_ASSUME_CUDA_AWARE_MPI=1.  This commit fixes that and
it also documents for users how to run the test suite with and without
CUDA-aware MPI in Tpetra.
@bartlettroscoe bartlettroscoe added the type: bug, type: enhancement, client: ATDM, ATDM Config, AT: AUTOMERGE, and ATDM DevOps labels Feb 26, 2020
@bartlettroscoe bartlettroscoe removed the AT: AUTOMERGE label Feb 26, 2020
@bartlettroscoe
Member Author

FYI: I manually merged this commit to 'atdm-nightly-manual-updates' in commit dc4093f so this will run tomorrow morning. But it would still be nice to get a review of the documentation before this gets merged (for which I will create new commits to fix any problems).

@jjellio
Contributor

jjellio commented Feb 26, 2020

There is actually a typo in the jsrun_wrapper (line 75)

export TPETRA_ASSUME_CUDA_AWARE=0

It should have been setting TPETRA_ASSUME_CUDA_AWARE_MPI=0 if it wasn't set, but as you see on line 75, it is misspelled.

I am going to try adding the fix to this PR

@jjellio
Contributor

jjellio commented Feb 26, 2020

I see the correct output from the jsrun_wrapper, e.g., a warning that the ENV isn't set, then on the AFTER line you can see it is set to 0

testing with cuda-aware unset!, should not see -gpu, look for -disable_gpu_hooks
WARNING, you have not set TPETRA_ASSUME_CUDA_AWARE_MPI=0 or 1, defaulting to TPETRA_ASSUME_CUDA_AWARE_MPI=0
BEFORE: jsrun  '-M' '-disable_gdr' '-p' '2' 'hostname'
AFTER: export TPETRA_ASSUME_CUDA_AWARE_MPI=0; jsrun  '-M' '-disable_gdr' '-p' '2' 'hostname'
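
The warn-and-default behavior visible in this output can be sketched as below. This is only an illustration under assumptions: `resolve_cuda_aware_mpi` is a hypothetical name, and the real wrapper's code is not shown in this thread. The point is that the exported name must be spelled `TPETRA_ASSUME_CUDA_AWARE_MPI` exactly, since exporting `TPETRA_ASSUME_CUDA_AWARE` leaves the intended variable unset.

```shell
# Hypothetical helper; not the actual wrapper code.
# If TPETRA_ASSUME_CUDA_AWARE_MPI is unset, warn on stderr and default it to 0.
resolve_cuda_aware_mpi() {
  # ${VAR+x} expands to "x" only when VAR is set (even if set to empty).
  if [ -z "${TPETRA_ASSUME_CUDA_AWARE_MPI+x}" ]; then
    echo "WARNING, you have not set TPETRA_ASSUME_CUDA_AWARE_MPI=0 or 1, defaulting to TPETRA_ASSUME_CUDA_AWARE_MPI=0" >&2
    export TPETRA_ASSUME_CUDA_AWARE_MPI=0
  fi
}
```

The function has to be called directly rather than inside `$(...)`, because an export made in a subshell does not reach the calling shell.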

@bartlettroscoe
Member Author

There is actually a typo in the jsrun_wrapper

Yea, I noticed that when I was testing with TPETRA_ASSUME_CUDA_AWARE_MPI unset. Please push a new commit that fixes this to this PR.

@bartlettroscoe
Member Author

I see the correct output from the jsrun_wrapper, e.g., a warning that the ENV isn't set, then on the AFTER line you can see it is set to 0

@jjellio, that is after you fixed it, right?

@jjellio
Contributor

jjellio commented Feb 26, 2020

Yes, I pushed to this branch... I think, but it doesn't seem to be showing up in your branch.

edit:
I pulled your branch, but it pushed to my fork. Can I push to your fork? (I got a permission error when I tried)

e.g.,


remote: Resolving deltas: 100% (263/263), completed with 89 local objects.
remote: 
remote: Create a pull request for 'bartlettroscoe/cdofa-100-tril-6902-cuda-aware-mpi' on GitHub by visiting:
remote:      https://github.com/jjellio/Trilinos/pull/new/bartlettroscoe/cdofa-100-tril-6902-cuda-aware-mpi
remote: 

@trilinos-autotester
Contributor

Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request.

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: Trilinos_pullrequest_gcc_4.8.4

  • Build Num: 5871
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.8.4
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 728d673
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_intel_17.0.1

  • Build Num: 5696
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 728d673
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL

  • Build Num: 4125
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 728d673
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_gcc_7.2.0

  • Build Num: 3973
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 728d673
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_gcc_8.3.0

  • Build Num: 166
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 728d673
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_cuda_9.2

  • Build Num: 3504
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
JENKINS_JOB_TYPE Experimental
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 728d673
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_python_2

  • Build Num: 1752
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 728d673
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_python_3

  • Build Num: 1753
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 728d673
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Using Repos:

Repo: TRILINOS (bartlettroscoe/Trilinos)
  • Branch: cdofa-100-tril-6902-cuda-aware-mpi
  • SHA: 728d673
  • Mode: TEST_REPO

Pull Request Author: bartlettroscoe

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED (click to expand)

Build Information

Test Name: Trilinos_pullrequest_gcc_4.8.4

  • Build Num: 5871
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.8.4
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 728d673
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_intel_17.0.1

  • Build Num: 5696
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 728d673
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL

  • Build Num: 4125
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 728d673
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_gcc_7.2.0

  • Build Num: 3973
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 728d673
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_gcc_8.3.0

  • Build Num: 166
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 728d673
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_cuda_9.2

  • Build Num: 3504
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
JENKINS_JOB_TYPE Experimental
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 728d673
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_python_2

  • Build Num: 1752
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 728d673
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_python_3

  • Build Num: 1753
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 728d673
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b


CDash Test Results for PR# 6904.

@trilinos-autotester
Contributor

Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
NO REVIEWS HAVE BEEN PERFORMED ON THIS PULL REQUEST!

@trilinos-autotester
Contributor

All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur...

@jjellio
Contributor

jjellio commented Feb 26, 2020

Yea, I don't know how to push a change to your PR.

If you can spell out how to do it, I can push the typo correction. It would seem I need to clone your entire fork

@e10harvey
Contributor

Yea, I don't know how to push a change to your PR.

If you can spell out how to do it, I can push the typo correction. It would seem I need to clone your entire fork

@jjellio: assuming Ross gives you permission to push to his fork, you should be able to do:

git remote add ross git@github.com:bartlettroscoe/Trilinos.git
git fetch ross
git checkout cdofa-100-tril-6902-cuda-aware-mpi
git cherry-pick YOUR-COMMIT-SHA
git push ross cdofa-100-tril-6902-cuda-aware-mpi

@bartlettroscoe
Member Author

bartlettroscoe commented Feb 26, 2020

assuming Ross gives you permissions to push

@jjellio, I created this PR and checked the "Allow maintainers to edit branch" option. Therefore, if you are a member of the Trilinos GitHub organization, you should be able to push to this branch.

An alternative set of git commands is:

git remote add bartlettroscoe git@github.com:bartlettroscoe/Trilinos.git
git fetch bartlettroscoe
git checkout --track bartlettroscoe/cdofa-100-tril-6902-cuda-aware-mpi
<edit any file you want>
git commit -a
git push   # Will push to tracking bartlettroscoe/cdofa-100-tril-6902-cuda-aware-mpi
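
For anyone who wants to try this workflow without touching a real fork, the sketch below replays it against a throwaway local bare repository. Everything in it is a stand-in: `author-fork.git` plays the PR author's fork, `author` is the remote name, and `pr-branch` is the PR branch. The last push uses an explicit `HEAD:<branch>` refspec, which sends the commit to the named branch on the named remote regardless of what the local branch is called or where a plain `git push` would default to.

```shell
set -e
work="$(mktemp -d)"

# Stand-in for the PR author's fork, seeded with one commit on 'pr-branch'.
git init -q --bare "$work/author-fork.git"
git init -q "$work/seed"
git -C "$work/seed" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"
git -C "$work/seed" push -q "$work/author-fork.git" HEAD:refs/heads/pr-branch

# Reviewer's clone: add the author's fork as a remote and track the PR branch.
git init -q "$work/reviewer"
(
  cd "$work/reviewer"
  git remote add author "$work/author-fork.git"
  git fetch -q author
  git checkout -q --track author/pr-branch
  git -c user.name=demo -c user.email=demo@example.com \
      commit -q --allow-empty -m "fix typo"
  # Explicit refspec: push the current commit to 'pr-branch' on 'author'.
  git push -q author HEAD:pr-branch
)

# Show the subject of the newest commit now on the fork's PR branch.
git --git-dir="$work/author-fork.git" log -1 --format=%s pr-branch
```

Against GitHub, the same push only succeeds if the pusher has write access to the fork or the PR author has allowed maintainer edits, as discussed above.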

e10harvey
e10harvey previously approved these changes Feb 26, 2020
Contributor

@e10harvey e10harvey left a comment

Thanks, Ross! :-)

There is a small typo in the README.

$ ctest -j4
```

By set the mode for CUDA-aware MPI, set the env var:
Contributor

"To set the mode for..."

$ ctest -j4
```

before running `ctest`. Otherwise, if `TPETRA_ASSUME_CUDA_AWARE_MPI` is not
Contributor

Should we hyperlink to the documentation for TPETRA_ASSUME_CUDA_AWARE_MPI here?

Member Author

@e10harvey, I don't think there is documentation for TPETRA_ASSUME_CUDA_AWARE_MPI. I think our README file is it!

@@ -186,8 +186,6 @@ if [[ "$ATDM_CONFIG_COMPILER" == "CUDA-10.1.243_"* ]]; then
export KOKKOS_NUM_DEVICES=4

# CTEST Settings
# TPETRA_ASSUME_CUDA_AWARE_MPI is used by cmake/std/atdm/ats2/trilinos_jsrun
export TPETRA_ASSUME_CUDA_AWARE_MPI=0
Contributor

Doesn't local-driver.sh run after environment.sh? Why is this export of TPETRA_ASSUME_CUDA_AWARE_MPI not picked up in local-driver?

Member Author

@e10harvey, the ctest-s-driver.sh script sources the load-env.sh script again and overwrites this. We need to just not touch the TPETRA_ASSUME_CUDA_AWARE_MPI var in the atdm/environment.sh script.
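
The overwrite described here can be reproduced in isolation. The sketch below is not the ATDM scripts themselves: a temporary file stands in for an environment script that is sourced a second time, and the variable names `clobbered` and `preserved` are illustrative only.

```shell
# Temporary stand-in for an environment script that unconditionally exports the var.
env_file="$(mktemp)"
printf 'export TPETRA_ASSUME_CUDA_AWARE_MPI=0\n' > "$env_file"

# A driver sets the value it wants, then the environment script is sourced again.
export TPETRA_ASSUME_CUDA_AWARE_MPI=1
. "$env_file"
clobbered="$TPETRA_ASSUME_CUDA_AWARE_MPI"   # the driver's 1 has been replaced by 0

# Same sequence, but the environment script no longer touches the variable.
: > "$env_file"
export TPETRA_ASSUME_CUDA_AWARE_MPI=1
. "$env_file"
preserved="$TPETRA_ASSUME_CUDA_AWARE_MPI"   # the driver's 1 survives

echo "unconditional export: $clobbered, variable left alone: $preserved"
rm -f "$env_file"
```

Leaving the variable alone in the environment script means whichever script sets it first keeps control, and the wrapper's own default only applies when nothing set it.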

@trilinos-autotester
Contributor

All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur...

@jjellio
Contributor

jjellio commented Feb 26, 2020

It could help to look at what lalloc 1 does w.r.t. waiting for JSM to become ready (or perhaps use their script).

E.g.,
LALLOC_DIR/bin/tweaked_jsm_wait_for_ready

…os_jsrun (CDOFA-100)

This ensures that the trilinos_jsrun script's logic for setting the default
for TPETRA_ASSUME_CUDA_AWARE_MPI=0 if TPETRA_ASSUME_CUDA_AWARE_MPI is unset
actually works.  That is important for developers who just want to manually
run the test suite.
@trilinos-autotester
Contributor

Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request.

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: Trilinos_pullrequest_gcc_4.8.4

  • Build Num: 5875
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.8.4
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 0228511
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_intel_17.0.1

  • Build Num: 5700
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 0228511
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL

  • Build Num: 4129
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 0228511
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_gcc_7.2.0

  • Build Num: 3977
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 0228511
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_gcc_8.3.0

  • Build Num: 169
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 0228511
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_cuda_9.2

  • Build Num: 3508
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
JENKINS_JOB_TYPE Experimental
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 0228511
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_python_2

  • Build Num: 1755
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 0228511
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_python_3

  • Build Num: 1756
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 0228511
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Using Repos:

Repo: TRILINOS (bartlettroscoe/Trilinos)
  • Branch: cdofa-100-tril-6902-cuda-aware-mpi
  • SHA: 0228511
  • Mode: TEST_REPO

Pull Request Author: bartlettroscoe

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED (click to expand)

Build Information

Test Name: Trilinos_pullrequest_gcc_4.8.4

  • Build Num: 5875
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.8.4
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 0228511
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_intel_17.0.1

  • Build Num: 5700
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 0228511
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL

  • Build Num: 4129
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 0228511
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_gcc_7.2.0

  • Build Num: 3977
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 0228511
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_gcc_8.3.0

  • Build Num: 169
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 0228511
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_cuda_9.2

  • Build Num: 3508
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
JENKINS_JOB_TYPE Experimental
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 0228511
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_python_2

  • Build Num: 1755
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 0228511
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b

Build Information

Test Name: Trilinos_pullrequest_python_3

  • Build Num: 1756
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 0228511
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 2387d8b


CDash Test Results for PR# 6904.

@trilinos-autotester
Contributor

Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
THE LAST COMMIT TO THIS PULL REQUEST HAS NOT BEEN REVIEWED YET!

@trilinos-autotester
Contributor

All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur...

@bartlettroscoe bartlettroscoe force-pushed the cdofa-100-tril-6902-cuda-aware-mpi branch from 0228511 to 85614cf February 26, 2020 15:42
@bartlettroscoe
Member Author

@e10harvey, @jjellio, and/or @kddevin, can you please review the updated documentation in the atdm/README.md file? I made a few changes to hopefully make it clearer what is going on, and I now mention the existence of trilinos_jsrun.

@trilinos-autotester
Contributor

Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request.

@bartlettroscoe
Member Author

FYI: In commit 506fac1 I made the non-CUDA-aware MPI build start with unset TPETRA_ASSUME_CUDA_AWARE_MPI. I tested this on 'vortex' with:

$  env CTEST_DO_CONFIGURE=OFF CTEST_DO_BUILD=OFF Trilinos_PACKAGES=Tpetra ./ctest-s-local-test-driver.sh ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt

...

Running builds:
    ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt

Wed Feb 26 07:48:08 MST 2020

Running Jenkins driver Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt.sh ...

    See log file Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt/smart-jenkins-driver.out

real    16m12.920s
user    0m2.078s
sys     0m0.560s

100% tests passed, 0 tests failed out of 234
100% tests passed, 0 tests failed out of 234

Wed Feb 26 08:04:21 MST 2020

Done running all of the builds!

which posted to:

We can see the tests in the Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt-exp build starting with uninitialized TPETRA_ASSUME_CUDA_AWARE_MPI by looking at the test TpetraCore_Behavior_Default_MPI_4 here which shows:

WARNING, you have not set TPETRA_ASSUME_CUDA_AWARE_MPI=0 or 1, defaulting to TPETRA_ASSUME_CUDA_AWARE_MPI=0
BEFORE: jsrun  '-p' '4' '--rs_per_socket' '4' '/vscratch1/rabartl/Trilinos.base/BUILDS/VORTEX/CTEST_S/Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt/SRC_AND_BUILD/BUILD/packages/tpetra/core/test/Behavior/TpetraCore_Behavior_Default.exe'
AFTER: export TPETRA_ASSUME_CUDA_AWARE_MPI=0; jsrun  '-p' '4' '--rs_per_socket' '4' '/vscratch1/rabartl/Trilinos.base/BUILDS/VORTEX/CTEST_S/Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt/SRC_AND_BUILD/BUILD/packages/tpetra/core/test/Behavior/TpetraCore_Behavior_Default.exe'

Now that looks correct.
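
A check like this can also be scripted instead of read off CDash by eye. The sketch below assumes the test output has been saved to a file or is piped in, and that the wrapper prints an `AFTER: export TPETRA_ASSUME_CUDA_AWARE_MPI=<0|1>; ...` line as in the output above; the function name `assumed_mode_from_log` is hypothetical.

```shell
# Hypothetical helper: read test output on stdin and print the value (0 or 1)
# that the first "AFTER:" line exported for TPETRA_ASSUME_CUDA_AWARE_MPI.
# Prints nothing if no such line is present.
assumed_mode_from_log() {
  sed -n 's/^AFTER: export TPETRA_ASSUME_CUDA_AWARE_MPI=\([01]\);.*/\1/p' | head -n 1
}
```

For example, `assumed_mode_from_log < test_output.txt` (file name illustrative) would print the value that was exported for that test run.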

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: Trilinos_pullrequest_gcc_4.8.4

  • Build Num: 5876
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.8.4
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 85614cf
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a7bc9e4

Build Information

Test Name: Trilinos_pullrequest_intel_17.0.1

  • Build Num: 5701
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 85614cf
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a7bc9e4

Build Information

Test Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL

  • Build Num: 4130
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 85614cf
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a7bc9e4

Build Information

Test Name: Trilinos_pullrequest_gcc_7.2.0

  • Build Num: 3978
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 85614cf
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a7bc9e4

Build Information

Test Name: Trilinos_pullrequest_gcc_8.3.0

  • Build Num: 170
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 85614cf
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a7bc9e4

Build Information

Test Name: Trilinos_pullrequest_cuda_9.2

  • Build Num: 3509
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
JENKINS_JOB_TYPE Experimental
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 85614cf
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a7bc9e4

Build Information

Test Name: Trilinos_pullrequest_python_2

  • Build Num: 1756
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 85614cf
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a7bc9e4

Build Information

Test Name: Trilinos_pullrequest_python_3

  • Build Num: 1757
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 85614cf
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a7bc9e4

Using Repos:

Repo: TRILINOS (bartlettroscoe/Trilinos)
  • Branch: cdofa-100-tril-6902-cuda-aware-mpi
  • SHA: 85614cf
  • Mode: TEST_REPO

Pull Request Author: bartlettroscoe

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED (click to expand)

Build Information

Test Name: Trilinos_pullrequest_gcc_4.8.4

  • Build Num: 5876
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.8.4
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 85614cf
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a7bc9e4

Build Information

Test Name: Trilinos_pullrequest_intel_17.0.1

  • Build Num: 5701
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 85614cf
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a7bc9e4

Build Information

Test Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL

  • Build Num: 4130
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 85614cf
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a7bc9e4

Build Information

Test Name: Trilinos_pullrequest_gcc_7.2.0

  • Build Num: 3978
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 85614cf
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a7bc9e4

Build Information

Test Name: Trilinos_pullrequest_gcc_8.3.0

  • Build Num: 170
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 85614cf
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a7bc9e4

Build Information

Test Name: Trilinos_pullrequest_cuda_9.2

  • Build Num: 3509
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
JENKINS_JOB_TYPE Experimental
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 85614cf
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a7bc9e4

Build Information

Test Name: Trilinos_pullrequest_python_2

  • Build Num: 1756
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 85614cf
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a7bc9e4

Build Information

Test Name: Trilinos_pullrequest_python_3

  • Build Num: 1757
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 6904
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH cdofa-100-tril-6902-cuda-aware-mpi
TRILINOS_SOURCE_REPO https://github.com/bartlettroscoe/Trilinos
TRILINOS_SOURCE_SHA 85614cf
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a7bc9e4


CDash Test Results for PR# 6904.

@trilinos-autotester
Contributor

Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
THE LAST COMMIT TO THIS PULL REQUEST HAS BEEN REVIEWED, BUT NOT ACCEPTED OR REQUIRES CHANGES

@trilinos-autotester
Contributor

All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur...

Contributor

@e10harvey e10harvey left a comment

👍 Thanks, Ross!

@trilinos-autotester
Contributor

Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ e10harvey ]!

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR...

@bartlettroscoe bartlettroscoe added the AT: AUTOMERGE label (Causes the PR autotester to automatically merge the PR branch once approvals are completed) Feb 26, 2020
@bartlettroscoe
Member Author

Okay, I will let this merge now ...

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Pull Request will be Automerged

@trilinos-autotester trilinos-autotester merged commit c4329f1 into trilinos:develop Feb 26, 2020
@trilinos-autotester
Contributor

Merge on Pull Request# 6904: IS A SUCCESS - Pull Request successfully merged

@trilinos-autotester trilinos-autotester removed the AT: AUTOMERGE label (Causes the PR autotester to automatically merge the PR branch once approvals are completed) Feb 26, 2020
```bash
$ ctest -j4
```

The MPI test executables are run by a wrapper script `trilinos_jsrun` which
Contributor

I would also like to know how one would check which value of TPETRA_ASSUME_CUDA_AWARE_MPI was used in a particular test configuration. If one wants to reproduce a failing test, where should one look in CDash to get the value used for that test? The environment variable setting is not archived in the CMake configuration output.

Member Author

> I would also like to know how one would check which value of TPETRA_ASSUME_CUDA_AWARE_MPI was used in a particular test configuration.

It is printed out by trilinos_jsrun before it runs jsrun. Therefore, that information is on CDash in the detailed test output. For example, if you look at the output for the test TpetraCore_Behavior_Default_MPI_4 here you will see:

BEFORE: jsrun  '-p' '4' '--rs_per_socket' '4' '/vscratch1/jenkins/vortex-slave/workspace/Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_dbg/SRC_AND_BUILD/BUILD/packages/tpetra/core/test/Behavior/TpetraCore_Behavior_Default.exe'
AFTER: export TPETRA_ASSUME_CUDA_AWARE_MPI=0; jsrun  '-p' '4' '--rs_per_socket' '4' '/vscratch1/jenkins/vortex-slave/workspace/Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_dbg/SRC_AND_BUILD/BUILD/packages/tpetra/core/test/Behavior/TpetraCore_Behavior_Default.exe'

If you compare that to the CUDA-aware run of that same test here, you see:

BEFORE: jsrun  '-p' '4' '--rs_per_socket' '4' '/vscratch1/jenkins/vortex-slave/workspace/Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_dbg/SRC_AND_BUILD/BUILD/packages/tpetra/core/test/Behavior/TpetraCore_Behavior_Default.exe'
AFTER: export TPETRA_ASSUME_CUDA_AWARE_MPI=1; jsrun  '-E LD_PRELOAD=/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-2019.06.24/lib/pami_451/libpami.so' '-M -gpu' '-p' '4' '--rs_per_socket' '4' '/vscratch1/jenkins/vortex-slave/workspace/Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_dbg/SRC_AND_BUILD/BUILD/packages/tpetra/core/test/Behavior/TpetraCore_Behavior_Default.exe'

Hopefully that is clear.
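
For anyone who wants to script this check, here is a minimal sketch that pulls the value out of saved test output. It assumes the detailed test output from CDash has been saved as plain text and that the wrapper prints an `AFTER:` line in the form shown above. The function name `cuda_aware_value` is hypothetical and not part of Trilinos.

```bash
# Hypothetical helper (not part of Trilinos): read test output on stdin and
# print the TPETRA_ASSUME_CUDA_AWARE_MPI value from the first "AFTER:" line.
cuda_aware_value() {
  sed -n 's/^AFTER: .*TPETRA_ASSUME_CUDA_AWARE_MPI=\([^;]*\);.*/\1/p' | head -n 1
}

# Example usage with a shortened stand-in for the wrapper's output:
printf '%s\n' \
  "BEFORE: jsrun '-p' '4' './TpetraCore_Behavior_Default.exe'" \
  "AFTER: export TPETRA_ASSUME_CUDA_AWARE_MPI=1; jsrun '-M -gpu' '-p' '4' './TpetraCore_Behavior_Default.exe'" \
  | cuda_aware_value
```

If no `AFTER:` line with the variable is present, the function prints nothing, which is a reasonable signal that the test was not launched through the wrapper.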

Contributor

Right; that information should be in the documentation. Trilinos developers are not accustomed to looking for that information, and it is specialized to the ATS-2 builds.

@jjellio
Contributor

jjellio commented Feb 27, 2020

It could make sense to document that the different machines may report 'cuda-awareness' differently based on the node you are on.... UGH!
See,
#5179 (comment)

At the above point in time, if you configured with the rolling release module on a login node, you could get 'not-cuda-aware', and if you configured with a specific dated version OR on a compute node, it would report 'cuda-aware'.

This motivated (in part) why testing is orchestrated the way it is. Now, with ATS2 testing, we are ensuring that regardless of what CMake discovered we will attempt to exercise both code paths.

I'm still not sure that downstream users will want to use trilinos_jsrun, but I suppose we could help them use ATS2 more easily by adding 'cuda-aware' to the load_env.sh stuff. E.g., load_env.sh gnu-volta100-release-debug-cuda-aware, and this would set Tpetra's cuda-aware variable.

I point out the prior paragraph because the wrapper depends on the ENV, and I don't think many users are aware that Tpetra's cuda-awareness is a runtime option (they are accustomed to CMake setting a default and just using whatever).

The danger of using the load_env approach is that if they use jsrun directly, they would need to ensure that -M -gpu is passed.
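
To make that concrete, here is a rough sketch of what a by-hand launch would have to keep in sync, modeled only on the BEFORE/AFTER lines the wrapper prints earlier in this thread. It prints the command instead of running it. `build_jsrun_cmd` is a hypothetical name, and this is not the actual trilinos_jsrun logic (it ignores the LD_PRELOAD and NP=1 handling).

```bash
# Hypothetical sketch (not the real trilinos_jsrun): print a jsrun command line
# that keeps the Tpetra env var and the '-M -gpu' flag in sync.
# Usage: build_jsrun_cmd <0|1> <jsrun args...>
build_jsrun_cmd() {
  cuda_aware="$1"
  shift
  gpu_flags=""
  if [ "$cuda_aware" = "1" ]; then
    # CUDA-aware runs need '-M -gpu' passed to jsrun.
    gpu_flags="-M -gpu "
  fi
  echo "export TPETRA_ASSUME_CUDA_AWARE_MPI=${cuda_aware}; jsrun ${gpu_flags}$*"
}

# Print one CUDA-aware and one non-CUDA-aware command line:
build_jsrun_cmd 1 -p 4 --rs_per_socket 4 ./TpetraCore_Behavior_Default.exe
build_jsrun_cmd 0 -p 4 --rs_per_socket 4 ./TpetraCore_Behavior_Default.exe
```

The point of the sketch is only that setting the variable to 1 without also passing -M -gpu is the mismatch to avoid.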

It may also be worth documenting that the wrapper does something special for NP=1. Specifically, it disables Spectrum's CUDA hooks, because single-process executables may not call MPI_Init.

A goal in how I wrote the wrapper was that it always shows the user a single command line they could copy/paste. (BEFORE/AFTER) - It would make sense to drive that home - If a developer wants to replicate the wrapper, they can copy/paste one of those lines and they should get similar behavior.****

**** CAVEAT: With the LD_PRELOAD stuff still in there, it is possible they could try to reproduce running a unit test that allocates before MPI_Init, and it would crash with missing PAMI symbols, whereas the wrapper is currently preloading the PAMI shared library, which means they will see different behavior... (so we need to nuke that LD_PRELOAD stuff if developers want to use this)

Labels
  • ATDM Config (Issues that are specific to the ATDM configuration settings)
  • ATDM DevOps (Issues that will be worked by the Coordinated ATDM DevOps teams)
  • client: ATDM (Any issue primarily impacting the ATDM project)
  • type: bug (The primary issue is a bug in Trilinos code or tests)
  • type: enhancement (Issue is an enhancement, not a bug)
5 participants