Switch to CUDA 9.2 build on white/ride (TRIL-215) #3260
… (TRIL-215) Here I updated the default 'cuda' build to be a 'cuda-9.2' build. I also removed the GPU arch from KOKKOS_ARCH when doing a GNU build. I don't think you want or need the GPU arch when not using CUDA. I think having it might just confuse people.
All of the builds on white/ride have been all-at-once builds for a long time. Therefore, the build Trilinos-atdm-white-ride-cuda-debug-all-at-once is redundant and we can take the name "all-at-once" off of the PT build.
Having the CUDA version explicitly in the name is a good idea from now on.
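The two changes described above (a bare 'cuda' build name now meaning CUDA 9.2, and the GPU arch being left out of KOKKOS_ARCH for non-CUDA builds) can be sketched roughly as follows. This is a minimal illustration, not the actual ATDM configuration script: the function names `sketch_select_compiler` and `sketch_select_kokkos_arch`, the compiler tags, and the arch values `Power8` and `Kepler37` are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; names and arch values are assumptions,
# not taken from the real ATDM Trilinos environment scripts.

# Map a build name to a compiler tag.
# A bare "cuda" in the name is treated as the CUDA 9.2 default.
sketch_select_compiler() {
  local build_name="$1"
  case "$build_name" in
    *cuda-9.2*) echo "CUDA-9.2" ;;
    *cuda*)     echo "CUDA-9.2" ;;  # default 'cuda' build is now 'cuda-9.2'
    *gnu*)      echo "GNU" ;;
    *)          echo "UNKNOWN" ;;
  esac
}

# Choose the KOKKOS_ARCH value for a compiler tag.
# Only CUDA builds get a GPU arch appended to the host arch.
sketch_select_kokkos_arch() {
  local compiler="$1"
  local host_arch="Power8"    # assumed host architecture
  local gpu_arch="Kepler37"   # assumed GPU architecture
  case "$compiler" in
    CUDA*) echo "${host_arch},${gpu_arch}" ;;
    *)     echo "${host_arch}" ;;
  esac
}
```

With these helpers, `sketch_select_kokkos_arch "$(sketch_select_compiler gnu-debug)"` yields only the host arch, while a name containing `cuda` yields the host and GPU arch together.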
Thanks @bartlettroscoe ! :D
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request.
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Build Information
Test Name: Trilinos_pullrequest_gcc_4.9.3
Test Name: Trilinos_pullrequest_gcc_4.8.4
Test Name: Trilinos_pullrequest_intel_17.0.1
Pull Request Author: bartlettroscoe
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED
Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.
Pull Request Auto Testing has FAILED
Build Information
Test Name: Trilinos_pullrequest_gcc_4.9.3
Test Name: Trilinos_pullrequest_gcc_4.8.4
Test Name: Trilinos_pullrequest_intel_17.0.1
Shoot, the auto tester build
I am wondering if this might not be a random failure? There is no way this PR could break this test. Therefore, I will put in the
I really wish that #3133 could get done so that these issues don't keep slowing down the ATDM Trilinos efforts.
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing.
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Build Information
Test Name: Trilinos_pullrequest_gcc_4.9.3
Test Name: Trilinos_pullrequest_gcc_4.8.4
Test Name: Trilinos_pullrequest_intel_17.0.1
Pull Request Author: bartlettroscoe
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED
Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.
Pull Request Auto Testing has FAILED
Build Information
Test Name: Trilinos_pullrequest_gcc_4.9.3
Test Name: Trilinos_pullrequest_gcc_4.8.4
Test Name: Trilinos_pullrequest_intel_17.0.1
looks good to me. Just had the one question on default module in comments.
if [ "$ATDM_CONFIG_COMPILER" == "GNU" ] ; then

  # Load the modules and set up env
  module load devpack/20180308/openmpi/2.1.2/gcc/7.2.0/cuda/9.0.176
Should this be cuda 9.2?
@rppawlo, that is what it is currently. But given this is a GNU build and does not use CUDA, I suspect that it should not matter. This is still a GCC 7.2.0 build. But we should likely change it I guess since the CUDA 9.0 env does not really work with CUDA.
@trilinos/framework, the last auto PR build failed over 7 hours ago and I put the
Can someone with the Trilinos GitHub superpowers please merge this branch to 'develop' manually? At this point there is no way to get this merged to 'develop' before ATDM Trilinos builds kick off tomorrow.
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing.
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Build Information
Test Name: Trilinos_pullrequest_gcc_4.9.3
Test Name: Trilinos_pullrequest_gcc_4.8.4
Test Name: Trilinos_pullrequest_intel_17.0.1
Pull Request Author: bartlettroscoe
I looked at this before and this instance is just really backed up today. Some of the Intel builds have taken ~7.5 hours for some reason. It looked like the last run was almost done (that Intel build had only been running for a little under 3 hours at the time... again, not sure why). In any case, let's keep an eye on this, and we can consider manual intervention if this run is unsuccessful.
Thanks Jim. Fingers crossed
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED
Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.
Pull Request Auto Testing has FAILED
Build Information
Test Name: Trilinos_pullrequest_gcc_4.9.3
Test Name: Trilinos_pullrequest_gcc_4.8.4
Test Name: Trilinos_pullrequest_intel_17.0.1
@trilinos/framework The third PR attempt failed, this time with tests timing out, as shown at:
My guess is that the Jenkins machine where this build was run is being overloaded. There is zero chance that the changes on this PR are causing these failures.
Adding
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing.
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Build Information
Test Name: Trilinos_pullrequest_gcc_4.9.3
Test Name: Trilinos_pullrequest_gcc_4.8.4
Test Name: Trilinos_pullrequest_intel_17.0.1
Pull Request Author: bartlettroscoe
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED
Pull Request Auto Testing has PASSED
Build Information
Test Name: Trilinos_pullrequest_gcc_4.9.3
Test Name: Trilinos_pullrequest_gcc_4.8.4
Test Name: Trilinos_pullrequest_intel_17.0.1
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ mhoemmen rppawlo ]!
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR...
CC: @fryeguy52, @rppawlo, @mhoemmen
Description
This gets an ATDM Trilinos CUDA 9.2 build going on white/ride.
Motivation and Context
We lost the CUDA 8.0 build when 'white' and 'ride' got updated last month. See https://software-sandbox.sandia.gov/jira/browse/TRIL-215.
How Has This Been Tested?
I tested this on 'white' with:
which returned:
I also did:
which posted to:
These show that there are some test failures but we will submit these new CUDA 9.2 jobs to the "Specialized" CDash group and then set up new Trilinos GitHub issues for the new failures.
Checklist