Disable some individual Kokkos and KokkosKernels tests on a few more full debug builds and disable the Belos_pseudo_stochastic_pcg_hb_[0,1]_MPI_4 tests on a few more platforms #2964
…some debug builds (trilinos#2827) These very targeted disables should allow these tests to all complete in well under 10 minutes in all of these debug builds on all of these platforms. See the diffs to see exactly what unit tests are disabled in what unit test executables in what builds on what platforms. For details on why these are being disabled, see trilinos#2827.
…PI_4 in gnu-debug-openmp build on white/ride (trilinos#2920) This test failed by maxing out at 100 iterations today in the Trilinos-atdm-white-ride-gnu-debug-openmp build, where it otherwise converges at 87 iterations. Therefore, we are disabling this test like we did for the cuda-debug build. For more details see trilinos#2920.
…PI_4 in cuda-opt build on white/ride (trilinos#2920) This test failed by maxing out at 100 iterations on 6/17/2018 in the Trilinos-atdm-white-ride-cuda-opt build, where it otherwise converges at 87 iterations. Therefore, we are disabling this test like we did for some other builds on white/ride. For more details see trilinos#2920.
…n/shiller (TRIL-211) For some reason, it can't find Ninja when using 'srun' to configure and build on the compute node. But for some crazy reason, you can use 'srun' to configure, build, and run tests on the compute node when using the Jenkins driver script. Very strange.
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request.
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Build Information
Test Name: Trilinos_pullrequest_gcc_4.9.3
Test Name: Trilinos_pullrequest_gcc_4.8.4
Test Name: Trilinos_pullrequest_intel_17.0.1
Pull Request Author: bartlettroscoe
At some point we simply won't be running any more tests ;-)
@mhoemmen said:
Before we started this effort, zero Belos tests were run on any of these ATDM platforms. We would have to disable a lot of tests before we get anywhere near that. The alternative is to "Stop the Line" and fix these things as they come up (since the problems with this code have always been there, so the initial job was not finished), but it is hard to justify that when there are more urgent things to do. From the ATDM perspective, I think that disabling many of these tests does not really lose any testing that is protecting ATDM APP customers. That is my focus right now.
@bartlettroscoe wrote:
I'm just being snarky -- and it's definitely not about you. Belos historically has seen intermittent test failures. The tests (and at least one solver) have tended to use random numbers, and also are a bit sensitive to residual computations.
@trilinos/framework, CDash is showing all three of the auto PR builds shown above, which started over 6 hours ago, as passing on CDash at: But for some reason, the auto PR tester has not yet added a comment to this PR saying that testing completed successfully (which would allow the merge). From looking at CDash, all of the builds should have completed in under 4 hours, and therefore we should have gotten results posted to this PR about 2 hours ago (plus 20 minutes perhaps). Is there some issue with the auto PR tester?
To give merging this PR today a shot (so that it can run in the ATDM Trilinos builds tomorrow morning), I will put on the "AT: RETEST" label and hope it all works this time (and then I can hopefully merge at 10 PM ET).
Status Flag 'Pull Request AutoTester' - User Requested Retest - Resetting Testing Status
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Build Information
Test Name: Trilinos_pullrequest_gcc_4.9.3
Test Name: Trilinos_pullrequest_gcc_4.8.4
Test Name: Trilinos_pullrequest_intel_17.0.1
Pull Request Author: bartlettroscoe
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED
Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.
Pull Request Auto Testing has FAILED
Build Information
Test Name: Trilinos_pullrequest_gcc_4.9.3
Test Name: Trilinos_pullrequest_gcc_4.8.4
Test Name: Trilinos_pullrequest_intel_17.0.1
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Build Information
Test Name: Trilinos_pullrequest_gcc_4.9.3
Test Name: Trilinos_pullrequest_gcc_4.8.4
Test Name: Trilinos_pullrequest_intel_17.0.1
Pull Request Author: bartlettroscoe
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED
Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.
Pull Request Auto Testing has FAILED
Build Information
Test Name: Trilinos_pullrequest_gcc_4.9.3
Test Name: Trilinos_pullrequest_gcc_4.8.4
Test Name: Trilinos_pullrequest_intel_17.0.1
@trilinos/framework, The GCC 4.8.4 build in the PR testing failed twice late yesterday and early this morning. There is no way my changes in this PR can break the build since they only impact ATDM Trilinos builds. It looks like someone tried to switch over to use the setting for using OpenMP with which shows the command-line:
How did this change make it into the 'develop' branch? This could not have passed a PR build.
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Build Information
Test Name: Trilinos_pullrequest_gcc_4.9.3
Test Name: Trilinos_pullrequest_gcc_4.8.4
Test Name: Trilinos_pullrequest_intel_17.0.1
Pull Request Author: bartlettroscoe
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED
Pull Request Auto Testing has PASSED
Build Information
Test Name: Trilinos_pullrequest_gcc_4.9.3
Test Name: Trilinos_pullrequest_gcc_4.8.4
Test Name: Trilinos_pullrequest_intel_17.0.1
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ mhoemmen ]!
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - Master Automerge is disabled (in .cfg file)
same as what was done in trilinos#2964
CC: @fryeguy52
Description
This PR branch contains commits to disable a few of the individual Kokkos and KokkosKernels unit tests for some full debug builds (see the careful analysis and identification of these individual unit tests in #2827 (comment)) (see #2827).
This PR branch also contains commits to disable the tests Belos_pseudo_stochastic_pcg_hb_[0,1]_MPI_4 in a few other builds on white/ride where they have been seen to fail randomly by hitting the 100 iteration max (see #2920 (comment) and #2920 (comment)).
I also updated some documentation in the cmake/std/atdm/README.md file about using checkin-test-atdm.sh on hansen/shiller.
Motivation and Context
These tests fail fairly regularly in automated ATDM testing. Going forward, we can't tolerate known randomly failing tests. Such tests will destroy automated processes to update Trilinos for the ATDM APP codes.
How Has This Been Tested?
I ran the builds and tests manually on 'shiller' and 'ride' and verified that, with these changes, all of the Kokkos and KokkosKernels tests in the debug builds ran far under the 600 sec timeout (although one test took as long as 447 sec).
See details of the testing below.
DETAILED TEST RESULTS:
A) Testing on 'shiller'
returned
Using the script:
Looking for expensive tests with:
All of those look well under 600 sec.
B) Testing on 'ride':
returned:
Looking for expensive tests with:
The most expensive test was 447 sec but that is pretty far below 600 sec so hopefully we will be okay.
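As a rough illustration of the "looking for expensive tests" step, the sketch below scans ctest-style summary lines and lists the slowest tests. It is not the command that was actually run for this PR (that command is not reproduced here). The function name slowest_tests, the test names, and the sample text are all hypothetical, and the sketch assumes the usual "N/M Test #K: name ... Passed X sec" line layout.

```python
import re

# Assumed ctest summary line shape (test names here are hypothetical):
#   "2/3 Test #2: Example_B ........   Passed  447.00 sec"
_CTEST_LINE = re.compile(
    r"^\s*\d+/\d+\s+Test\s+#\d+:\s+(\S+)\s.*?(\d+(?:\.\d+)?)\s+sec\s*$"
)


def slowest_tests(ctest_output, threshold_sec=0.0):
    """Return (name, seconds) pairs at or above threshold_sec, slowest first."""
    timings = []
    for line in ctest_output.splitlines():
        match = _CTEST_LINE.match(line)
        if match:
            name, secs = match.group(1), float(match.group(2))
            if secs >= threshold_sec:
                timings.append((name, secs))
    return sorted(timings, key=lambda pair: pair[1], reverse=True)


# Made-up sample output, for illustration only.
SAMPLE = """\
1/3 Test #1: Example_A ........   Passed  120.50 sec
2/3 Test #2: Example_B ........   Passed  447.00 sec
3/3 Test #3: Example_C ........   Passed    3.20 sec
"""

for name, secs in slowest_tests(SAMPLE, threshold_sec=100.0):
    print(f"{secs:8.2f} sec  {name}")
```

Raising threshold_sec toward the 600 sec timeout would narrow the list to tests that are at real risk of timing out.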
Checklist