Random failures due to jumbled output in TpetraCore_Bug7745_MPI_4 and TpetraCore_MultiVector_LocalViewTests_MPI_4 starting 2022-08-05? #10885
Comments
FYI: The single randomly failing test TpetraCore_MultiVector_LocalViewTests_MPI_4 in the last PR build for PR #10802, shown here, was the only failure across all of the PR builds in that iteration, as shown in this query. So one impediment to getting Tpetra PRs merged is Tpetra's own randomly failing tests.
TpetraCore_MultiVector_LocalViewTests_MPI_4 is supposed to throw exceptions. That's the point. It's testing error cases. As far as I can tell from the output, the test is running correctly. The problem is, as you noted, an output ordering munge. Why CUDA 11 suddenly makes it cry on the output is somewhat beyond me. But there you go. The CUDA 11 builds are wonky as all get out. Turning the output off should be an easy fix if I can actually reproduce the issue. I just wasted the whole afternoon trying to reproduce the build only to have the reproducer refuse to build the Tpetra tests. The documentation for PR reproduction still hasn't been updated, so I suspect I'm doing it wrong again. I haven't looked at the other one yet.
And on the other machine the reproducer won't even configure correctly. Time to file another TRILINOSHD ticket...
@csiefer2 I had independently noticed this yesterday and took a brief look -- at least in the case of MultiVector_LocalViewTests, the output that is overlapping with "End Result: TEST PASSED" is printed directly from catch blocks (i.e. direct cout statements, not the exception message itself). Historically, we've had a number of Tpetra tests that print output for informational purposes only, even when the test is passing. I thought we had fixed a lot of that because it has broken the PR tester before (and is of limited utility since you mostly want output when the test is failing). I'm pretty sure if we delete any informational output statements from the passing branch of the code, everything will work fine.
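To illustrate the idea of keeping informational output off the passing path, here is a minimal sketch in plain standard C++. It is not Tpetra's test code and not what #10888 does (that PR removes the output outright); it shows a variation that buffers the diagnostic text and emits it only on failure. The names threwAsExpected and runErrorCaseTest are hypothetical, chosen for this example only.

```cpp
#include <functional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: runs an operation that is expected to throw.
// Diagnostics go to `log` rather than directly to std::cout.
bool threwAsExpected(const std::function<void()>& op, std::ostream& log) {
  try {
    op();
  } catch (const std::exception& e) {
    log << "Caught expected exception: " << e.what() << '\n';
    return true;
  }
  log << "Expected an exception, but none was thrown\n";
  return false;
}

// Hypothetical driver: returns the text the test would print.
// The buffered diagnostics are included only when the check fails,
// so a passing run produces nothing but the result line.
std::string runErrorCaseTest(const std::function<void()>& op) {
  std::ostringstream log;
  const bool passed = threwAsExpected(op, log);

  std::ostringstream out;
  if (!passed) {
    out << log.str();
  }
  out << "End Result: TEST " << (passed ? "PASSED" : "FAILED") << '\n';
  return out.str();
}
```

With this shape, a passing run has no extra text that could land in the middle of the result line, while a failing run still carries the diagnostics a person would want to read.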
@csiefer2 @bartlettroscoe I put up #10888 which removes the informational output and should resolve this issue.
I also confirmed that the text of the informational output in question matches the statements reported by @bartlettroscoe to be overlapping with the "TEST PASSED" output.
But that test does not seem to check that those lines are even printed. It is only checking for |
It doesn't have to check the output to work. The output is for human debugging purposes only. That's why @tasmith4 turned it off in #10888. We do actually know what we're doing here :)
@bartlettroscoe #10888 has merged, removing the output reported in this issue, so this should now be resolved. Please reopen if you're still seeing problems with Tpetra test output.
CC: @trilinos/tpetra
Description
As shown in this query (click "Shown Matching Output" in upper right) the tests:
TpetraCore_Bug7745_MPI_4
TpetraCore_MultiVector_LocalViewTests_MPI_4
appear to be randomly failing in the builds:
PR-10801-test-rhel7_sems-cuda-11.4.2-sems-gnu-10.1.0-sems-openmpi-4.0.5_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_no-uvm_deprecated-on_no-package-enables-331
PR-10802-test-rhel7_sems-cuda-11.4.2-sems-gnu-10.1.0-sems-openmpi-4.0.5_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_no-uvm_deprecated-on_no-package-enables-423
PR-10808-test-rhel7_sems-cuda-11.4.2-sems-gnu-10.1.0-sems-openmpi-4.0.5_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_no-uvm_deprecated-on_no-package-enables-325
PR-10808-test-rhel7_sems-gnu-8.3.0-openmpi-1.10.1-openmp_release-debug_static_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables-794
PR-10834-test-rhel7_sems-cuda-11.4.2-sems-gnu-10.1.0-sems-openmpi-4.0.5_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_no-uvm_deprecated-on_no-package-enables-409
starting testing day 2022-08-05.
The TpetraCore_Bug7745_MPI_4 test failures appear to be caused by output jumbling the line:
End Result: TEST PASSED
such as here showing:
The TpetraCore_MultiVector_LocalViewTests_MPI_4 test failures are also technically due to jumbling of the line:
End Result: TEST PASSED
such as here showing:
In the above case, even though all of the unit tests claimed to pass, the fact that so many exceptions are being thrown might indicate that this test is actually defective, and that it is just dumb chance that these exception messages are jumbling the line:
End Result: TEST PASSED
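The failure mode described above can be sketched in a few lines of standard C++. This assumes the pass criterion is that the captured output contains the marker text as one contiguous string. That is an assumption made for illustration, not a statement about how the test harness is actually configured. The functions outputIndicatesPass and interleave are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Assumed pass criterion: the captured output contains the marker text
// as one contiguous substring.
bool outputIndicatesPass(const std::string& captured) {
  return captured.find("End Result: TEST PASSED") != std::string::npos;
}

// Simulates text from another writer landing inside a line at offset `pos`,
// as can happen when several MPI ranks write to the same stream.
std::string interleave(const std::string& line,
                       const std::string& intruder,
                       std::size_t pos) {
  std::string result = line;
  const std::size_t where = pos < result.size() ? pos : result.size();
  result.insert(where, intruder);
  return result;
}
```

Under this assumption, text inserted before or after the marker is harmless, but text that lands inside the marker splits it and the check no longer matches, even though every unit test passed.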
Current Status on CDash
Run the above query, adjusting the "Begin" and "End" dates to match today or any other date range, or just click "CURRENT" in the top bar to see results for the current testing day.
Steps to Reproduce
Follow instructions at:
However, because these are random failures, triggering the failing test may be hard to do.