Tpetra: Fix initialization tests to properly capture errors from Kokkos::initialize #7966
These are truly awful tests: checking for output from a snapshotted TPL. Ugh.
@kddevin To clarify, I wasn't seeing extra failures because of this issue; I was seeing tests pass when they should fail. I think all of the initialization tests should currently fail with launch blocking off, but only some of them do.
@MicheldeMessieres I will look at these tests so that we can move ahead with a CUDA_LAUNCH_BLOCKING=0 nightly test. Do all the tests in packages/tpetra/core/test/Core pass when they should fail, or just a subset of them?
@kddevin With the latest develop, all should now pass on Pascal with CUDA_LAUNCH_BLOCKING=1.
I know some tests were passing when they should have failed, because the captured std::cerr was empty on only some ranks. I didn't get to the bottom of why those ranks behaved differently.
In tests, removing check for no output from Kokkos::initialize

Having a check of stderr from a TPL is bad form; we don't control what Kokkos chooses to output. The remainder of the tests (various combinations of initialized and finalized for Kokkos and MPI) is fine and remains.
Enhancement
@kddevin @csiefer2 This could arguably be considered a bug, since the tests are not working properly, but it only affects the tests themselves. The Tpetra initialization tests in packages/tpetra/core/test/Core capture std::cerr warnings from Kokkos::initialize and are supposed to fail if the captured output is not empty. However, with multiple ranks the capture may be empty on some ranks; the test then prints FAIL PASS PASS PASS, and that single PASS is enough for the test to be reported as passing. The output should be FAIL FAIL FAIL FAIL, since all ranks write the error to std::cerr.
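For readers unfamiliar with the capture pattern, here is a minimal sketch of how a test can redirect std::cerr into a string and check that nothing was written. It assumes the capture is done by swapping the stream buffer with `rdbuf`; the actual Tpetra tests may do it differently. `hypothetical_initialize` and `local_check_passes` are illustrative names only and are not Kokkos or Tpetra APIs.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for an initialization routine that may print a
// warning to std::cerr; this is NOT the real Kokkos::initialize.
void hypothetical_initialize(bool emit_warning) {
  if (emit_warning) {
    std::cerr << "hypothetical warning from initialize\n";
  }
}

// RAII guard that redirects std::cerr into an in-memory buffer for its
// lifetime and restores the original stream buffer when destroyed.
class CerrCapture {
 public:
  CerrCapture() : old_buf_(std::cerr.rdbuf(buffer_.rdbuf())) {}
  ~CerrCapture() { std::cerr.rdbuf(old_buf_); }
  CerrCapture(const CerrCapture&) = delete;
  CerrCapture& operator=(const CerrCapture&) = delete;

  // Everything written through std::cerr while the guard was alive.
  std::string str() const { return buffer_.str(); }

 private:
  std::ostringstream buffer_;  // declared first so it exists before old_buf_ is initialized
  std::streambuf* old_buf_;    // original std::cerr buffer, restored in the destructor
};

// Returns true when the (hypothetical) initialization wrote nothing to
// std::cerr on this process, i.e. the local check "passes".
bool local_check_passes(bool emit_warning) {
  CerrCapture capture;
  hypothetical_initialize(emit_warning);
  return capture.str().empty();
}
```

One caveat of this technique: swapping the `rdbuf` only intercepts writes made through the C++ `std::cerr` object. Anything written with C stdio (for example `fprintf(stderr, ...)`) goes directly to file descriptor 2 and is not captured, so an empty capture does not by itself prove that nothing was printed.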
I'm not sure we want to check only the root rank, since the ranks may not all report the same error. It's not yet clear to me why the capture doesn't work. I tried moving the capture so that it always happens after MPI_Init has been called, but that didn't seem to be a factor. I can investigate this further.
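One way to make the verdict independent of which rank happens to see output is to reduce the per-rank results before printing, so every rank prints the same PASS or FAIL. The sketch below simulates ranks with a `std::vector<bool>` (no MPI is used) to contrast the flawed "any PASS in the output" behavior with a logical AND across ranks. In a real MPI test the AND would be computed with something like `MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_LAND, comm)`; the function names here are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Models the flawed pattern: the harness scans the combined output of all
// ranks and treats the test as passed if "PASS" appears anywhere.
bool harness_sees_pass(const std::vector<bool>& local_pass) {
  std::string combined;
  for (bool ok : local_pass) {
    combined += ok ? "PASS " : "FAIL ";
  }
  return combined.find("PASS") != std::string::npos;
}

// Models the intended behavior: the test passes only if every rank passes.
// This is the value an MPI_LAND all-reduce would hand back to every rank.
bool global_pass(const std::vector<bool>& local_pass) {
  return std::all_of(local_pass.begin(), local_pass.end(),
                     [](bool ok) { return ok; });
}
```

With per-rank results `{false, true, true, true}` (the FAIL PASS PASS PASS case from the issue), `harness_sees_pass` reports a pass while `global_pass` correctly reports a failure. Because the reduced value is the same on every rank, it also sidesteps the question of whether to check only the root rank.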
@trilinos/tpetra