Tpetra: Fix initialization tests to properly capture errors from Kokkos::initialize #7966

Closed
MicheldeMessieres opened this issue Sep 2, 2020 · 4 comments
Assignees
Labels
pkg: Tpetra type: enhancement Issue is an enhancement, not a bug

Comments

@MicheldeMessieres
Contributor

Enhancement

@kddevin @csiefer2 This could arguably be considered a bug, since the tests are not working properly, but it only affects the tests themselves. The Tpetra initialization tests in packages/tpetra/core/test/Core capture std::cerr warning statements from Kokkos::initialize and are supposed to fail if the captured output is not empty. With multiple ranks, however, the capture may be empty on some ranks, so the test prints something like FAIL PASS PASS PASS, and a single PASS in the output is treated as sufficient for the test to pass. The output should be FAIL FAIL FAIL FAIL, since all ranks write the error to std::cerr.

I'm not sure we want to check only the root rank, since the ranks may not all have the same error. It's not yet clear to me why the capture doesn't work. I tried moving the capture so that it always happens after MPI_Init has been called, but that didn't seem to be a factor. I can investigate this further.
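
To make the intended behavior concrete, here is a minimal sketch of capturing std::cerr and combining per-rank results so that a single failing rank fails the whole test. It is not the code in packages/tpetra/core/test/Core: `CerrCapture`, `rankPassed`, and `globalVerdict` are hypothetical names, and the ranks are simulated with a vector of strings so the sketch runs without MPI. In a real MPI test the combination step would presumably be a reduction over the ranks (for example MPI_Allreduce with MPI_LAND on a per-rank flag), with only the combined verdict printed.

```cpp
#include <algorithm>
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical RAII helper: redirects std::cerr into an in-memory buffer
// for the lifetime of the object and restores the old buffer on destruction.
class CerrCapture {
public:
  CerrCapture() : old_(std::cerr.rdbuf(buffer_.rdbuf())) {}
  ~CerrCapture() { std::cerr.rdbuf(old_); }
  CerrCapture(const CerrCapture&) = delete;
  CerrCapture& operator=(const CerrCapture&) = delete;

  // Everything written to std::cerr since construction.
  std::string str() const { return buffer_.str(); }

private:
  std::ostringstream buffer_;  // declared first so it exists before the redirect
  std::streambuf* old_;        // original std::cerr buffer, restored later
};

// A rank passes only if it captured no output at all.
bool rankPassed(const std::string& captured) { return captured.empty(); }

// Combine the per-rank captures into ONE verdict. Any rank with non-empty
// output makes the whole test fail, so "FAIL PASS PASS PASS" cannot be
// mistaken for a pass. (Simulates a logical-AND reduction over ranks.)
std::string globalVerdict(const std::vector<std::string>& perRankCaptured) {
  assert(!perRankCaptured.empty());  // an empty list would pass vacuously
  const bool allPassed = std::all_of(perRankCaptured.begin(),
                                     perRankCaptured.end(), rankPassed);
  return allPassed ? "PASS" : "FAIL";
}
```

Because every rank contributes to the combined result, this avoids checking only the root rank, which matters if the ranks do not all report the same error.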

@trilinos/tpetra

@MicheldeMessieres MicheldeMessieres added type: enhancement Issue is an enhancement, not a bug pkg: Tpetra labels Sep 2, 2020
@MicheldeMessieres MicheldeMessieres self-assigned this Sep 2, 2020
@kddevin
Contributor

kddevin commented Sep 2, 2020

These are truly awful tests -- checking for output from a snapshotted TPL -- ugh.
See #3453 and #6748.
It appears that in #6748, @ndellingwood disabled the Kokkos warnings for these tests so that the tests always pass. But you are seeing them fail, right? Weird.

@MicheldeMessieres
Contributor Author

@kddevin To clarify, I wasn't seeing extra failures because of this issue; I was seeing tests pass when they should fail. I think all of the initialization tests should currently fail with launch blocking off, but only some of them do.

@kddevin
Contributor

kddevin commented Sep 16, 2020

@MicheldeMessieres I will look at these tests so that we can move ahead with a CUDA_LAUNCH_BLOCKING=0 nightly test. Do all the tests in packages/tpetra/core/test/Core pass when they should fail? Or just a subset of them?
Thanks.

@MicheldeMessieres
Contributor Author

@kddevin On the latest develop, all of them should now pass on Pascal with CUDA_LAUNCH_BLOCKING=1.
With CUDA_LAUNCH_BLOCKING=0, I get the following failures:

	 47 - TpetraCore_Core_initialize_where_user_initializes_mpi_MPI_4 (Failed)
	 48 - TpetraCore_Core_ScopeGuard_where_user_initializes_mpi_MPI_4 (Failed)

I know some tests that should fail were passing because std::cerr was empty on just some ranks. I didn't get to the bottom of why these were behaving differently.

@MicheldeMessieres MicheldeMessieres removed their assignment Sep 24, 2020
@kddevin kddevin self-assigned this Oct 26, 2020
kddevin added a commit that referenced this issue Oct 26, 2020
In tests, removing check for no output from Kokkos::initialize
Having a check of stderr from a TPL is bad form; we don't
control what Kokkos chooses to output.
The remainder of the tests (various combinations of initialized
and finalized for Kokkos and MPI) is fine and remains.
@kddevin kddevin closed this as completed Dec 4, 2020