Tpetra: Fix initialization tests to properly capture errors from Kokkos::initialize #7966

MicheldeMessieres · 2020-09-02T17:58:27Z

Enhancement

@kddevin @csiefer2 This could perhaps be considered a bug, since the tests are not working properly, but it only affects the tests themselves. The Tpetra initialization tests in packages/tpetra/core/test/Core capture std::cerr warning statements from Kokkos::initialize and are supposed to fail if the captured output is not empty. However, with multiple ranks the capture may be empty on some ranks, so the test outputs FAIL PASS PASS PASS, and the PASS is picked up as sufficient for the test to pass. The output should be FAIL FAIL FAIL FAIL, since all ranks are writing the error to std::cerr.

I'm not sure we want to check just the root rank, since the ranks may not all have the same error. It's not yet clear to me why it doesn't work. I tried moving the capture so that it always happens after MPI_Init is set up, but that didn't seem to be a factor. I can investigate this further.
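One way to avoid a stray PASS is to combine the per-rank results before any rank prints a verdict, so a single failing rank makes every rank report FAIL. This does not explain why the capture was empty on some ranks, which this thread leaves open. The following is a minimal sketch. It assumes std::cerr is captured by swapping its stream buffer, which may differ from how the actual Tpetra tests capture output. MPI is simulated by a loop so that the sketch runs standalone. `CerrCapture`, `simulatedInitialize`, `allRanksPassed`, and `runSimulatedTest` are hypothetical names, not Tpetra or Kokkos APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// RAII guard: redirects std::cerr into an in-memory buffer for its lifetime
// and restores the original stream buffer when it goes out of scope.
class CerrCapture {
public:
  CerrCapture() : old_(std::cerr.rdbuf(buffer_.rdbuf())) {}
  ~CerrCapture() { std::cerr.rdbuf(old_); }
  CerrCapture(const CerrCapture&) = delete;
  CerrCapture& operator=(const CerrCapture&) = delete;
  std::string str() const { return buffer_.str(); }

private:
  std::ostringstream buffer_;  // declared first so it exists before the rdbuf swap
  std::streambuf* old_;        // original std::cerr buffer, restored in the destructor
};

// Hypothetical stand-in for Kokkos::initialize that reproduces the reported
// symptom: only rank 0's capture sees the warning.
void simulatedInitialize(int rank) {
  if (rank == 0) {
    std::cerr << "simulated initialization warning\n";
  }
}

// Global verdict = logical AND of the per-rank verdicts. With real MPI, each
// rank would instead call
//   MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_MIN, comm);
// on a 0/1 int flag.
bool allRanksPassed(const std::vector<bool>& localPass) {
  assert(!localPass.empty());
  return std::all_of(localPass.begin(), localPass.end(),
                     [](bool ok) { return ok; });
}

// Runs the simulated check for numRanks "ranks" in sequence and returns the
// verdict each rank would print; every rank prints the same global verdict.
std::vector<std::string> runSimulatedTest(int numRanks,
                                          std::vector<bool>* localPassOut) {
  std::vector<bool> localPass;
  for (int rank = 0; rank < numRanks; ++rank) {
    CerrCapture capture;
    simulatedInitialize(rank);
    localPass.push_back(capture.str().empty());  // local pass only if nothing captured
  }
  const bool pass = allRanksPassed(localPass);
  if (localPassOut != nullptr) {
    *localPassOut = localPass;
  }
  const std::string verdict = pass ? "PASS" : "FAIL";
  return std::vector<std::string>(static_cast<std::size_t>(numRanks), verdict);
}
```

With four simulated ranks, the local verdicts reproduce the FAIL PASS PASS PASS pattern. The reduced verdict is FAIL on every rank, so the harness has no PASS to pick up. Reducing with a logical AND (or MPI_MIN on 0/1 flags) also covers the concern that ranks may not all hit the same error, because any rank's failure fails the whole test.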

@trilinos/tpetra

kddevin · 2020-09-02T21:15:53Z

These are truly awful tests -- checking for output from a snapshotted TPL -- ugh.
See #3453 and #6748.
It appears that in #6748, @ndellingwood disabled the Kokkos warnings for these tests so that the tests always pass. But you are seeing them fail, right? Weird.

MicheldeMessieres · 2020-09-02T22:31:20Z

@kddevin To clarify, I wasn't seeing extra failures because of this issue. I was seeing tests pass when they should fail. I think all of the initialization tests should currently fail with launch blocking off, but only some of them do.

kddevin · 2020-09-16T22:22:57Z

@MicheldeMessieres I will look at these tests so that we can move ahead with a CUDA_LAUNCH_BLOCKING=0 nightly test. Do all the tests in packages/tpetra/core/test/Core pass when they should fail? Or just a subset of them?
Thanks.

MicheldeMessieres · 2020-09-17T00:22:59Z

@kddevin With the latest develop, all should now pass on Pascal with CUDA_LAUNCH_BLOCKING=1.
With CUDA_LAUNCH_BLOCKING=0 I have the following fails:

	 47 - TpetraCore_Core_initialize_where_user_initializes_mpi_MPI_4 (Failed)
	 48 - TpetraCore_Core_ScopeGuard_where_user_initializes_mpi_MPI_4 (Failed)

I know some tests that should fail were passing because std::cerr was empty on only some ranks. I didn't get to the bottom of why these tests behaved differently.

In tests, removing the check for no output from Kokkos::initialize. Having a check of stderr from a TPL is bad form; we don't control what Kokkos chooses to output. The remainder of the tests (various combinations of initialized and finalized for Kokkos and MPI) is fine and remains.

MicheldeMessieres added type: enhancement Issue is an enhancement, not a bug pkg: Tpetra labels Sep 2, 2020

MicheldeMessieres self-assigned this Sep 2, 2020

MicheldeMessieres mentioned this issue Sep 2, 2020

Tpetra: Pass initialization tests with CUDA_LAUNCH_BLOCKING off #7967

Closed

MicheldeMessieres removed their assignment Sep 24, 2020

kddevin self-assigned this Oct 26, 2020

kddevin mentioned this issue Oct 26, 2020

Tpetra: remove check of Kokkos warning output in Tpetra tests #8256

Merged

kddevin closed this as completed Dec 4, 2020
