
Tpetra: unreliable test condition in ScopeGuard tests #3453

Closed
kddevin opened this issue Sep 17, 2018 · 2 comments

@kddevin
Contributor

kddevin commented Sep 17, 2018

@trilinos/tpetra @trilinos/kokkos

Current Behavior

The following tests fail on white for a CUDA build:
31 - TpetraCore_Core_initialize_where_user_initializes_mpi_MPI_4 (Failed)
32 - TpetraCore_Core_ScopeGuard_where_user_initializes_mpi_MPI_4 (Failed)
35 - TpetraCore_Core_initialize_where_tpetra_initializes_kokkos_MPI_1 (Failed)
36 - TpetraCore_Core_ScopeGuard_where_tpetra_initializes_kokkos_MPI_1 (Failed)
37 - TpetraCore_Core_initialize_where_user_initializes_kokkos_MPI_1 (Failed)
38 - TpetraCore_Core_ScopeGuard_where_user_initializes_kokkos_MPI_1 (Failed)
39 - TpetraCore_Core_initialize_where_tpetra_initializes_mpi_and_user_initializes_kokkos_MPI_2 (Failed)
40 - TpetraCore_Core_ScopeGuard_where_tpetra_initializes_mpi_and_user_initializes_kokkos_MPI_2 (Failed)

These tests rely on Kokkos not writing to std::cerr during Kokkos::initialize. However, for reasons unrelated to proper/improper use of Kokkos::initialize, Kokkos may write to std::cerr. E.g.,

"Captured output: Kokkos::Cuda::initialize WARNING: running kernels compiled for compute capability 3.5 on device with compute capability 3.7 , this will likely reduce potential performance."

In this case, all the initialization took place correctly (so the test should have passed), but Kokkos issued a warning to std::cerr (so the test failed).

Since Tpetra cannot control what Kokkos writes to std::cerr, this condition is not a reliable way to determine whether these tests pass or fail.

A side note: the tests go on to print "Captured output is empty!" when they should print "Captured output is NOT empty!" The incorrect message is confusing, but it is secondary to the unreliable test condition.
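The failure mode can be reproduced without Kokkos or Tpetra. The following is a minimal C++ sketch under stated assumptions, not the code of the Tpetra tests: `fakeKokkosInitialize` is a hypothetical stand-in that succeeds but, like the `Kokkos::Cuda::initialize` warning quoted above, writes to `std::cerr`. `CerrCapture` is a hypothetical helper that redirects `std::cerr` into a string by swapping its stream buffer. The sketch contrasts the output-based condition with a check on the actual initialization state, and it labels the non-empty case with the wording the side note says the tests should use.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical RAII helper: redirects std::cerr into a string buffer for
// the lifetime of the object and restores the original buffer afterwards.
class CerrCapture {
public:
  CerrCapture() : oldBuf_(std::cerr.rdbuf(captured_.rdbuf())) {}
  ~CerrCapture() { std::cerr.rdbuf(oldBuf_); }
  CerrCapture(const CerrCapture&) = delete;
  CerrCapture& operator=(const CerrCapture&) = delete;
  std::string str() const { return captured_.str(); }

private:
  std::ostringstream captured_;  // declared first, so it is constructed first
  std::streambuf* oldBuf_;
};

// Hypothetical flag set by the stand-in initializer below.
bool g_fakeInitialized = false;

// Hypothetical stand-in for a library initializer: it succeeds, but also
// prints a warning that has nothing to do with how it was called.
void fakeKokkosInitialize() {
  std::cerr << "WARNING: running kernels compiled for compute capability 3.5 "
               "on device with compute capability 3.7\n";
  g_fakeInitialized = true;
}

struct CheckResult {
  bool outputCheckPassed;  // old condition: captured std::cerr is empty
  bool stateCheckPassed;   // alternative: initialization actually happened
  std::string message;     // diagnostic the output check would report
};

CheckResult runInitializeTest() {
  g_fakeInitialized = false;
  std::streambuf* const original = std::cerr.rdbuf();
  std::string captured;
  {
    CerrCapture capture;
    fakeKokkosInitialize();
    captured = capture.str();
  }  // ~CerrCapture restores std::cerr here
  // The helper must have put the original buffer back on scope exit.
  assert(std::cerr.rdbuf() == original);

  CheckResult r;
  r.outputCheckPassed = captured.empty();
  // Label the non-empty branch correctly; per this issue, the tests print
  // "Captured output is empty!" in this case.
  r.message = r.outputCheckPassed ? "Captured output is empty."
                                  : "Captured output is NOT empty!";
  r.stateCheckPassed = g_fakeInitialized;
  return r;
}
```

A test driver calling `runInitializeTest()` sees the output check fail while the state check passes: initialization succeeded, yet a benign warning alone turns the output-based condition into a failure, which is the false failure reported here.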

Steps to Reproduce

On white:

module purge
module load openmpi/2.1.2/gcc/7.2.0/cuda/9.2.88
module load cmake/3.9.6
module load openblas/0.2.20/gcc/7.2.0
module load boost/1.65.1/gcc/7.2.0
module load cuda/9.2.88
module load netcdf-exo/4.4.1.1/openmpi/2.1.2/gcc/7.2.0/cuda/9.0.176
export NVCC_WRAPPER_DEFAULT_COMPILER=$(which g++)
echo ${NVCC_WRAPPER_DEFAULT_COMPILER}

TRILINOS_SRC="/home/Trilinos"
export OMPI_CXX=${TRILINOS_SRC}/packages/kokkos/bin/nvcc_wrapper
which mpic++
mpic++ --version

export PATH=${PATH}:${TRILINOS_SRC}/packages/kokkos/bin

cmake \
-DTPL_ENABLE_MPI=ON \
-DMPI_BASE_DIR=${MPI_ROOT} \
-DBLAS_LIBRARY_DIRS=${OPENBLAS_ROOT}/lib \
-DLAPACK_LIBRARY_DIRS=${OPENBLAS_ROOT}/lib \
-DNetcdf_LIBRARY_DIRS=${NETCDF_ROOT}/lib \
-DBoostLib_LIBRARY_DIRS=${BOOST_ROOT}/lib \
-DTPL_ENABLE_Matio=OFF \
-DTrilinos_ENABLE_ALL_PACKAGES=OFF \
-DTrilinos_ENABLE_Tpetra=ON \
-DTpetra_ENABLE_TESTS=ON \
-DTpetra_ENABLE_EXAMPLES=ON \
-DCMAKE_INSTALL_PREFIX=${TRILINOS_SRC}/tmp \
-D Trilinos_ENABLE_CUDA=ON \
-D TPL_ENABLE_CUDA=ON \
-D Tpetra_INST_CUDA:BOOL=ON \
-DCMAKE_CXX_FLAGS="--expt-extended-lambda" \
-DKokkos_ENABLE_Cuda_UVM:BOOL=ON \
$TRILINOS_SRC

make -j 8
ctest -j4

Related Issues

#3095

Additional Information

This issue is low priority and will likely not be fixed unless it becomes a blocker for other developers.

@mhoemmen
Contributor

We should fix how we're building on White with CUDA -- I think the warning is actually helpful in this case.

@kddevin
Contributor Author

kddevin commented Jan 29, 2019

Note that #4217 disabled these tests for CUDA builds; they will have to be re-enabled when this issue is fixed.
In cmake/std/PullRequestLinuxCuda9.2TestingSettings.cmake,

set (TpetraCore_Core_initialize_where_tpetra_initializes_kokkos_MPI_1_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (TpetraCore_Core_initialize_where_tpetra_initializes_mpi_and_user_initializes_kokkos_MPI_2_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (TpetraCore_Core_initialize_where_user_initializes_kokkos_MPI_1_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (TpetraCore_Core_initialize_where_user_initializes_mpi_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (TpetraCore_Core_ScopeGuard_where_tpetra_initializes_kokkos_MPI_1_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (TpetraCore_Core_ScopeGuard_where_tpetra_initializes_mpi_and_user_initializes_kokkos_MPI_2_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (TpetraCore_Core_ScopeGuard_where_user_initializes_kokkos_MPI_1_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (TpetraCore_Core_ScopeGuard_where_user_initializes_mpi_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")

kddevin added a commit that referenced this issue Oct 26, 2020
In tests, removing check for no output from Kokkos::initialize
Having a check of stderr from a TPL is bad form; we don't
control what Kokkos chooses to output.
The remainder of the tests (various combinations of initialized
and finalized for Kokkos and MPI) is fine and remains.
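The checks that remain after this commit concern who initializes and finalizes MPI and Kokkos, not what either library prints. A state-based check of that kind can be sketched as follows. This is a hypothetical model, not Tpetra's code: `RuntimeState` stands in for `MPI_Initialized` and `Kokkos::is_initialized()`, and `ModelScopeGuard` assumes a guard that initializes only what is not already initialized and, on destruction, finalizes only what it initialized itself.

```cpp
#include <cassert>

// Hypothetical global state standing in for MPI_Initialized() and
// Kokkos::is_initialized(); no real MPI or Kokkos calls are made.
struct RuntimeState {
  bool mpi = false;
  bool kokkos = false;
};

// Hypothetical model of a scope guard: initializes only what is not yet
// initialized and, on destruction, finalizes only what it initialized.
class ModelScopeGuard {
public:
  explicit ModelScopeGuard(RuntimeState& s)
      : state_(s), ownsMpi_(!s.mpi), ownsKokkos_(!s.kokkos) {
    state_.mpi = true;
    state_.kokkos = true;
  }
  ~ModelScopeGuard() {
    // Sanity check: nothing should have been finalized out from under us.
    assert(state_.mpi && state_.kokkos);
    if (ownsKokkos_) state_.kokkos = false;  // finalize Kokkos before MPI
    if (ownsMpi_) state_.mpi = false;
  }
  ModelScopeGuard(const ModelScopeGuard&) = delete;
  ModelScopeGuard& operator=(const ModelScopeGuard&) = delete;

private:
  RuntimeState& state_;
  bool ownsMpi_;
  bool ownsKokkos_;
};

// State-based check for one combination of "who initializes what".
// It inspects only initialization flags, never anything written to std::cerr.
bool checkCombination(bool userInitsMpi, bool userInitsKokkos) {
  RuntimeState state;
  state.mpi = userInitsMpi;
  state.kokkos = userInitsKokkos;
  {
    ModelScopeGuard guard(state);
    // Inside the guard's scope, both must be initialized.
    if (!state.mpi || !state.kokkos) return false;
  }
  // After the guard: whatever the user initialized is still initialized,
  // and whatever the guard initialized has been finalized.
  return state.mpi == userInitsMpi && state.kokkos == userInitsKokkos;
}
```

With this model, `checkCombination` returns true for all four combinations (the user initializes MPI, Kokkos, both, or neither), and a warning printed by either library cannot affect the result.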
kddevin closed this as completed Dec 4, 2020