Tpetra: unreliable test condition in ScopeGuard tests #3453
We should fix how we're building on White with CUDA; I think the warning is actually helpful in this case.
Note that #4217 disabled these tests for CUDA builds; they will have to be re-enabled when this issue is fixed.
kddevin added a commit that referenced this issue on Oct 26, 2020:
In tests, removing check for no output from Kokkos::initialize. Having a check of stderr from a TPL is bad form; we don't control what Kokkos chooses to output. The remainder of the tests (various combinations of initialized and finalized for Kokkos and MPI) is fine and remains.
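The commit keeps only the checks on initialization and finalization state. The following C++ sketch models that kind of check under loud assumptions: `LibState`, `HypotheticalScopeGuard`, and `scenario_ok` are made-up, in-process stand-ins, not the real MPI, Kokkos, or Tpetra::ScopeGuard APIs. The assumed guard behavior (initialize a library only if the user has not already done so, and finalize only what the guard itself initialized) is inferred from the test names, not taken from Tpetra's source. Nothing here inspects std::cerr.

```cpp
#include <cassert>

// Hypothetical model of one library's lifecycle (NOT the real MPI/Kokkos API).
struct LibState {
  bool initialized = false;
  bool finalized = false;
};

// Hypothetical ScopeGuard-like class: initializes each library only if the
// user has not already done so, and finalizes only what it initialized itself.
class HypotheticalScopeGuard {
 public:
  HypotheticalScopeGuard(LibState& mpi, LibState& kokkos)
      : mpi_(mpi), kokkos_(kokkos),
        owns_mpi_(!mpi.initialized), owns_kokkos_(!kokkos.initialized) {
    if (owns_mpi_) mpi_.initialized = true;
    if (owns_kokkos_) kokkos_.initialized = true;
  }
  ~HypotheticalScopeGuard() {
    // Finalize in reverse order of initialization: Kokkos first, then MPI.
    if (owns_kokkos_) { kokkos_.initialized = false; kokkos_.finalized = true; }
    if (owns_mpi_) { mpi_.initialized = false; mpi_.finalized = true; }
  }
  HypotheticalScopeGuard(const HypotheticalScopeGuard&) = delete;
  HypotheticalScopeGuard& operator=(const HypotheticalScopeGuard&) = delete;

 private:
  LibState& mpi_;
  LibState& kokkos_;
  bool owns_mpi_;
  bool owns_kokkos_;
};

// Runs one "who initializes what" scenario and checks only library state:
// inside the guard both libraries must be live; afterwards, user-initialized
// libraries must still be live and guard-initialized ones must be finalized.
bool scenario_ok(bool user_inits_mpi, bool user_inits_kokkos) {
  LibState mpi, kokkos;
  mpi.initialized = user_inits_mpi;
  kokkos.initialized = user_inits_kokkos;
  {
    HypotheticalScopeGuard guard(mpi, kokkos);
    if (!mpi.initialized || !kokkos.initialized) return false;
  }
  if (mpi.initialized != user_inits_mpi) return false;
  if (kokkos.initialized != user_inits_kokkos) return false;
  if (mpi.finalized == user_inits_mpi) return false;
  if (kokkos.finalized == user_inits_kokkos) return false;
  return true;
}
```

Because the pass/fail decision depends only on state the test controls, a warning printed by a TPL cannot turn a correct initialization into a failure.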
@trilinos/tpetra @trilinos/kokkos
Current Behavior
The following tests fail on white for a CUDA build:
31 - TpetraCore_Core_initialize_where_user_initializes_mpi_MPI_4 (Failed)
32 - TpetraCore_Core_ScopeGuard_where_user_initializes_mpi_MPI_4 (Failed)
35 - TpetraCore_Core_initialize_where_tpetra_initializes_kokkos_MPI_1 (Failed)
36 - TpetraCore_Core_ScopeGuard_where_tpetra_initializes_kokkos_MPI_1 (Failed)
37 - TpetraCore_Core_initialize_where_user_initializes_kokkos_MPI_1 (Failed)
38 - TpetraCore_Core_ScopeGuard_where_user_initializes_kokkos_MPI_1 (Failed)
39 - TpetraCore_Core_initialize_where_tpetra_initializes_mpi_and_user_initializes_kokkos_MPI_2 (Failed)
40 - TpetraCore_Core_ScopeGuard_where_tpetra_initializes_mpi_and_user_initializes_kokkos_MPI_2 (Failed)
These tests rely on Kokkos not writing to std::cerr during Kokkos::initialize. However, for reasons unrelated to proper or improper use of Kokkos::initialize, Kokkos may write to std::cerr. For example:
"Captured output: Kokkos::Cuda::initialize WARNING: running kernels compiled for compute capability 3.5 on device with compute capability 3.7 , this will likely reduce potential performance."
In this case, all the initialization took place correctly (so the test should have passed), but Kokkos issued a warning to std::cerr (so the test failed).
Since Tpetra cannot control what Kokkos writes to std::cerr, this condition is not a reliable way to determine whether these tests pass or fail.
A side note: the tests go on to print "Captured output is empty!" when they should say "Captured output is NOT empty!" The incorrect message is confusing, but secondary to the unreliable test condition.
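To illustrate why the empty-std::cerr condition is brittle, here is a minimal, self-contained C++ sketch. It redirects std::cerr into a buffer using the standard `std::ios::rdbuf` mechanism (an assumption about how such a capture could be done, not necessarily how the Tpetra tests do it) and calls `hypothetical_initialize`, a made-up stand-in for Kokkos::initialize that succeeds but prints a warning like the one quoted above. The names `CerrCapture`, `hypothetical_initialize`, `CheckResult`, and `run_checks` are all hypothetical.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// RAII guard: redirects std::cerr into a buffer and restores the original
// stream buffer when it goes out of scope (even if an exception is thrown).
class CerrCapture {
 public:
  CerrCapture() : old_(std::cerr.rdbuf(buffer_.rdbuf())) {}
  ~CerrCapture() { std::cerr.rdbuf(old_); }
  CerrCapture(const CerrCapture&) = delete;
  CerrCapture& operator=(const CerrCapture&) = delete;
  std::string str() const { return buffer_.str(); }

 private:
  std::ostringstream buffer_;  // declared first so it exists before old_ is set
  std::streambuf* old_;
};

// Hypothetical stand-in for Kokkos::initialize (NOT the real API): it
// succeeds but, like the CUDA backend on a mismatched GPU, warns on std::cerr.
bool hypothetical_initialize() {
  std::cerr << "Kokkos::Cuda::initialize WARNING: running kernels compiled for "
               "compute capability 3.5 on device with compute capability 3.7\n";
  return true;
}

struct CheckResult {
  bool stderr_check_passed;  // the unreliable "captured output is empty" condition
  bool state_check_passed;   // checks only that initialization actually happened
  std::string message;       // diagnostic text that agrees with the condition
};

CheckResult run_checks() {
  bool initialized = false;
  std::string captured;
  {
    CerrCapture capture;
    initialized = hypothetical_initialize();
    captured = capture.str();
  }  // std::cerr is restored here
  CheckResult r;
  r.stderr_check_passed = captured.empty();
  r.state_check_passed = initialized;
  // Non-empty output is reported as NOT empty, matching the failed condition.
  r.message = captured.empty()
                  ? std::string("Captured output is empty.")
                  : "Captured output is NOT empty! Captured output: " + captured;
  return r;
}
```

With this stand-in, `run_checks()` reports that the stderr check fails while the state check passes, and the diagnostic correctly says the output is NOT empty. Restoring the stream buffer in a destructor keeps std::cerr usable even if the initializer throws.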
Steps to Reproduce
On white:
module purge
module load openmpi/2.1.2/gcc/7.2.0/cuda/9.2.88
module load cmake/3.9.6
module load openblas/0.2.20/gcc/7.2.0
module load boost/1.65.1/gcc/7.2.0
module load cuda/9.2.88
module load netcdf-exo/4.4.1.1/openmpi/2.1.2/gcc/7.2.0/cuda/9.0.176
export NVCC_WRAPPER_DEFAULT_COMPILER=
which g++
echo ${NVCC_WRAPPER_DEFAULT_COMPILER}
TRILINOS_SRC="/home/Trilinos"
export OMPI_CXX=${TRILINOS_SRC}/packages/kokkos/bin/nvcc_wrapper
which mpic++
mpic++ --version
export PATH=${PATH}:${TRILINOS_SRC}/packages/kokkos/bin
cmake \
-DTPL_ENABLE_MPI=ON \
-DMPI_BASE_DIR=${MPI_ROOT} \
-DBLAS_LIBRARY_DIRS=${OPENBLAS_ROOT}/lib \
-DLAPACK_LIBRARY_DIRS=${OPENBLAS_ROOT}/lib \
-DNetcdf_LIBRARY_DIRS=${NETCDF_ROOT}/lib \
-DBoostLib_LIBRARY_DIRS=${BOOST_ROOT}/lib \
-DTPL_ENABLE_Matio=OFF \
-DTrilinos_ENABLE_ALL_PACKAGES=OFF \
-DTrilinos_ENABLE_Tpetra=ON \
-DTpetra_ENABLE_TESTS=ON \
-DTpetra_ENABLE_EXAMPLES=ON \
-DCMAKE_INSTALL_PREFIX=${TRILINOS_SRC}/tmp \
-D Trilinos_ENABLE_CUDA=ON \
-D TPL_ENABLE_CUDA=ON \
-D Tpetra_INST_CUDA:BOOL=ON \
-DCMAKE_CXX_FLAGS="--expt-extended-lambda" \
-DKokkos_ENABLE_Cuda_UVM:BOOL=ON \
$TRILINOS_SRC
make -j 8
ctest -j4
Related Issues
#3095
Additional Information
This issue is low priority and will likely not be fixed unless it becomes a blocker for other developers.