
Tpetra: Unit test TpetraCore_BlockCrsMatrix failure with OpenMP+Serial build with sems-intel/19.0.5 compiler #11143

Closed
ndellingwood opened this issue Oct 13, 2022 · 10 comments
Labels
CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. pkg: Tpetra type: bug The primary issue is a bug in Trilinos code or tests

Comments

@ndellingwood
Contributor

Bug Report

@trilinos/tpetra

Description

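The TpetraCore_BlockCrsMatrix test fails in OpenMP+Serial builds with the sems-intel/19.0.5 compiler on the SNB architecture. The two failing subtests are the Transpose tests for `double` and `std::complex<double>` scalars on the OpenMP node.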

Failure snippet:

...
19: 9. BlockCrsMatrix_double_int_longlong_Kokkos_Compat_KokkosOpenMPWrapperNode_block2point_UnitTest ... Test conversion from (point) CrsMatrix to BlockCrsMatrix
19: Read CrsMatrix from file "blockA.mm"
19: Migrate input CrsMatrix to final parallel distribution
19: Convert CrsMatrix to BlockCrsMatrix
19: CrsMatrix::apply
19: Compute norm of result
19: NewPointMatrix::apply
19: ||CSR*xrand|| = 1.80333, ||CSR*xrand - BCSR*xrand|| / ||CSR*xrand|| = 0
19: [Passed] (0.00278 sec)
19: 10. BlockCrsMatrix_double_int_longlong_Kokkos_Compat_KokkosOpenMPWrapperNode_Transpose_UnitTest ... 
19:   Testing Transpose
19:    Read CrsMatrix from file "blockA.mm"
19:    Migrate input CrsMatrix to final parallel distribution
19:    Convert CrsMatrix to BlockCrsMatrix
19:    CrsMatrix::apply
19:    Compute norm of result
19:    NewBlockMatrix::apply
19:    ||CSR*xrand|| = 3.30675, ||CSR*xrand - BCSR*xrand|| / ||CSR*xrand|| = 0.208428
19:    
19:    Check: rel_err(normVec1[0], normVec2[0])
19:           = rel_err(3.30675, 3.14044) = 0.0502929
19:             <= tol = 1.49012e-08 : FAILED
19:  [FAILED]  (0.00331 sec) BlockCrsMatrix_double_int_longlong_Kokkos_Compat_KokkosOpenMPWrapperNode_Transpose_UnitTest
19:  Location: /ascldap/users/ndellin/trilinos/Trilinos/packages/tpetra/core/test/Block/BlockCrsMatrix.cpp:2755
...
19: 32. BlockCrsMatrix_std_complex0double0_int_longlong_Kokkos_Compat_KokkosOpenMPWrapperNode_Transpose_UnitTest ... 
19:   Testing Transpose
19:    Read CrsMatrix from file "blockA-complex.mm"
19:    Migrate input CrsMatrix to final parallel distribution
19:    Convert CrsMatrix to BlockCrsMatrix
19:    CrsMatrix::apply
19:    Compute norm of result
19:    NewBlockMatrix::apply
19:    ||CSR*xrand|| = 1.11294, ||CSR*xrand - BCSR*xrand|| / ||CSR*xrand|| = 0.443719
19:    
19:    Check: rel_err(normVec1[0], normVec2[0])
19:           = rel_err(1.11294, 0.977712) = 0.121501
19:             <= tol = 1.49012e-08 : FAILED
19:  [FAILED]  (0.00269 sec) BlockCrsMatrix_std_complex0double0_int_longlong_Kokkos_Compat_KokkosOpenMPWrapperNode_Transpose_UnitTest
19:  Location: /ascldap/users/ndellin/trilinos/Trilinos/packages/tpetra/core/test/Block/BlockCrsMatrix.cpp:2755
...
19: The following tests FAILED:
19:     10. BlockCrsMatrix_double_int_longlong_Kokkos_Compat_KokkosOpenMPWrapperNode_Transpose_UnitTest ... 
19:     32. BlockCrsMatrix_std_complex0double0_int_longlong_Kokkos_Compat_KokkosOpenMPWrapperNode_Transpose_UnitTest ... 

Steps to Reproduce

  1. SHA1: a76c1c4
  2. Configure script:
source /projects/sems/modulefiles/utils/sems-modules-init.sh
module load sems-intel/19.0.5 sems-cmake/3.21.1 sems-openblas/0.3.10

export CXX="icpc"
export CC="icc"
export FC="ifort"
export F77="ifort"
export F90="ifort"

SHARED=ON
COMPLEX=ON

SERIAL=ON
OPENMP=ON
THREADS=OFF
CUDA=OFF

cmake \
-DCMAKE_INSTALL_PREFIX="${TRILINOS_INSTALL_DIR}" \
-DCMAKE_CXX_STANDARD="17" \
-D Kokkos_ARCH_SNB=ON \
-D CMAKE_CXX_FLAGS="-g" \
-D CMAKE_C_FLAGS="-g" \
-D CMAKE_Fortran_FLAGS="-g" \
\
-D Trilinos_ENABLE_INSTALL_CMAKE_CONFIG_FILES:BOOL=ON \
-D Trilinos_ENABLE_ALL_PACKAGES:BOOL=OFF \
-D Trilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=OFF \
-D CMAKE_SKIP_RULE_DEPENDENCY=ON \
-D BUILD_SHARED_LIBS:BOOL=${SHARED} \
-D CMAKE_BUILD_TYPE:STRING=RELEASE \
-D Trilinos_ENABLE_DEBUG:BOOL=OFF \
\
-D Trilinos_ENABLE_COMPLEX_DOUBLE=${COMPLEX} \
-D Trilinos_ENABLE_EXPLICIT_INSTANTIATION:BOOL=ON \
-D Trilinos_ENABLE_OpenMP=${OPENMP} \
-D TPL_ENABLE_MPI=OFF \
-D TPL_ENABLE_BinUtils=OFF \
-D TPL_ENABLE_BLAS:STRING=ON \
  -D BLAS_LIBRARY_DIRS:FILEPATH="${OPENBLAS_ROOT}/lib" \
  -D BLAS_LIBRARY_NAMES:STRING="openblas" \
-D TPL_ENABLE_LAPACK:STRING=ON \
  -D LAPACK_INCLUDE_DIRS:FILEPATH="${OPENBLAS_ROOT}/include" \
  -D LAPACK_LIBRARY_DIRS:FILEPATH="${OPENBLAS_ROOT}/lib" \
  -D LAPACK_LIBRARY_NAMES:STRING="openblas" \
-D TPL_ENABLE_DLlib=ON \
-D TPL_ENABLE_Pthread=ON \
\
-D Trilinos_ENABLE_Kokkos=ON \
  -D Kokkos_ENABLE_SERIAL=${SERIAL} \
  -D Kokkos_ENABLE_OPENMP=${OPENMP} \
  -D Kokkos_ENABLE_THREADS=${THREADS} \
  -D Kokkos_ENABLE_CUDA=${CUDA} \
  -D Kokkos_ENABLE_CUDA_LAMBDA=${CUDA} \
-D Trilinos_ENABLE_KokkosKernels=ON \
-D Trilinos_ENABLE_Tpetra=ON \
  -D Tpetra_INST_SERIAL:BOOL=${SERIAL} \
  -D Tpetra_INST_OPENMP:BOOL=${OPENMP} \
  -D Tpetra_INST_PTHREAD:BOOL=${THREADS} \
  -D Tpetra_INST_CUDA:BOOL=${CUDA} \
  -D Tpetra_ENABLE_TESTS:BOOL=ON \
  -D Tpetra_ENABLE_EXAMPLES:BOOL=ON \
\
$TRILINOS_DIR
@ndellingwood ndellingwood added type: bug The primary issue is a bug in Trilinos code or tests pkg: Tpetra labels Oct 13, 2022
@csiefer2
Member

@ndellingwood There have been a number of OpenBLAS-related issues (mostly associated with the CUDA 11 build), so I'm wondering whether this is an Intel 19 problem or an OpenBLAS problem.

I'm going to try using the system BLAS first and see if I can reproduce.

@ndellingwood
Contributor Author

> There have been a number of OpenBLAS-related issues (mostly associated with the CUDA 11 build), so I'm wondering whether this is an Intel 19 problem or an OpenBLAS problem.

@csiefer2 thanks for the update! And thanks for sharing the info regarding the OpenBLAS + CUDA 11 issues; I wasn't aware of them, but I'll monitor for this as well.

@csiefer2 csiefer2 reopened this Oct 14, 2022
@csiefer2
Member

@ndellingwood I can't reproduce the failure on my desktop with either the system BLAS or OpenBLAS. What machine did you run this on?

@csiefer2
Member

The OpenBLAS discussion can be found here: #11109

@ndellingwood
Contributor Author

@csiefer2 I encountered the error on kokkos-dev

@csiefer2
Member

@ndellingwood I evidently have access to that. I'll try there.

@csiefer2
Member

@ndellingwood I can reproduce the issue on kokkos-dev, but there's no system BLAS to compare with. Let me try building a reference BLAS on that system for comparison.

@csiefer2
Member

Yeah, I can't get the code to link against a reference BLAS/LAPACK. I keep getting errors like:

/../../../kokkos-kernels/src/libkokkoskernels.so.13.5: undefined reference to `for_write_seq_fmt'

@github-actions

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

@github-actions github-actions bot added the MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. label Oct 15, 2023

This issue was closed due to inactivity for 395 days.

@github-actions github-actions bot added the CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. label Nov 15, 2023
@jhux2 jhux2 added this to Tpetra Aug 12, 2024
@jhux2 jhux2 moved this to Done in Tpetra Aug 12, 2024
Projects
Status: Done
Development

No branches or pull requests

2 participants