Tpetra: Multiple test failures with intel/2021.4, intel/2023.2 (icpc) #11968
Comments
There are a ton of tests that hang with 2021.3 w/ OpenMP (which is what I have easy access to). |
@ndellingwood I was looking at Bug5800 and something is making Kokkos::parallel_scan() hang. Not sure how to proceed. |
@csiefer2 can you distill it down to a simple reproducer to submit as an issue to Kokkos? |
@csiefer2 here is the output I had for the Bug5800 test when I posted, it did not hang but the failure output may be dated:
|
@csiefer2 that was also on Blake before the DST and hardware + module overhaul |
I suspect that won't work. But I suppose I can try. |
@csiefer2 I tested a serial build on (new) Blake using icpc (intel classic compiler) and intel/oneapi/2023.2.0 and reproduced the
Trilinos SHA ff92fc9
module load cmake intel-oneapi-compilers/2023.2.0 intel-oneapi-mkl/2023.2.0
export BLAS_LIBRARIES="-mkl;${MKLROOT}/lib/intel64/libmkl_intel_lp64.a;${MKLROOT}/lib/intel64/libmkl_intel_thread.a;${MKLROOT}/lib/intel64/libmkl_core.a"
export LAPACK_LIBRARIES=${BLAS_LIBRARIES}
cmake \
-D CMAKE_CXX_COMPILER="`which icpc`" \
-D CMAKE_C_COMPILER="`which icc`" \
-D CMAKE_Fortran_COMPILER="`which ifort`" \
-D CMAKE_CXX_FLAGS="-g -no-ip" \
-D CMAKE_C_FLAGS="-g -no-ip" \
-DTPL_ENABLE_MPI=OFF \
-DTPL_ENABLE_BLAS:BOOL=ON \
-DTPL_BLAS_LIBRARIES:PATH="${BLAS_LIBRARIES}" \
-DTPL_LAPACK_LIBRARIES:PATH="${LAPACK_LIBRARIES}" \
-DTPL_ENABLE_LAPACK:BOOL=ON \
-DTrilinos_ENABLE_ALL_PACKAGES=OFF \
-DTrilinos_ENABLE_ALL_OPTIONAL_PACKAGES=OFF \
-DTrilinos_ENABLE_TESTS=ON \
-DTrilinos_MUST_FIND_ALL_TPL_LIBS=TRUE \
-DTrilinos_ENABLE_OpenMP=OFF \
-DTrilinos_ENABLE_Kokkos=ON \
-D Kokkos_ENABLE_SERIAL=ON \
-D Kokkos_ENABLE_TESTS=ON \
-D Kokkos_ARCH_SKX=ON \
-DTrilinos_ENABLE_KokkosKernels=ON \
-D KokkosKernels_ENABLE_TESTS=ON \
-DTrilinos_ENABLE_Tpetra=ON \
-D Tpetra_ENABLE_TESTS=ON \
\
-DTPL_ENABLE_Matio=OFF \
\
$TRILINOS_DIR |
Well, there's always the hope that fixing the 2023.2 issue on blake will fix the 2021.3 hang :) I'll see what I can do |
@ndellingwood This really looks like a compiler bug or a UMR: The code:
The output:
I might add that dims[1] being wrong is what breaks the matrix reader. |
@ndellingwood Do you have a working valgrind on blake? |
Yuck, that's odd... I'm curious what is the type of
I peeked on Blake; the current versions of valgrind available are only for gcc compilers (the intel-oneapi installs are pretty new, I think adding additional compatible modules is still WIP) |
I'm not going to be able to answer compiler bug vs. UMR without a memory debugger and I can't get valgrind to build correctly myself. |
Does Intel have address sanitizer support? |
@jhux2 I don't think so. |
It looks like OneAPI's icx is based on LLVM and might support asan. [edit] |
@jhux2 I hit these failures with the intel classic compiler (icpc) but have not tested with icpx. Let me try out a build, if they reproduce with icpx then the asan utils could be a good tool to explore |
@ndellingwood I should say that I'm not 100% sure whether icpx supports asan. Googling provided hints, but I didn't find any definitive documentation. |
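For anyone who wants to try the AddressSanitizer route, here is a rough sketch of what an instrumented configure might look like. It assumes that icx/icpx accept the Clang-style `-fsanitize=address` flag, which has not been confirmed in this thread, and `ASAN_FLAGS` is just an illustrative variable name. The TPL and package options are trimmed; the BLAS/LAPACK settings from the configure line above would still be needed.

```shell
# Sketch only: assumes icx/icpx accept the Clang-style -fsanitize=address flag.
module load cmake intel-oneapi-compilers/2023.2.0 intel-oneapi-mkl/2023.2.0

# ASAN_FLAGS is a hypothetical helper variable, not a Trilinos option.
export ASAN_FLAGS="-g -O1 -fno-omit-frame-pointer -fsanitize=address"

# Add the BLAS/LAPACK TPL options from the earlier configure line as needed.
cmake \
-D CMAKE_CXX_COMPILER="`which icpx`" \
-D CMAKE_C_COMPILER="`which icx`" \
-D CMAKE_CXX_FLAGS="${ASAN_FLAGS}" \
-D CMAKE_C_FLAGS="${ASAN_FLAGS}" \
-D CMAKE_EXE_LINKER_FLAGS="-fsanitize=address" \
-DTPL_ENABLE_MPI=OFF \
-DTrilinos_ENABLE_Kokkos=ON \
-D Kokkos_ENABLE_SERIAL=ON \
-DTrilinos_ENABLE_Tpetra=ON \
-D Tpetra_ENABLE_TESTS=ON \
$TRILINOS_DIR

# Build, then run only the tests whose names match Bug5800.
make -j 8 && ctest -R Bug5800 --output-on-failure
```

If the failures only reproduce with icpc and not icpx, this build would not help separate a compiler bug from a UMR, but a clean ASan run would at least be one more data point alongside valgrind with gcc.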
The test valgrinds clean w/ gcc 10 on my desktop. |
@csiefer2 that's good to know, I've only seen these test failures occur with intel icpc compilers so that might add a nudge toward some compiler wonkiness at play? |
Maybe? I'll try with the SEMS 2021.3 and see if that fails in the same way and if I can valgrind that. |
Those tests pass with 2021.3 on my desktop (Serial backend). So I'm more seriously thinking compiler bug. @ndellingwood |
Just updating the issue, I'm seeing similar failures for Serial and OpenMP builds with the
# Blake all queue - non-mpi build
# Environment
module load cmake intel-oneapi-compilers/2023.2.0 intel-oneapi-mkl/2023.2.0
module list
export TRILINOS_DIR=<path-to-source>
export BLAS_LIBRARIES="-mkl;${MKLROOT}/lib/intel64/libmkl_intel_lp64.a;${MKLROOT}/lib/intel64/libmkl_intel_thread.a;${MKLROOT}/lib/intel64/libmkl_core.a"
export LAPACK_LIBRARIES=${BLAS_LIBRARIES}
# Configure Trilinos
cmake \
-D CMAKE_INSTALL_PREFIX="${PWD}/install" \
-D CMAKE_CXX_COMPILER="`which icpc`" \
-D CMAKE_C_COMPILER="`which icc`" \
-D CMAKE_Fortran_COMPILER="`which ifort`" \
-D CMAKE_CXX_FLAGS="-g -no-ip" \
-D CMAKE_C_FLAGS="-g -no-ip" \
-DTPL_ENABLE_MPI=OFF \
-DTPL_ENABLE_BLAS:BOOL=ON \
-DTPL_BLAS_LIBRARIES:PATH="${BLAS_LIBRARIES}" \
-DTPL_LAPACK_LIBRARIES:PATH="${LAPACK_LIBRARIES}" \
-DTPL_ENABLE_LAPACK:BOOL=ON \
-DTrilinos_ENABLE_ALL_PACKAGES=OFF \
-DTrilinos_ENABLE_ALL_OPTIONAL_PACKAGES=OFF \
-DTrilinos_ENABLE_TESTS=OFF \
-DTrilinos_MUST_FIND_ALL_TPL_LIBS=TRUE \
-DTrilinos_ENABLE_COMPLEX=ON \
-DTrilinos_ENABLE_OpenMP=OFF \
-DTrilinos_ENABLE_Kokkos=ON \
-D Kokkos_ENABLE_SERIAL=ON \
-D Kokkos_ARCH_SKX=ON \
-DTrilinos_ENABLE_KokkosKernels=ON \
-DTrilinos_ENABLE_Tpetra=ON \
-D Tpetra_ENABLE_TESTS=ON \
-DTrilinos_ENABLE_Ifpack2=ON \
-D Ifpack2_ENABLE_TESTS=ON \
\
-DTPL_ENABLE_Matio=OFF \
\
-DTrilinos_ENABLE_INSTALLATION_TESTING=OFF \
$TRILINOS_DIR |
This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. |
Bug Report
Testing builds on Blake (SKX arch) with intel/2021.4 (icpc, intel classic compiler) report multiple test failures
@trilinos/tpetra
Steps to Reproduce