Albany configure errors due to TriBITS update on 11/25/2021 #9972

Closed
ikalash opened this issue Nov 26, 2021 · 37 comments
Labels
client: Albany (Issue impacting the Albany project); pkg: STK; type: bug (The primary issue is a bug in Trilinos code or tests)

Comments

@ikalash
Contributor

ikalash commented Nov 26, 2021

We're getting failures in our nightly tests for Albany due to missing targets in STK:

Configure Command: "/projects/cde/qual/spack/opt/spack/linux-rhel7-x86_64/gcc-7.2.0/cmake-3.19.2-ygawmgvyrwfhelyp3r6knajkdli3rxo4/bin/cmake"  "-DALBANY_CTEST_TIMEOUT:INTEGER=60"  "-DALBANY_TRILINOS_DIR:FILEPATH=/ascldap/users/ikalash/nightlyAlbanyCDash/trilinos-install-serial-intel-release"  "-DCMAKE_CXX_FLAGS:STRING=-DNDEBUG" "-DCMAKE_VERBOSE_MAKEFILE:BOOL=OFF"  "-DENABLE_LANDICE:BOOL=ON" "-DENABLE_UNIT_TESTS:BOOL=ON"  "-DENABLE_CHECK_FPE:BOOL=OFF" "-DENABLE_FLUSH_DENORMALS:BOOL=OFF"  "-DALBANY_ENABLE_FORTRAN:BOOL=OFF" "-DENABLE_SLFAD:BOOL=" "-GUnix  Makefiles" "/ascldap/users/ikalash/nightlyAlbanyCDash/Albany"
Configure Return Value: 1
Configure Output:
-- Enabled Kokkos devices: SERIAL
CMake Warning at /ascldap/users/ikalash/nightlyAlbanyCDash/trilinos-install-serial-intel-release/lib/cmake/Kokkos/KokkosConfig.cmake:176 (MESSAGE):
  The installed Kokkos configuration does not support CXX extensions.
  Forcing -DCMAKE_CXX_EXTENSIONS=Off
Call Stack (most recent call first):
  /ascldap/users/ikalash/nightlyAlbanyCDash/trilinos-install-serial-intel-release/lib/cmake/Amesos2/Amesos2Config.cmake:157 (include)
  /ascldap/users/ikalash/nightlyAlbanyCDash/trilinos-install-serial-intel-release/lib/cmake/Stratimikos/StratimikosConfig.cmake:139 (include)
  /ascldap/users/ikalash/nightlyAlbanyCDash/trilinos-install-serial-intel-release/lib/cmake/Teko/TekoConfig.cmake:139 (include)
  /ascldap/users/ikalash/nightlyAlbanyCDash/trilinos-install-serial-intel-release/lib/cmake/NOX/NOXConfig.cmake:139 (include)
  /ascldap/users/ikalash/nightlyAlbanyCDash/trilinos-install-serial-intel-release/lib/cmake/Tempus/TempusConfig.cmake:139 (include)
  /ascldap/users/ikalash/nightlyAlbanyCDash/trilinos-install-serial-intel-release/lib/cmake/ROL/ROLConfig.cmake:139 (include)
  /ascldap/users/ikalash/nightlyAlbanyCDash/trilinos-install-serial-intel-release/lib/cmake/Piro/PiroConfig.cmake:139 (include)
  /ascldap/users/ikalash/nightlyAlbanyCDash/trilinos-install-serial-intel-release/lib/cmake/Trilinos/TrilinosConfig.cmake:116 (include)
  CMakeLists.txt:35 (FIND_PACKAGE)


CMake Error at CMakeLists.txt:35 (FIND_PACKAGE):
  Found package configuration file:

    /ascldap/users/ikalash/nightlyAlbanyCDash/trilinos-install-serial-intel-release/lib/cmake/Trilinos/TrilinosConfig.cmake

  but it set Trilinos_FOUND to FALSE so package "Trilinos" is considered to
  be NOT FOUND.  Reason given by package:

  The following imported targets are referenced, but are missing:
  STKUtil::stk_util_util STKUtil::stk_util_parallel STKUtil::stk_util_env
  STKUtil::stk_util_registry STKUtil::stk_util_diag
  STKUtil::stk_util_command_line STKMath::stk_math STKTopology::stk_topology
  STKMesh::stk_mesh_base STKIO::stk_io_util STKIO::stk_io
  STKExprEval::stk_expreval

-- Configuring incomplete, errors occurred!

https://sems-cdash-son.sandia.gov/cdash/build/25764/configure

Is this due to recent changes to Trilinos?
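
A minimal sketch for triaging an error like this, assuming the install follows the lib/cmake/<Package>/<Package>Config.cmake layout visible in the call stack above. Both function names are hypothetical helpers, not part of Trilinos or TriBITS. The first pulls the namespace (the part before "::") out of the missing-target list, and the second reports which of those namespaces have no config file in the install tree.

```shell
#!/usr/bin/env bash
# Read CMake configure output on stdin and print the unique namespaces
# (the text before "::") of the targets listed after the
# "referenced, but are missing" message.
missing_target_namespaces() {
  sed -n '/referenced, but are missing/,$p' |
    grep -oE '[A-Za-z0-9_]+::[A-Za-z0-9_]+' |
    sed 's/::.*//' |
    sort -u
}

# Read package names on stdin and report those with no <Pkg>Config.cmake
# under <install-dir>/lib/cmake/<Pkg>/ (layout assumed from the call stack).
report_missing_configs() {
  local install_dir="$1" pkg
  while read -r pkg; do
    [ -f "$install_dir/lib/cmake/$pkg/${pkg}Config.cmake" ] ||
      echo "no config file for $pkg"
  done
}

# Usage (paths are placeholders):
#   missing_target_namespaces < configure.out |
#     report_missing_configs /path/to/trilinos-install
```

If a namespace such as STKUtil has no config file at all, the subpackage was probably never installed; if the file exists but the target is still missing, the problem is more likely in how the config files include each other.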

@ikalash ikalash added the type: bug, client: Albany, and pkg: STK labels Nov 26, 2021
@alanw0
Contributor

alanw0 commented Nov 29, 2021

Hi Irina, the last STK update in Trilinos was Nov 5 (#9896). Stefan Domino is also seeing a CMake issue for Nalu. I'll keep you posted if I figure anything out.

@bartlettroscoe
Member

This may be related to the merge of PR #9894.

@ikalash, is there some way that I can reproduce this?

@spdomin
Contributor

spdomin commented Nov 29, 2021

These missing targets all seem related to the above PR #9894. However, the discussion on that PR was really long.

Can we revert this commit, or do we want to make sure that we have a clear bisect first?

@bartlettroscoe
Member

Can we revert this commit, or do we want to make sure that we have a clear bisect first?

@spdomin, I would not bother with a bisection. I can post a PR that reverts the merge of #9894, and then I can try to reproduce the errors in issues #9972 and #9973 offline.

@spdomin
Contributor

spdomin commented Nov 29, 2021

@bartlettroscoe, the above sounds like a great approach. Let me know when the change is reverted and I can launch a re-test. Let me know offline if you have any issues with the Nalu build process (or add a Nalu issue, your choice). I would go with a gcc 8.4, open-mpi 4.0.5 build. It should be clear sailing.

@ikalash
Contributor Author

ikalash commented Nov 29, 2021

I'm happy to retest Albany manually as well - just let me know.

bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Nov 29, 2021
…/tribits-299-modern-cmake-targets-1"

This reverts commit db3205b, reversing
changes made to 110b6c4 which reverts the PR
merge trilinos#9894.

This is to allow reproducing and addressing the problems described in the new
issues trilinos#9972 and trilinos#9973 offline to allow the Albany and Nalu Trilinos
integration process, respectively, to continue working in the meantime.
@bartlettroscoe
Member

FYI: I posted the revert of PR #9894 in the new PR #9977. I just need someone to approve that PR so that it can merge. Hopefully the PR can pass testing fast and merge (but I don't have any control over that).

@ikalash, can you point me to detailed instructions on how to reproduce this failure with Albany + Trilinos?

@ikalash
Contributor Author

ikalash commented Nov 29, 2021

@bartlettroscoe : great! I can approve if you add me as a reviewer. Yes, I can provide instructions for Albany. What is your machine of choice? CEE? Blake?

@bartlettroscoe
Member

What is your machine of choice? CEE? Blake?

@ikalash, basic CEE RHEL7 machines would be best for me.

@ikalash
Contributor Author

ikalash commented Nov 29, 2021

Here are instructions for CEE:

bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Nov 29, 2021
…Trilinos/tribits-299-modern-cmake-targets-1""

This reverts commit fd27a20.

This gets us back to the state of the 'develop' branch after the PR trilinos#9894 that
merged the branch 'tribits-299-modern-cmake-targets-1' was merged (as well as
other PRs in the days after that).

Now I can try to reproduce the errors in issues trilinos#9972 and trilinos#9973.
trilinos-autotester added a commit that referenced this issue Nov 30, 2021
…299-modern-cmake-targets-1

Automatically Merged using Trilinos Pull Request AutoTester
PR Title: Revert: TriBITS: Pull in partial refactoring to modern CMake targets (TriBITSPub/TriBITS#299) (#9894, #9972, #9973)
PR Author: bartlettroscoe
@ikalash
Contributor Author

ikalash commented Dec 1, 2021

The revert resolved this - closing.

@ikalash ikalash closed this as completed Dec 1, 2021
@bartlettroscoe
Member

Reopening so we can track the reproduction of the errors here that will be fixed as part of PR #9978 ...

Hello @ikalash, I was not able to reproduce on a CEE RHEL7 machine. The configure of Trilinos fails. It can't find BLAS.

Specifically, what CEE machines do you use to do these builds as per above?


After setting up the repos with:

$ cd /scratch/rabartl/Albany.base/

$ git clone -o github git@github.com:sandialabs/Albany.git

$ cd Albany/

$ git log-short -1

0236196 "Add GitVersion header include to PyUtils cpp file"
Author: Luca Bertagna <[email protected]>
Date:   Wed Dec 1 11:06:17 2021 -0700 (6 hours ago)

$ cd ..

$ git clone -o github git@github.com:trilinos/Trilinos.git

$ cd Trilinos/

$ git remote add rab-github git@github.com:bartlettroscoe/Trilinos.git

$ git fetch rab-github

$ git checkout --track rab-github/tribits-299-modern-cmake-targets-1-again

$ git log-short -3

dd88fc9 "Revert "Revert "Merge Pull Request #9894 from bartlettroscoe/Trilinos/tribits-299-modern-cmake-targets-1"""
Author: Roscoe A. Bartlett <[email protected]>
Date:   Mon Nov 29 13:33:16 2021 -0700 (2 days ago)

fd27a20 "Revert "Merge Pull Request #9894 from bartlettroscoe/Trilinos/tribits-299-modern-cmake-targets-1""
Author: Roscoe A. Bartlett <[email protected]>
Date:   Mon Nov 29 12:30:53 2021 -0700 (2 days ago)

37c24f4 "Merge Pull Request #9918 from rppawlo/Trilinos/panzer-quad8-to-quad4"
Author: trilinos-autotester <[email protected]>
Date:   Mon Nov 29 09:48:40 2021 -0700 (2 days ago)

$ cd ..

Following the instructions above:

$ cd /scratch/rabartl/Albany.base/Trilinos/

$ mkdir build-sems-intel/

$ cd build-sems-intel/

$ ln -s /scratch/rabartl/Albany.base/Albany/doc/dashboards/cee-compute011.sandia.gov/sems-intel-modules.sh \
  load-env.sh

$ ln -s /scratch/rabartl/Albany.base/Albany/doc/dashboards/cee-compute011.sandia.gov/do-cmake-trilinos-mpi-sems-intel \
  do-configure

$ . load-env.sh
Currently Loaded Modulefiles:
  1) sems-env                         4) sems-openmpi/1.10.1              7) sems-hdf5/1.8.12/parallel
  2) sems-gcc/6.1.0                   5) cde/dev/cmake/3.19.2             8) sems-netcdf/4.4.1/exo_parallel
  3) sems-intel/19.0.5                6) sems-boost/1.55.0/base           9) sparc-tools/python/3.7.9

$ time ./do-configure &> configure.out

real    0m21.067s
user    0m7.271s
sys     0m5.925s

That failed with the following configure error:

Processing enabled TPL: BLAS (enabled by TeuchosNumerics, disable with -DTPL_ENABLE_BLAS=OFF)
-- BLAS_LIBRARY_NAMES='blas blas_win32'
-- Searching for libs in BLAS_LIBRARY_DIRS=''
-- Searching for a lib in the set "blas blas_win32":
--   Searching for lib 'blas' ...
--   Searching for lib 'blas_win32' ...
-- NOTE: Did not find a lib in the lib set "blas blas_win32" for the TPL 'BLAS'!
-- ERROR: Could not find the libraries for the TPL 'BLAS'!
-- TIP: If the TPL 'BLAS' is on your system then you can set:
     -DBLAS_LIBRARY_DIRS='<dir0>;<dir1>;...'
   to point to the directories where these libraries may be found.
   Or, just set:
     -DTPL_BLAS_LIBRARIES='<path-to-libs0>;<path-to-libs1>;...'
   to point to the full paths for the libraries which will
   bypass any search for libraries and these libraries will be used without
   question in the build.  (But this will result in a build-time error
   if not all of the necessary symbols are found.)
-- ERROR: Failed finding all of the parts of TPL 'BLAS' (see above), Aborting!

-- NOTE: The find module file for this failed TPL 'BLAS' is:
     /scratch/rabartl/Albany.base/Trilinos/cmake/tribits/common_tpls/FindTPLBLAS.cmake
   which is pointed to in the file:
     /scratch/rabartl/Albany.base/Trilinos/TPLsList.cmake

TIP: One way to get past the configure failure for the
TPL 'BLAS' is to simply disable it with:
  -DTPL_ENABLE_BLAS=OFF
which will disable it and will recursively disable all of the
downstream packages that have required dependencies on it, including
the package 'TeuchosNumerics' which triggered its enable.
When you reconfigure, just grep the cmake stdout for 'BLAS'
and then follow the disables that occur as a result to see what impact
this TPL disable has on the configuration of Trilinos.

CMake Error at cmake/tribits/core/package_arch/TribitsProcessEnabledTpl.cmake:156 (message):
  ERROR: TPL_BLAS_NOT_FOUND=TRUE, aborting!
Call Stack (most recent call first):
  cmake/tribits/core/package_arch/TribitsGlobalMacros.cmake:1572 (tribits_process_enabled_tpl)
  cmake/tribits/core/package_arch/TribitsProjectImpl.cmake:196 (tribits_process_enabled_tpls)
  cmake/tribits/core/package_arch/TribitsProject.cmake:93 (tribits_project_impl)
  CMakeLists.txt:109 (TRIBITS_PROJECT)

Note that my CEE RHEL7 machine does not have libblas.so or libblas.a installed.
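
A minimal sketch of how one might check for BLAS before configuring, so that the -DTPL_BLAS_LIBRARIES option named in the TIP above can be set explicitly. The function name find_blas is a hypothetical helper, and the candidate directories in the usage comment are guesses that will differ between machines.

```shell
#!/usr/bin/env bash
# Print the path of the first libblas.so or libblas.a found in the given
# directories. Returns 1 if none of the directories contains one.
find_blas() {
  local dir name
  for dir in "$@"; do
    for name in libblas.so libblas.a; do
      if [ -e "$dir/$name" ]; then
        echo "$dir/$name"
        return 0
      fi
    done
  done
  return 1
}

# Usage (directory list is an assumption, adjust per machine):
#   if blas=$(find_blas /usr/lib64 /usr/lib); then
#     ./do-configure -DTPL_BLAS_LIBRARIES="$blas"
#   else
#     echo "no libblas found on this machine" >&2
#   fi
```

Passing the full library path skips the TriBITS search entirely, which is the behavior the TIP text describes for TPL_BLAS_LIBRARIES.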

@bartlettroscoe bartlettroscoe reopened this Dec 2, 2021
@ikalash
Contributor Author

ikalash commented Dec 2, 2021

Weird about blas. I wonder if the paths are different on some of the compute nodes. Can you please try cee-compute020? That's where our CEE intel nightly build runs.

@bartlettroscoe
Member

Can you please try cee-compute020?

Pretty much every other CEE RHEL7 machine seems to have libblas. Seems they just broke my one machine the last time they upgraded it I guess. (For some reason, BLAS and LAPACK are not part of the standard upgrades to systems.)

I have got the configure of Trilinos to pass so now I am on my way ...

@bartlettroscoe
Member

@ikalash, I have been able to reproduce the Albany configure error:

-- Enabled Kokkos devices: SERIAL
CMake Warning at /scratch/rabartl/Albany.base/Trilinos/build-sems-intel/install/lib/cmake/Kokkos/KokkosConfig.cmake:176 (MESSAGE):
  The installed Kokkos configuration does not support CXX extensions.
  Forcing -DCMAKE_CXX_EXTENSIONS=Off
Call Stack (most recent call first):
  /scratch/rabartl/Albany.base/Trilinos/build-sems-intel/install/lib/cmake/Amesos2/Amesos2Config.cmake:157 (include)
  /scratch/rabartl/Albany.base/Trilinos/build-sems-intel/install/lib/cmake/Stratimikos/StratimikosConfig.cmake:139 (include)
  /scratch/rabartl/Albany.base/Trilinos/build-sems-intel/install/lib/cmake/Teko/TekoConfig.cmake:139 (include)
  /scratch/rabartl/Albany.base/Trilinos/build-sems-intel/install/lib/cmake/MueLu/MueLuConfig.cmake:140 (include)
  /scratch/rabartl/Albany.base/Trilinos/build-sems-intel/install/lib/cmake/TrilinosCouplings/TrilinosCouplingsConfig.cmake:139 (include)
  /scratch/rabartl/Albany.base/Trilinos/build-sems-intel/install/lib/cmake/Trilinos/TrilinosConfig.cmake:116 (include)
  CMakeLists.txt:35 (FIND_PACKAGE)


CMake Error at CMakeLists.txt:35 (FIND_PACKAGE):
  Found package configuration file:

    /scratch/rabartl/Albany.base/Trilinos/build-sems-intel/install/lib/cmake/Trilinos/TrilinosConfig.cmake

  but it set Trilinos_FOUND to FALSE so package "Trilinos" is considered to
  be NOT FOUND.  Reason given by package:

  The following imported targets are referenced, but are missing:
  STKUtil::stk_util_util STKUtil::stk_util_parallel STKUtil::stk_util_env
  STKUtil::stk_util_registry STKUtil::stk_util_diag
  STKUtil::stk_util_command_line STKMath::stk_math STKNGP_TEST::stk_ngp_test
  STKTopology::stk_topology STKMesh::stk_mesh_base STKIO::stk_io_util
  STKIO::stk_io STKUnit_test_utils::stk_mesh_fixtures
  STKUnit_test_utils::stk_unit_test_utils STKExprEval::stk_expreval

with details below.

I will get to the bottom of what is happening. Thanks!

Reproduction Details

I am trying this again on my CEE EWS machine 'ews00232' after copying the above git repos:

$ cd /scratch/rabartl/Albany.base/Trilinos

$ mkdir build-sems-intel/

$ cd build-sems-intel/

$ . load-env.sh 
Currently Loaded Modulefiles:
  1) sems-env                         4) sems-openmpi/1.10.1              7) sems-hdf5/1.8.12/parallel
  2) sems-gcc/6.1.0                   5) cde/dev/cmake/3.19.2             8) sems-netcdf/4.4.1/exo_parallel
  3) sems-intel/19.0.5                6) sems-boost/1.55.0/base           9) sparc-tools/python/3.7.9

$ ./do-configure &> configure.out

$ make -j16 &> make.out

$ make -j16 install &> make.install.out

$ cd /scratch/rabartl/Albany.base/Albany/

$ mkdir build-sems-intel/

$ cd build-sems-intel/

$ emacs -nw do-configure  # Copy in provided script and edit

$ env TRILINSTALLDIR=/scratch/rabartl/Albany.base/Trilinos/build-sems-intel/install ./do-configure &> configure.out

That produced the error:

-- Enabled Kokkos devices: SERIAL
CMake Warning at /scratch/rabartl/Albany.base/Trilinos/build-sems-intel/install/lib/cmake/Kokkos/KokkosConfig.cmake:176 (MESSAGE):
  The installed Kokkos configuration does not support CXX extensions.
  Forcing -DCMAKE_CXX_EXTENSIONS=Off
Call Stack (most recent call first):
  /scratch/rabartl/Albany.base/Trilinos/build-sems-intel/install/lib/cmake/Amesos2/Amesos2Config.cmake:157 (include)
  /scratch/rabartl/Albany.base/Trilinos/build-sems-intel/install/lib/cmake/Stratimikos/StratimikosConfig.cmake:139 (include)
  /scratch/rabartl/Albany.base/Trilinos/build-sems-intel/install/lib/cmake/Teko/TekoConfig.cmake:139 (include)
  /scratch/rabartl/Albany.base/Trilinos/build-sems-intel/install/lib/cmake/MueLu/MueLuConfig.cmake:140 (include)
  /scratch/rabartl/Albany.base/Trilinos/build-sems-intel/install/lib/cmake/TrilinosCouplings/TrilinosCouplingsConfig.cmake:139 (include)
  /scratch/rabartl/Albany.base/Trilinos/build-sems-intel/install/lib/cmake/Trilinos/TrilinosConfig.cmake:116 (include)
  CMakeLists.txt:35 (FIND_PACKAGE)


CMake Error at CMakeLists.txt:35 (FIND_PACKAGE):
  Found package configuration file:

    /scratch/rabartl/Albany.base/Trilinos/build-sems-intel/install/lib/cmake/Trilinos/TrilinosConfig.cmake

  but it set Trilinos_FOUND to FALSE so package "Trilinos" is considered to
  be NOT FOUND.  Reason given by package:

  The following imported targets are referenced, but are missing:
  STKUtil::stk_util_util STKUtil::stk_util_parallel STKUtil::stk_util_env
  STKUtil::stk_util_registry STKUtil::stk_util_diag
  STKUtil::stk_util_command_line STKMath::stk_math STKNGP_TEST::stk_ngp_test
  STKTopology::stk_topology STKMesh::stk_mesh_base STKIO::stk_io_util
  STKIO::stk_io STKUnit_test_utils::stk_mesh_fixtures
  STKUnit_test_utils::stk_unit_test_utils STKExprEval::stk_expreval



-- Configuring incomplete, errors occurred!

The script /scratch/rabartl/Albany.base/Albany/build-sems-intel/do-configure was:

if [[ -e CMakeCache.txt ]] ; then 
  rm -r CMake*
fi

cmake \
      -D ALBANY_TRILINOS_DIR:FILEPATH="$TRILINSTALLDIR" \
      -D CMAKE_BUILD_TYPE:STRING=RELEASE \
      -D CMAKE_VERBOSE_MAKEFILE:BOOL=ON \
      -D ENABLE_LCM:BOOL=ON \
      -D ENABLE_MOR:BOOL=OFF \
      -D ENABLE_HYDRIDE:BOOL=OFF \
      -D ENABLE_AMP:BOOL=OFF \
      -D ENABLE_ALBANY_EPETRA_EXE=ON \
      -D ENABLE_ATO:BOOL=OFF \
      -D ENABLE_CHECK_FPE:BOOL=ON \
      -D ENABLE_SCOREC:BOOL=OFF \
      -D ENABLE_QCAD:BOOL=OFF \
      -D ENABLE_SG_MP:BOOL=OFF \
      -D ENABLE_ASCR:BOOL=OFF \
      -D ENABLE_AERAS:BOOL=OFF \
      -D ENABLE_64BIT_INT:BOOL=OFF \
      -D ENABLE_INSTALL:BOOL=OFF \
      -D CMAKE_INSTALL_PREFIX:PATH=${PWD}/install \
      -D ENABLE_DEMO_PDES:BOOL=ON \
      -D ENABLE_MPAS_INTERFACE:BOOL=OFF \
      ..

bartlettroscoe added a commit to bartlettroscoe/TriBITS that referenced this issue Dec 2, 2021
…age includes (TriBITSPub#299)

This is the use case that triggers trilinos/Trilinos#9972 and
trilinos/Trilinos#9973.

Now I will change the code to fix the test.
bartlettroscoe added a commit to bartlettroscoe/TriBITS that referenced this issue Dec 2, 2021
…age includes (TriBITSPub#299)

This sets <ParentPackage>_ENABLE_<SubPackage>=ON if the subpackage is enabled
even if optional packages are disabled.  This will fix trilinos/Trilinos#9972
and trilinos/Trilinos#9973.
bartlettroscoe added a commit to bartlettroscoe/TriBITS that referenced this issue Dec 2, 2021
…age includes (TriBITSPub#299)

This sets <ParentPackage>_ENABLE_<SubPackage>=ON if the subpackage is enabled
even if optional packages are disabled.  This will fix trilinos/Trilinos#9972
and trilinos/Trilinos#9973.

This also updates the logic that generates <Package>Config.cmake files to only
include <UpstreamPackage>Config.cmake files for direct dependencies, not all
dependencies.  (The indirect includes should take care of the rest.)
@bartlettroscoe
Member

@ikalash, to be more specific ...

Do you have a simple setup to support configuration, building, and installation of Trilinos for Albany locally, running the native Trilinos tests for the enabled packages used by Albany, and then configuring, building, and testing Albany against that local Trilinos install?

This should be just a few commands. For Trilinos, that looks something like:

$ mkdir  <tril-build-dir>/
$ cd <tril-build-dir>/
$ source <albany-env-load-script>
$ env TRILINOS_SRC_DIR=<tril-src-dir> \
    <trilinos-configure-script> -DTrilinos_ENABLE_TESTS=ON -DCMAKE_INSTALL_PREFIX=${PWD}/install
$ make -j12
$ ctest -j12
$ make -j12 install

and then configure and build Albany against that Trilinos install like:

$ mkdir <albany-build-dir>/
$ cd <albany-build-dir>/
$ env ALBANY_SRC_DIR=<albany-src-dir> \
    <albany-configure-script> -DCMAKE_PREFIX_PATH=<tril-build-dir>/install
$ make -j12
$ ctest -j12

where the files <albany-env-load-script>, <trilinos-configure-script>, and <albany-configure-script> are all under version control and the build and source dir locations <tril-src-dir>, <tril-build-dir>, <albany-src-dir>, <albany-build-dir> are arbitrary?

This is really what I (and other people) need to be able to create a working baseline Albany (or any application code) to test against Trilinos. And if we can do this on a standard CEE RHEL7 machine, then that makes it easy for anyone to do such reproductions since everyone should have access to a CEE LAN machine. (They stood up the new HPWS machines, which look to be pretty good so far.) If you have this (and someone keeps it working at all times), then that eliminates the overhead of testing an APP against a local Trilinos git repo.

I can perhaps help to set this up if you are interested and get this under version control so that it will be kept up-to-date.

bartlettroscoe added a commit to bartlettroscoe/TriBITS that referenced this issue Dec 9, 2021
…n a parent package (TriBITSPub#299)

This is the use case exercised by Albany/Trilinos in trilinos/Trilinos#9972
(and likely also Nalu/Trilinos in trilinos/Trilinos#9973).  This test shows
the same configure error where <Project>Config.cmake is trying to include an
<SubPackage>Config.cmake file for a required subpackage that is not actually
enabled.
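
A rough sketch of how this failure mode could be spotted in an install tree: scan a generated config file for references to other <Package>Config.cmake files and report any that do not exist on disk. The function name is hypothetical, and the lib/cmake/<Package>/<Package>Config.cmake layout is an assumption taken from the call stacks earlier in this issue. The exact include syntax TriBITS generates is not shown here, so the sketch matches file names only.

```shell
#!/usr/bin/env bash
# List <Pkg>Config.cmake names that <config-file> mentions but that are
# absent from <cmake-dir>/<Pkg>/. This is a text match on file names, so it
# can over-report if a name appears only in a comment.
dangling_config_includes() {
  local config_file="$1" cmake_dir="$2" ref pkg
  grep -oE '[A-Za-z0-9_]+Config\.cmake' "$config_file" | sort -u |
    while read -r ref; do
      pkg="${ref%Config.cmake}"
      [ -f "$cmake_dir/$pkg/$ref" ] || echo "$ref"
    done
}

# Usage (install path is a placeholder):
#   d=/path/to/trilinos-install/lib/cmake
#   dangling_config_includes "$d/Trilinos/TrilinosConfig.cmake" "$d"
```

Any name printed is a config file that would fail to load when a downstream project such as Albany calls FIND_PACKAGE on the install.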
bartlettroscoe added a commit to bartlettroscoe/TriBITS that referenced this issue Dec 9, 2021
…reamPackage> (TriBITSPub#299)

This is needed in cases where a top-level package has multiple required
subpackages but where the user only requests a subset of the required
subpackages be enabled and not the top-level package itself.  This is one of
the use cases exercised by Albany/Trilinos (see trilinos/Trilinos#9972).

This commit fixes the failing test TribitsExampleApp_EnableSingleSubpackage.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Dec 15, 2021
…Trilinos/tribits-299-modern-cmake-targets-1""

This reverts commit fd27a20.

This gets us back to the state of the 'develop' branch after the PR trilinos#9894 that
merged the branch 'tribits-299-modern-cmake-targets-1' was merged (as well as
other PRs in the days after that).

Now I can try to reproduce the errors in issues trilinos#9972 and trilinos#9973.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Dec 15, 2021
…targets-1-again (TriBITSPub/TriBITS#433)

Should address all of the issues with the merge of PR trilinos#9894
listed out in TriBITSPub/TriBITS#433 (which is part of
TriBITSPub/TriBITS#299).  This should resolve the failures reported in
trilinos#9972 and trilinos#9973.
@bartlettroscoe
Member

FYI: So far, I have been unable to get a working reference build of Trilinos+Albany. But I have gotten through the configure of Albany and I believe the remaining issues are unrelated to Trilinos.

But if someone on the Albany team would like to test the next update of TriBITS to Trilinos in PR #9978, they can access the tip of the branch in their local Trilinos repo as:

$ cd Trilinos/
$ git remote add bartlettroscoe git@github.com:bartlettroscoe/Trilinos.git
$ git fetch bartlettroscoe
$ git checkout --track bartlettroscoe/tribits-299-modern-cmake-targets-1-again

Otherwise, I am inclined to merge PR #9978 and see how it goes.

@ikalash
Contributor Author

ikalash commented Dec 16, 2021

I will test it now.

@ikalash
Contributor Author

ikalash commented Dec 16, 2021

I'm getting the following error when building Albany:

[  1%] Built target HeatProfile
[  1%] Built target CylHeatProfile
[  1%] Linking CXX executable xml2yaml
[  2%] Linking CXX executable yaml2xml
/usr/bin/ld: CMakeFiles/yaml2xml.dir/utility/yaml2xml.cpp.o: undefined reference to symbol '_ZN7Teuchos19ActiveRCPNodesSetupC1Ev'
/projects/albany/nightlyAlbanyCDash/test/TrilinosInstall/lib/libteuchoscore.so.13: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make[2]: *** [src/yaml2xml] Error 1
/usr/bin/ld: CMakeFiles/xml2yaml.dir/utility/xml2yaml.cpp.o: undefined reference to symbol '_ZN7Teuchos19ActiveRCPNodesSetupC1Ev'
/projects/albany/nightlyAlbanyCDash/test/TrilinosInstall/lib/libteuchoscore.so.13: error adding symbols: DSO missing from command line
make[1]: *** [src/CMakeFiles/yaml2xml.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....collect2: error: ld returned 1 exit status

make[2]: *** [src/xml2yaml] Error 1
make[1]: *** [src/CMakeFiles/xml2yaml.dir/all] Error 2
[ 53%] Built target albanyLib
make: *** [all] Error 2

Were changes to Teuchos made? I did not build Albany from scratch so perhaps this is an artifact of that and not due to these changes?
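
The "DSO missing from command line" message from ld generally means the symbol lives in a shared library that is only reachable as a dependency of another library and is not named on the link line itself. A small sketch for checking that, assuming the Unix Makefiles generator, which records each target's link command in CMakeFiles/<target>.dir/link.txt. The function name is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Return 0 if the recorded link command mentions the library, either as
# -l<name> or as a path to lib<name>.so / lib<name>.a; return 1 otherwise.
link_line_mentions() {
  local link_file="$1" lib="$2"
  grep -qE -- "(-l${lib}([[:space:]]|\$)|lib${lib}\.(so|a))" "$link_file"
}

# Usage (run from the Albany build directory; path assumed from the log):
#   link_line_mentions src/CMakeFiles/yaml2xml.dir/link.txt teuchoscore ||
#     echo "teuchoscore is not on the yaml2xml link line"
```

If teuchoscore is absent from the link line on the branch but present with 'develop', that points at a change in what the installed Trilinos targets export rather than at a stale Albany build.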

@bartlettroscoe
Member

@ikalash, I got those same errors as described above. Does building and installing Trilinos from 'develop' and then building Albany pass for you?

@ikalash
Contributor Author

ikalash commented Dec 16, 2021

Yes we use the develop branch. The Albany nightly build on the same machine with develop was clean last night.

@bartlettroscoe
Member

Yes we use the develop branch. The Albany nightly build on the same machine with develop was clean last night.

@ikalash, what machine are you doing this on? (You can contact me offline if that is a sensitive question.)

@ikalash
Contributor Author

ikalash commented Dec 16, 2021

I tested your branch on cee-compute021 with a gcc compiler.

@bartlettroscoe bartlettroscoe changed the title from "STK: Albany configure errors due to missing targets" to "Albany configure errors due to TriBITS update on 11/25/2021" Dec 24, 2021
@bartlettroscoe
Member

@ikalash, I have been unable to get a reference configuration for Trilinos and Albany to work on a cee-buildxyz machine. I tried following the instructions provided but I am getting an Albany build error:

FAILED: src/CMakeFiles/albanyLib.dir/Albany_Application.cpp.o 
/projects/sems/install/rhel7-x86_64/sems/compiler/intel/19.0.5/openmpi/1.10.1/bin/mpicxx -DALBANY_STK_EXPR_EVAL -DalbanyLib_EXPORTS -Isrc -I/scratch/rabartl/Albany.base.ref/Albany/src -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators/bc -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators/gather -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators/interpolation -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators/pde -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators/response -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators/scatter -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators/state -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators/utility -I/scratch/rabartl/Albany.base.ref/Albany/src/problems -I/scratch/rabartl/Albany.base.ref/Albany/src/responses -I/scratch/rabartl/Albany.base.ref/Albany/src/disc/stk -I/scratch/rabartl/Albany.base.ref/Albany/src/disc -I/scratch/rabartl/Albany.base.ref/Albany/src/utility -isystem /scratch/rabartl/Albany.base.ref/BUILD/cee-rhel7/Trilinos/install/include -isystem /projects/sems/install/rhel7-x86_64/sems/tpl/boost/1.55.0/intel/19.0.5/base/include -isystem /projects/sems/install/rhel7-x86_64/sems/tpl/netcdf/4.4.1/intel/19.0.5/openmpi/1.10.1/exo_parallel/include -O3 -DNDEBUG  -O3 -DNDEBUG -fPIC -MD -MT src/CMakeFiles/albanyLib.dir/Albany_Application.cpp.o -MF src/CMakeFiles/albanyLib.dir/Albany_Application.cpp.o.d -o src/CMakeFiles/albanyLib.dir/Albany_Application.cpp.o -c /scratch/rabartl/Albany.base.ref/Albany/src/Albany_Application.cpp
/scratch/rabartl/Albany.base.ref/Albany/src/Albany_Application.cpp(36): catastrophic error: cannot open source file "Zoltan2_TpetraCrsColorer.hpp"
  #include "Zoltan2_TpetraCrsColorer.hpp"
                                         ^

compilation aborted for /scratch/rabartl/Albany.base.ref/Albany/src/Albany_Application.cpp (code 4)

It seems that Albany is expecting that Zoltan2 is enabled and installed but that is not the case with the configure script that I was given. (My guess is that the Albany Trilinos configuration is expecting the SCOREC package to be present and enabled which triggers the enable of Zoltan2 but I don't know where to get that package from a repo and if so what version to use.)
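
A minimal sketch for confirming a guess like this, assuming the Trilinos build directory still has its CMakeCache.txt and that package enables are recorded there as Trilinos_ENABLE_<Package> entries. Both function names are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Return 0 if the Trilinos CMakeCache.txt records the package as enabled.
trilinos_pkg_enabled() {
  local cache_file="$1" pkg="$2"
  grep -qE "^Trilinos_ENABLE_${pkg}:[A-Z]+=(ON|TRUE|1)\$" "$cache_file"
}

# Return 0 if the header exists anywhere under <install-dir>/include.
installed_header_exists() {
  local install_dir="$1" header="$2"
  find "$install_dir/include" -name "$header" 2>/dev/null | grep -q .
}

# Usage (paths are placeholders):
#   trilinos_pkg_enabled /path/to/trilinos-build/CMakeCache.txt Zoltan2 ||
#     echo "Zoltan2 is not enabled in this Trilinos build"
#   installed_header_exists /path/to/trilinos-install Zoltan2_TpetraCrsColorer.hpp ||
#     echo "header is not installed"
```

Checking both separates "the package was never enabled" from "the package was enabled but its headers were not installed".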

I can give exact command-by-command reproducibility instructions with some scripts I have checked into an internal repo (albany_trilinos_build_scripts). Then perhaps we can update those instructions and scripts to allow for easy reproducibility? Should we open an internal GitLab or JIRA issue to track this?

If I can't reproduce a reference build of Albany + Trilinos then I can't debug anything related to Albany.

@ikalash
Contributor Author

ikalash commented Dec 24, 2021

I can't access the repo you linked. Are you using these scripts?

https://github.com/sandialabs/Albany/blob/master/doc/dashboards/cee-compute011.sandia.gov/sems-intel-modules.sh
https://github.com/sandialabs/Albany/blob/master/doc/dashboards/cee-compute011.sandia.gov/do-cmake-trilinos-mpi-sems-intel

SCOREC is not required for Albany, so that is not the issue.

I actually have an idea of what the problem is, if you're using the right scripts. What compute node are you running on? I've seen weird Albany errors on some of the cee nodes/machines. Could you please try building on cee-compute020, where the nightlies run?

@bartlettroscoe
Member

@ikalash, I (hopefully) opened up the repo and provided full reproducibility instructions in:

Please give that a try on any CEE RHEL7 machine. (I used 'cee-build030' which is a new super fast machine.)

If you see anything that needs to be improved in those scripts, please post a Merge Request against 'master' in the repo:

I will post an issue there with the details of my reproducibility attempt and what I am seeing.

@ikalash
Contributor Author

ikalash commented Dec 24, 2021

I think cee-build030 was one of the machines where other users have seen build issues. Could you please try on cee-compute020 just as a sanity check? I can try your scripts as well.

@ikalash
Contributor Author

ikalash commented Dec 24, 2021

@bartlettroscoe : I just tried your scripts on cee-compute030. Here is what I found:

  • If I follow the workflow in the README.md file, using ninja, I reproduce the error you report.
  • If, however, I just use the do-cmake scripts you provide to build the code "manually" (./do-cmake; make -j32), the code builds.

Could you please verify that you see the same behavior? I'm not sure what is broken in the ninja build case... I don't typically use it.

@bartlettroscoe
Member

I just tried your scripts on cee-compute030

@ikalash, what exact versions of Trilinos and Albany did you use (i.e. the exact SHA1s) and what test results did you get for Albany when you ran the Albany test suite?

@ikalash
Contributor Author

ikalash commented Jan 7, 2022

Sorry about the delay - one of my projects has a review next week and I've been busy preparing for that. Here are the SHAs:

Trilinos: 7b1fc5a
Albany: 54329a173f8fe5c0c9b2575d4495f6e028193333

It turns out on cee-compute030, the code does not run. Here is the error:

134: Test command: /projects/sems/install/rhel7-x86_64/sems/compiler/intel/19.0.5/openmpi/1.10.1/bin/mpiexec "-np" "1" "/scratch/ikalash/Albany/build2/tests/unit/evaluators/scatterResidual_unit_tester"
134: Test timeout computed to be: 1500
134: /projects/sems/install/rhel7-x86_64/sems/compiler/intel/19.0.5/openmpi/1.10.1/bin/mpiexec: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
134/138 Test #134: Albany_Serial_ScatterResidual_Unit_Test ................................***Failed    0.00 sec

I haven't seen this before, but when I googled it, it looks like it came up in Trilinos a while back.

I repeated the build on cee-compute021, and there the tests pass. This is consistent with what I remember some Albany customers reporting a while back (the code did not build/run correctly on some of the newer cee-compute nodes).
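
A quick way to narrow down the libimf.so error is to ask the dynamic loader which shared libraries it cannot resolve for mpiexec on the failing node. The following is only a sketch: it assumes a Linux node with the standard `ldd` tool, and the helper name `missing_libs` is made up for illustration. libimf.so belongs to the Intel compiler runtime, so seeing it listed would suggest (but not prove) that the Intel environment is not fully set up in the shell where ctest runs.

```shell
# Hypothetical helper: list the shared libraries that the dynamic loader
# cannot resolve. It reads `ldd` output on stdin, so the parsing can be
# checked without access to the binary itself.
missing_libs() {
  # ldd prints unresolved entries as "<libname> => not found"
  awk '/=> not found/ {print $1}'
}

# Example usage on the node where the tests fail:
#   ldd "$(command -v mpiexec)" | missing_libs
```

Running the same command on cee-compute021, where the tests pass, should print nothing if all libraries resolve there.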

@bartlettroscoe
Member

I repeated the build on cee-compute021, and there the tests pass.

@ikalash, what specific command did you run and how many tests ran and passed? One way to get a nice summary of tests run with ctest is:

$ ctest -j 12 &> ctest.out

$ grep -A 500  "failed out of" ctest.out

That will print the summary of the number of tests run, tests passed, tests failed, and list the tests that failed.
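
If only the counts are needed (for example, to paste into an issue), the summary line can be reduced to three numbers. This is a minimal sketch that assumes the ctest summary line has the form "N% tests passed, F tests failed out of T"; the function name `ctest_counts` is hypothetical.

```shell
# Hypothetical helper: print "<passed> <failed> <total>" from a saved
# ctest log, based on the "... tests failed out of ..." summary line.
ctest_counts() {
  # Extract "<failed> <total>" from the summary line, then derive the
  # number of passed tests as total minus failed.
  sed -n 's/^[0-9]*% tests passed, \([0-9]*\) tests failed out of \([0-9]*\).*/\1 \2/p' "$1" |
    awk '{print $2 - $1, $1, $2}'
}

# Example usage after running ctest as shown above:
#   ctest_counts ctest.out
```

For a log whose summary line reads "100% tests passed, 0 tests failed out of 137", this prints `137 0 137`.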

@bartlettroscoe
Member

@ikalash, what do you mean by:

If, however, I just use the do-cmake scripts you provide to build the code "manually" (./do-cmake; make -j32), the code builds.

? What do-cmake script are you referring to specifically and what is meant by building the code "manually"?

Using the instructions at:

but instead switching to configuring Albany with Makefiles instead of Ninja with:

$ env \
    ALBANY_SOURCE_DIR=<albany-src-dir> \
    TRILINOS_INSTALL_DIR=<tril-build-dir>/install \
  ./do-configure

seems to have made no difference for me. I still got the Albany build error:

[ 11%] Building CXX object src/CMakeFiles/albanyLib.dir/SolutionManager.cpp.o
cd /scratch/rabartl/Albany.base.ref/BUILDS/cee-rhel7/Albany/src && /projects/sems/install/rhel7-x86_64/sems/compiler/intel/19.0.5/openmpi/1.10.1/bin/mpicxx -DALBANY_STK_EXPR_EVAL -DalbanyLib_EXPORTS -I/scratch/rabartl/Albany.base.ref/BUILDS/cee-rhel7/Albany/src -I/scratch/rabartl/Albany.base.ref/Albany/src -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators/bc -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators/gather -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators/interpolation -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators/pde -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators/response -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators/scatter -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators/state -I/scratch/rabartl/Albany.base.ref/Albany/src/evaluators/utility -I/scratch/rabartl/Albany.base.ref/Albany/src/problems -I/scratch/rabartl/Albany.base.ref/Albany/src/responses -I/scratch/rabartl/Albany.base.ref/Albany/src/disc/stk -I/scratch/rabartl/Albany.base.ref/Albany/src/disc -I/scratch/rabartl/Albany.base.ref/Albany/src/utility -isystem /scratch/rabartl/Albany.base.ref/BUILDS/cee-rhel7/Trilinos/install/include -isystem /projects/sems/install/rhel7-x86_64/sems/tpl/boost/1.55.0/intel/19.0.5/base/include -isystem /projects/sems/install/rhel7-x86_64/sems/tpl/netcdf/4.4.1/intel/19.0.5/openmpi/1.10.1/exo_parallel/include -O3 -DNDEBUG  -O3 -DNDEBUG -fPIC -o CMakeFiles/albanyLib.dir/SolutionManager.cpp.o -c /scratch/rabartl/Albany.base.ref/Albany/src/SolutionManager.cpp
/scratch/rabartl/Albany.base.ref/Albany/src/Albany_Application.cpp(36): catastrophic error: cannot open source file "Zoltan2_TpetraCrsColorer.hpp"
  #include "Zoltan2_TpetraCrsColorer.hpp"
                                         ^

compilation aborted for /scratch/rabartl/Albany.base.ref/Albany/src/Albany_Application.cpp (code 4)
make[2]: *** [src/CMakeFiles/albanyLib.dir/Albany_Application.cpp.o] Error 4
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory `/scratch/rabartl/Albany.base.ref/BUILDS/cee-rhel7/Albany'
make[1]: *** [src/CMakeFiles/albanyLib.dir/all] Error 2
make[1]: Leaving directory `/scratch/rabartl/Albany.base.ref/BUILDS/cee-rhel7/Albany'
make: *** [all] Error 2

As I said above, I think the problem may be due to the SCOREC package not being present when Trilinos is configured, built, and installed (which would therefore enable Zoltan2).
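
One way to test this hypothesis is to check whether the missing header was installed at all. The sketch below assumes that Trilinos installs its headers directly under `<install-dir>/include`, which is what the `-isystem` path in the compile line above suggests; the helper name `has_header` is made up for illustration.

```shell
# Hypothetical helper: report whether a header exists under
# <install-dir>/include. Returns non-zero when the header is missing.
has_header() {
  if [ -f "$1/include/$2" ]; then
    echo "found: $1/include/$2"
  else
    echo "missing: $2 (not under $1/include)"
    return 1
  fi
}

# Example usage (substitute your own Trilinos build directory):
#   has_header <tril-build-dir>/install Zoltan2_TpetraCrsColorer.hpp
```

If the header is reported missing, the Trilinos install most likely did not include the package that provides it, and the Trilinos configure output is the next place to look.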

Can you please create an issue in the internal repo:

that gives exact unambiguous reproducibility instructions? Otherwise, can we briefly pair program together so I can examine your source and build directories to see exactly what is going on?

If I can just get a single successful reference build of Albany + Trilinos 'develop' working, then I should be off to the races. Otherwise, it will be impossible for me to debug any issues with future updates of TriBITS against Albany.

@bartlettroscoe
Member

bartlettroscoe commented Jan 11, 2022

@ikalash, after fixing the typo we found in the Trilinos configure script, I now have a successful reference build and test of Albany + Trilinos on 'cee-compute021'. It is described in complete detail in https://cee-gitlab.sandia.gov/rabartl/albany_trilinos_build_scripts/-/issues/1#note_2115651 and produced the Albany test results:

$ time ctest -j15 &> ctest.out

real    4m8.374s
user    74m43.785s
sys     8m14.306s

$ grep -A 100 "failed out of" ctest.out

100% tests passed, 0 tests failed out of 137

Label Time Summary:
Adjoint        = 385.34 sec*proc (8 tests)
Analysis       = 103.98 sec*proc (4 tests)
Basic          = 824.24 sec*proc (75 tests)
Demo           = 1337.19 sec*proc (46 tests)
Epetra         = 752.90 sec*proc (53 tests)
Forward        = 1672.10 sec*proc (109 tests)
ROL            =  65.39 sec*proc (3 tests)
RegressFail    =  62.76 sec*proc (3 tests)
Serial         = 624.92 sec*proc (19 tests)
Tempus         =  16.34 sec*proc (1 test)
Tpetra         = 1408.53 sec*proc (68 tests)

Total Test time (real) = 248.36 sec

I should now be able to debug problems with the updated Trilinos build system in PR #9978.

@ikalash
Contributor Author

ikalash commented Jan 11, 2022

@bartlettroscoe great! Let me know if you have any more questions about Albany.

bartlettroscoe added a commit to bartlettroscoe/Albany that referenced this issue Jan 12, 2022
…eterList

This Albany CMakeLists.txt was making assumptions about the implementation
details of the TriBITS-generated <Package>Config.cmake files that it should
not have been making.  It was assuming that the raw TriBITS target
'teuchosparameterlist' existed which is a no-no.  The correct old-school
TriBITS usage is ${TeuchosParameterList_LIBRARIES} and the associated include
directories.

This works with old TriBITS and refactored TriBITS.

This is related to trilinos/Trilinos#9972 and PR trilinos/Trilinos#9978.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Jun 10, 2022
…Trilinos/tribits-299-modern-cmake-targets-1""

This reverts commit fd27a20.

This gets us back to the state of the 'develop' branch after PR trilinos#9894,
which merged the branch 'tribits-299-modern-cmake-targets-1' (as well as
other PRs merged in the days after that).

Now I can try to reproduce the errors in issues trilinos#9972 and trilinos#9973.