MueLu research and example code build failures for new CUDA ATDM build on hansen/shiller #2319

bartlettroscoe · 2018-03-02T00:34:09Z

CC: @trilinos/muelu, @fryeguy52

Next Action Status:

Commit eee871d sets MueLu_ENABLE_Epetra=OFF, which fixes the build failures.

Description

The MueLu package shows build failures for the CUDA ATDM builds today on hansen, as shown at:

https://testing.sandia.gov/cdash/index.php?project=Trilinos&filtercount=1&showfilters=1&field1=buildname&compare1=63&value1=-atdm-

for the builds:

Trilinos-atdm-hansen-shiller-cuda-debug: https://testing.sandia.gov/cdash/index.php?project=Trilinos&parentid=3412693
Trilinos-atdm-hansen-shiller-cuda-opt: https://testing.sandia.gov/cdash/index.php?project=Trilinos&parentid=3412702

The build failures, for example at:

https://testing.sandia.gov/cdash/viewBuildError.php?buildid=3412805

all show undefined reference link failures like:

CMakeFiles/MueLu_ImportTest.dir/Import.cpp.o: In function `int main_<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >(Teuchos::CommandLineProcessor&, Xpetra::UnderlyingLib, int, char**)':
/home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-cuda-debug/SRC_AND_BUILD/Trilinos/packages/muelu/research/luc/region_algorithms/Import.cpp:153: undefined reference to `Tpetra::Map<int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::Map(unsigned long, Teuchos::ArrayView<int const> const&, int, Teuchos::RCP<Teuchos::Comm<int> const> const&, Teuchos::RCP<Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const&)'
/home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-cuda-debug/SRC_AND_BUILD/Trilinos/packages/muelu/research/luc/region_algorithms/Import.cpp:154: undefined reference to `Tpetra::Map<int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::Map(unsigned long, Teuchos::ArrayView<int const> const&, int, Teuchos::RCP<Teuchos::Comm<int> const> const&, Teuchos::RCP<Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const&)'
/home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-cuda-debug/SRC_AND_BUILD/Trilinos/packages/muelu/research/luc/region_algorithms/Import.cpp:156: undefined reference to `Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::MultiVector(Teuchos::RCP<Tpetra::Map<int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const> const&, unsigned long, bool)'
/home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-cuda-debug/SRC_AND_BUILD/Trilinos/packages/muelu/research/luc/region_algorithms/Import.cpp:157: undefined reference to `Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::getDataNonConst(unsigned long)'
/home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-cuda-debug/SRC_AND_BUILD/Trilinos/packages/muelu/research/luc/region_algorithms/Import.cpp:167: undefined reference to `Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::MultiVector(Teuchos::RCP<Tpetra::Map<int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const> const&, unsigned long, bool)'
/home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-cuda-debug/SRC_AND_BUILD/Trilinos/packages/muelu/research/luc/region_algorithms/Import.cpp:169: undefined reference to `Tpetra::Export<int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::Export(Teuchos::RCP<Tpetra::Map<int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const> const&, Teuchos::RCP<Tpetra::Map<int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const> const&)'
/home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-cuda-debug/SRC_AND_BUILD/Trilinos/packages/muelu/research/luc/region_algorithms/Import.cpp:170: undefined reference to `Tpetra::DistObject<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::doExport(Tpetra::SrcDistObject const&, Tpetra::Export<int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const&, Tpetra::CombineMode)'
/home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-cuda-debug/SRC_AND_BUILD/Trilinos/packages/muelu/research/luc/region_algorithms/Import.cpp:171: undefined reference to `Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::getData(unsigned long) const'
/home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-cuda-debug/SRC_AND_BUILD/Trilinos/packages/muelu/research/luc/region_algorithms/Import.cpp:169: undefined reference to `Tpetra::Export<int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::~Export()'
/home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-cuda-debug/SRC_AND_BUILD/Trilinos/packages/muelu/research/luc/region_algorithms/Import.cpp:167: undefined reference to `Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::~MultiVector()'
/home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-cuda-debug/SRC_AND_BUILD/Trilinos/packages/muelu/research/luc/region_algorithms/Import.cpp:156: undefined reference to `Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::~MultiVector()'
/home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-cuda-debug/SRC_AND_BUILD/Trilinos/packages/muelu/research/luc/region_algorithms/Import.cpp:169: undefined reference to `Tpetra::Export<int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::~Export()'
/home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-cuda-debug/SRC_AND_BUILD/Trilinos/packages/muelu/research/luc/region_algorithms/Import.cpp:167: undefined reference to `Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::~MultiVector()'
/home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-cuda-debug/SRC_AND_BUILD/Trilinos/packages/muelu/research/luc/region_algorithms/Import.cpp:156: undefined reference to `Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::~MultiVector()'
collect2: error: ld returned 1 exit status

but each executable has a slightly different set of link failures.

It looks like some explicit template instantiations are missing?
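
To illustrate what a missing explicit template instantiation looks like, here is a minimal single-file sketch. It is not Tpetra code: Map, SerialNode, CudaNode, and countElements are hypothetical stand-ins, and the "header", "library source", and "user" parts would normally live in separate files.

```cpp
// Single-file sketch of explicit template instantiation (ETI).
// Map, SerialNode, and CudaNode are hypothetical stand-ins, not the Tpetra types.
#include <cassert>
#include <cstddef>

struct SerialNode {};
struct CudaNode {};

// "Header" part: users see only declarations of the member functions.
template <class Node>
class Map {
 public:
  explicit Map(std::size_t numElements);
  std::size_t size() const;

 private:
  std::size_t numElements_;
};

// Explicit instantiation declarations: suppress implicit instantiation and
// promise that a library translation unit provides these symbols.
extern template class Map<CudaNode>;
extern template class Map<SerialNode>;

// "Library source" part: member definitions ...
template <class Node>
Map<Node>::Map(std::size_t numElements) : numElements_(numElements) {}

template <class Node>
std::size_t Map<Node>::size() const { return numElements_; }

// ... followed by the explicit instantiation definitions that emit the symbols.
template class Map<CudaNode>;
// Removing the next line leaves Map<SerialNode> declared but never emitted,
// so code that uses it typically fails to link with "undefined reference".
template class Map<SerialNode>;

// "User" part: code templated on the node type, as the test executables are.
template <class Node>
std::size_t countElements(std::size_t n) {
  Map<Node> map(n);
  assert(map.size() == n);
  return map.size();
}
```

Calling countElements<SerialNode>(3) links only because the SerialNode instantiation is present. In this sketch, that corresponds to the library being built with the Serial node type instantiated.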

Steps to Reproduce:

The instructions to reproduce these build failures can be found starting at:

https://snl-wiki.sandia.gov/display/CoodinatedDevOpsATDM/ATDM+Builds+of+Trilinos

and clicking "Reproducing ATDM builds locally" which takes you to:

https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md

Basically, on hansen or shiller, clone the Trilinos repo (its location is shown as $TRILINOS_DIR below) and check out the develop branch. Then create a build directory and do the configure and build as:

$ cd <some_build_dir>/

$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-opt

$ cmake \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
  -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_MueLu=ON \
  $TRILINOS_DIR

$ make -j16

jhux2 · 2018-03-02T15:55:55Z

MueLu's research code directory should not be enabled by default. This is likely caused by insufficient cmake guards.

bartlettroscoe · 2018-03-02T15:59:10Z

> MueLu's research code directory should not be enabled by default. This is likely caused by insufficient cmake guards.

@jhux2,

Okay, just let me know what tweaks are needed to build just the tests and examples you want.

But note that EMPIRE is setting MueLu_ENABLE_Experimental=ON so if there are tests in MueLu needed to test and support that functionality, then they should be built and run on the various ATDM platforms. If you think that ATDM apps should not be using MueLu_ENABLE_Experimental=ON, then that is a conversation that you need to have with them. But as long as they are using it, it needs to be supported and kept stable like any other piece of code in Trilinos that they are using.

mayrmt · 2018-03-02T16:49:21Z

I checked in Trilinos/packages/muelu/research/luc/region_algorithms/Import.cpp accidentally. We can just revert the commit 506af3b since we don't need this anyway.

mayrmt · 2018-03-02T16:51:15Z

@bartlettroscoe How do I best revert this commit?

Reverting it locally and then pushing to Trilinos/develop with the checkin-script?
Reverting it locally and issuing a pull request?

bartlettroscoe · 2018-03-02T16:59:22Z

> How do I best revert this commit?

Did you push this to 'develop' yet? If you only committed it locally, then you can remove it in various ways locally as described at:

https://blog.github.com/2015-06-08-how-to-undo-almost-anything-with-git/

Otherwise, send me email and we can converse there.

mayrmt · 2018-03-03T19:16:32Z

Fix has been merged via PR #2326. Not closing yet to make sure that this actually cured the problem.

@bartlettroscoe Can you confirm that this is resolved and then close this issue? Thank you!

bartlettroscoe · 2018-03-03T22:55:45Z

> Fix has been merged via PR #2326. Not closing yet to make sure that this actually cured the problem.

@mayrmt,

Given the six link failures that were shown on CDash, I don't think this one PR #2326 will fix all of them but I sure hope I am wrong :-)

> @bartlettroscoe Can you confirm that this is resolved and then close this issue? Thank you!

We will take a look at the ATDM builds dashboard tomorrow and that will tell. I am looking over those builds every day as we get them cleaned up so if the problem goes away, I will close this issue.

Thanks!

bartlettroscoe · 2018-03-05T13:52:01Z

@mayrmt,

It looks like your new commit 7a945e6 that was pushed on Saturday and pulled on Sunday as shown at:

https://testing.sandia.gov/cdash/viewNotes.php?buildid=3417530##note5

removed the build error for the file packages/muelu/research/luc/region_algorithms/Import.cpp but the other link errors remain as shown in automated testing at:

Can someone take a look at these? I think anyone with access to SRN or SON machines shiller or hansen can reproduce these failures as described in the "Steps to Reproduce" above.

bartlettroscoe · 2018-03-05T13:56:37Z

Also note that even if you exclude the 8 "Not Run" tests for the missing executables that would not link, there are still 23 failing MueLu tests for these CUDA builds. I was going to wait until these build failures were fixed before posting new GitHub issues for those failures but some MueLu developer might want to look into those too. Those can be seen at:

mhoemmen · 2018-03-05T17:05:20Z

If folks are gone this week at SIAM PP, I can help, just not today so much.

jhux2 · 2018-03-05T17:56:02Z

@bartlettroscoe I'm not surprised at these failures, as MueLu has an experimental track CUDA build that hasn't been clean for quite a while. Could you temporarily disable these for the ATDM build until the MueLu team has time to look at these? (I am facing a couple March conference/milestone deadlines.)

bartlettroscoe · 2018-03-06T22:44:08Z

> I'm not surprised at these failures, as MueLu has an experimental track CUDA build that hasn't been clean for quite a while. Could you temporarily disable these for the ATDM build until the MueLu team has time to look at these? (I am facing a couple March conference/milestone deadlines.)

@jhux2, okay, I will disable as little as I can to make everything pass. I will also disable the failing tests as well. Then someone on the MueLu team can log onto shiller or hansen and work out all of these issues when they have time.

bartlettroscoe · 2018-03-13T18:34:49Z

@bathmatt, @jhux2, @mhoemmen, and @srajama1,

I was wrong, I did create a GitHub issue for these MueLu build failures. Could this be related to the build failures that @bathmatt reported for EMPIRE? It looks like some explicit instantiations are missing.

jhux2 · 2018-03-13T21:47:15Z

> I was wrong, I did create a GitHub issue for these MueLu build failures. Could this be related to the build failures that @bathmatt reported for EMPIRE? It looks like some explicit instantiations are missing.

@bartlettroscoe Yes, in fact the very error @bathmatt reported also appears on the dashboard.

By the way, I see this output during the configure process

NOTE: Kokkos::Serial is ON (the CMake option Kokkos_ENABLE_Serial is ON), but the corresponding Tpetra Node type is disabled.  If you want to enable instantiation and use of Kokkos::Serial in Tpetra, please also set the CMake option Tpetra_INST_SERIAL:BOOL=ON.  If you use the Kokkos::Serial Node type in Tpetra without doing this, you will get link errors!
-- Tpetra execution space availability (ON means available): 
--   - Serial:  OFF
--   - Threads: OFF
--   - OpenMP:  OFF
--   - Cuda:    ON

The link errors refer to symbols templated on Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, so the cmake message might be relevant.

bartlettroscoe · 2018-03-13T21:56:13Z

> By the way, I see this output during the configure process
> NOTE: Kokkos::Serial is ON (the CMake option Kokkos_ENABLE_Serial is ON), but the corresponding Tpetra Node type is disabled.  If you want to enable instantiation and use of Kokkos::Serial in Tpetra, please also set the CMake option Tpetra_INST_SERIAL:BOOL=ON.  If you use the Kokkos::Serial Node type in Tpetra without doing this, you will get link errors!
> -- Tpetra execution space availability (ON means available):
> --   - Serial:  OFF
> --   - Threads: OFF
> --   - OpenMP:  OFF
> --   - Cuda:    ON
> The link errors refer to symbols templated on Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, so the cmake message might be relevant.

@trilinos/tpetra developers,

Can we set up Tpetra to automatically enable Tpetra_INST_SERIAL:BOOL=ON by default instead of just printing out this warning message? In general, we need to be looking for ways to make it easier for users to configure Trilinos correctly by just setting obvious options.

Otherwise, I will try enabling this option for the CUDA build on hansen/shiller and see if this makes the link errors go away.

jhux2 · 2018-03-13T22:00:19Z

@bartlettroscoe What bothers me is that MueLu should not require that node type. Enabling Tpetra_INST_SERIAL:BOOL=ON might fix this error, but doesn't address the underlying problem. (The build time and executable size will go up, too.)

bartlettroscoe · 2018-03-13T22:09:15Z

> What bothers me is that MueLu should not require that node type. Enabling Tpetra_INST_SERIAL:BOOL=ON might fix this error, but doesn't address the underlying problem. (The build time and executable size will go up, too.)

@jhux2,

Somehow this is getting manifested in the MueLu test suite itself so one only needs to look at how MueLu is using upstream packages and what those packages are doing. If you look at the link failure, for example, today at:

https://testing-vm.sandia.gov/cdash/viewBuildError.php?buildid=3365789

you see undefined reference errors coming directly from MueLu source files as well as from Tpetra files and other packages.

Can some MueLu developer log into 'hansen' or 'shiller' and see if they can reproduce these link failures as described above? It should only take a few minutes of person-time to get the build going.

Otherwise, I will let you know what happens with setting Tpetra_INST_SERIAL:BOOL=ON for these CUDA builds.

jhux2 · 2018-03-13T22:13:23Z

I'm on shiller now, diagnosing the configure process. I believe that's where the error is.

jhux2 · 2018-03-14T00:47:58Z

@bartlettroscoe I tested two options:

1. -DTpetra_ENABLE_Epetra:BOOL=OFF. This fixed the error. However, EMPIRE requires Panzer, which requires Epetra, so this is not a workable option.
2. -DTpetra_INST_SERIAL:BOOL=ON. This fixed the error, at the cost of increasing the build time and probably the size of any executables.

For now, can you enable option 2? Longer term, we may be able to relax this requirement in MueLu.

mhoemmen · 2018-03-14T00:57:50Z

@bartlettroscoe wrote:

> Can we set up Tpetra to automatically enable Tpetra_INST_SERIAL:BOOL=ON by default instead of just printing out this warning message? In general, we need to be looking for ways to make it easier for users to configure Trilinos correctly by just setting obvious options.

This used to be ON by default, but that changed as part of Kokkos' refactor of CMake / Makefile options. Changing it back would increase the build time quite a bit. Is the problem that downstream packages don't do ETI correctly? That's an issue very much like #74 in that what looks like an easy CMake option to set, actually increases the build time a lot and does not help generality.

bartlettroscoe · 2018-03-14T01:08:05Z

> -DTpetra_INST_SERIAL:BOOL=ON. This fixed the error, at the cost of increasing the build time and probably the size of any executables.
>
> For now, can you enable option 2? Longer term, we may be able to relax this requirement in MueLu.

@jhux2,

I also confirmed that setting Tpetra_INST_SERIAL=ON fixed the build of the MueLu tests and examples. I will go ahead and push that updated ATDMDevEnv.cmake file.

> This used to be ON by default, but that changed as part of Kokkos' refactor of CMake / Makefile options. Changing it back would increase the build time quite a bit.

@mhoemmen,

So the deal is that as long as Epetra is not enabled then you don't need these instantiations?

Note that we could add some specialized logic to the file Trilinos/cmake/ProjectCompilerPostConfigure.cmake that could turn on Tpetra_INST_SERIAL=ON when it detects that MueLu's Tpetra and Epetra support are both enabled. As shown at:

https://tribits.org/doc/TribitsDevelopersGuide.html#full-tribits-project-configuration

that file gets processed after the final set of enables and disables are determined and therefore, it would have all of the info needed to determine when this needed to be enabled. That would save users from having to figure this out on their own. Again, anything we can do to make Trilinos configure correctly for the requested user configuration will go a long way to reducing the reputation that Trilinos is hard to build (which I hear a lot).

So what is the right set of enable variables to look for in order to set Tpetra_INST_SERIAL=ON? This will result in this getting enabled only when it needs to be.

mhoemmen · 2018-03-14T18:05:28Z

@bartlettroscoe I don't want to hinder this process; I just want developers to be aware that enabling stuff that most users don't need or (shouldn't) use, might be easy to do, but increases build times and sizes. I don't want developers to get complacent about that. If we decide for now not to fix it, that's fine, but that needs to be a conscious choice.

mhoemmen · 2018-03-14T18:06:13Z

In summary: Please go ahead and do what you need to do, but be aware that we're building more than we need.

bartlettroscoe · 2018-03-14T19:34:43Z

> In summary: Please go ahead and do what you need to do, but be aware that we're building more than we need.

I would rather build less. But given the option of having a build fail with link failures or building more than the user really needs (but building what they are actually asking for), I think we should err on the side of having the build succeed.

Could problems like this be avoided if MueLu and other packages were better broken up into subpackages? For example, if Panzer only needs the Tpetra adapters from MueLu, and if MueLu was broken up into subpackages MueLuCore, MueLuEpetra and MueLuTpetra, then if Panzer only defined a dependency on MueLuCore and MueLuTpetra, then the Epetra adapters would never get built and this problem would not exist.

tawiesn · 2018-03-14T20:05:03Z

@bartlettroscoe @jhux2

> Could problems like this be avoided if MueLu and other packages were better broken up into subpackages?

No, this would not help. At least not for MueLu, since we already have a place for a clean distinction of the Epetra- and Tpetra-specific code with the Xpetra package. In contrast, it would break one of the core philosophies of MueLu: to be independent of the underlying linear algebra (either Epetra or Tpetra (+ Kokkos as option)). The problem is that people always start using Tpetra in MueLu directly (instead of using Xpetra). Xpetra is meant to deal with the guards and correct instantiations. The problem with Xpetra is that it needs some more work to enable (or write stubs for) the fancy new features in Tpetra and not all developers are willing or able to invest that additional time and effort doing so. But it's crucial to understand that we do not want to break MueLu apart into two independent pieces (the Epetra and Tpetra part). MueLu provides implementations for rather general multigrid algorithms independent of Epetra and Tpetra.

tawiesn · 2018-03-14T20:15:27Z

@bartlettroscoe @jhux2
For example, I just found this comment in the source code of the ProjectorSmootherFactory:

// TAW: Oct 16 2015: subCopy is not part of Xpetra. One should either add it to Xpetra
// or replace this call by a local loop. I'm not motivated to do this now...

We only misuse Tpetra directly (instead of Xpetra) since in Xpetra we have no routine "subCopy", yet (which is available in Tpetra but not Epetra and therefore also not in Xpetra). The right solution would be to either avoid that function call (by doing it locally by hand) or add that functionality to Xpetra. Then we would not have such linker problems. It seems that this code has not been touched for more than two years. Obviously nobody was interested in that feature too much. Maybe one should just delete these algorithms or move them into a separate optional subpackage of all non-maintained code.
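
As a rough sketch of what "doing it locally by hand" could look like, the following assumes that subCopy means copying a chosen subset of columns into a new multivector. MultiVec and copyColumns are hypothetical stand-ins built on std::vector, not Xpetra or Tpetra API. A real replacement would loop over the local data of the actual multivector type instead.

```cpp
// Sketch of replacing a subCopy-style call with a local loop.
// MultiVec and copyColumns are hypothetical stand-ins, not Xpetra or Tpetra API.
#include <cassert>
#include <cstddef>
#include <vector>

using Column = std::vector<double>;    // local entries of one vector
using MultiVec = std::vector<Column>;  // column j is mv[j]

// Return a new multivector holding deep copies of the requested columns,
// in the order given by cols.
MultiVec copyColumns(const MultiVec& src, const std::vector<std::size_t>& cols) {
  MultiVec dst;
  dst.reserve(cols.size());
  for (std::size_t j : cols) {
    assert(j < src.size());  // each requested column must exist
    dst.push_back(src[j]);   // copying a std::vector is a deep copy
  }
  return dst;
}
```

Because the loop only touches generic element access, code written this way would not need a Tpetra-only routine and so would not pull in Tpetra symbols directly.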

bartlettroscoe · 2018-03-15T17:56:21Z

I pushed the commit eee871d which sets MueLu_ENABLE_Epetra=OFF instead of Tpetra_INST_SERIAL=ON and it passed all of the builds, including all of the CUDA builds. This also cleared up all of the failing MueLu tests (except for one build and that looks to be a separate issue).

Also, using MueLu_ENABLE_Epetra=OFF instead of Tpetra_INST_SERIAL=ON cut the cumulative package-by-package build time from 3h9m43s to 2h53m65s so the savings is not insignificant.

@bathmatt, is it acceptable for EMPIRE if MueLu_ENABLE_Epetra=OFF is set? Does EMPIRE need Epetra support under MueLu?

DETAILS

Last night I pushed the commit eee871d:

commit eee871d803e2d0a60c60710071f920365687fdcb
Author: Roscoe A. Bartlett <[email protected]>
Date:   Wed Mar 14 17:34:01 2018 -0600

    Set MueLu_ENABLE_Epertra=OFF to fix MueLu CUDA link failures (#2319, TRIL-171)
    
    This should fix the MueLu build failures with CUDA reported in #2319.
    
    I also removed setting Tpetra_INST_SERIAL=ON.  This should cut down on the
    build times over setting Tpetra_INST_SERIAL=ON.

diff --git a/cmake/std/atdm/ATDMDevEnv.cmake b/cmake/std/atdm/ATDMDevEnv.cmake
index 39a8390..53ac0d8 100644
--- a/cmake/std/atdm/ATDMDevEnv.cmake
+++ b/cmake/std/atdm/ATDMDevEnv.cmake
@@ -123,12 +123,10 @@ ATDM_SET_CACHE(Kokkos_ENABLE_Debug_Bounds_Check "${ATDM_BOUNDS_CHECK}" CACHE BOO
 ATDM_SET_CACHE(KOKKOS_ARCH "$ENV{ATDM_CONFIG_KOKKOS_ARCH}" CACHE STRING)
 ATDM_SET_CACHE(EpetraExt_ENABLE_HDF5 OFF CACHE BOOL)
 ATDM_SET_CACHE(MueLu_ENABLE_Experimental ON CACHE BOOL)
+ATDM_SET_CACHE(MueLu_ENABLE_Epetra OFF CACHE BOOL)
 ATDM_SET_CACHE(Panzer_ENABLE_FADTYPE "Sacado::Fad::DFad<RealType>" CACHE STRING)
 ATDM_SET_CACHE(Phalanx_KOKKOS_DEVICE_TYPE "${ATDM_NODE_TYPE}" CACHE STRING)
 ATDM_SET_CACHE(Phalanx_SHOW_DEPRECATED_WARNINGS OFF CACHE BOOL)
-IF (ATDM_USE_CUDA)
-  ATDM_SET_CACHE(Tpetra_INST_SERIAL "${ATDM_USE_CUDA}" CACHE BOOL)
-ENDIF()
 ATDM_SET_CACHE(Tpetra_INST_CUDA "${ATDM_USE_CUDA}" CACHE BOOL)
 ATDM_SET_CACHE(Xpetra_ENABLE_Experimental ON CACHE BOOL)

This resulted in all passing builds for MueLu on all of the platforms, including all of the CUDA builds as shown at:

https://testing-vm.sandia.gov/cdash/index.php?project=Trilinos&date=2018-03-15&filtercombine=and&filtercount=2&showfilters=1&filtercombine=and&field1=buildname&compare1=63&value1=-atdm-&field2=subprojects&compare2=93&value2=MueLu

The only test failures were for the build Trilinos-atdm-white-ride-cuda-opt on white and that looks to be associated with how the tests are being run and not a MueLu problem at this point. (I will create another GitHub issue to look into that problem.)

Note that all of the Panzer tests on CUDA passed as well as shown at:

https://testing-vm.sandia.gov/cdash/index.php?project=Trilinos&date=2018-03-15&filtercombine=and&filtercombine=and&filtercount=3&showfilters=1&filtercombine=and&field1=buildname&compare1=63&value1=-atdm-&field2=subprojects&compare2=93&value2=Panzer&field3=buildname&compare3=63&value3=-cuda-

Therefore, disabling Epetra support in MueLu does not seem to impact Panzer tests at all (there have been 116 Panzer tests run for these CUDA builds for the last several days). Therefore, hopefully this would be okay for EMPIRE?

It is also interesting to see the impact this has on the build times for the MueLu build with CUDA. Looking at the build Trilinos-atdm-hansen-shiller-cuda-opt on shiller over the last week at:

https://testing-vm.sandia.gov/cdash/index.php?project=Trilinos&date=2018-03-15&filtercombine=and&filtercount=3&showfilters=1&filtercombine=and&field1=site&compare1=63&value1=hansen&field2=buildname&compare2=61&value2=Trilinos-atdm-hansen-shiller-cuda-opt&field3=buildstarttime&compare3=84&value3=now

The build two days ago on 2018-03-13 was failing and it took 2h50m17s. Then yesterday, the build passed using Tpetra_INST_SERIAL=ON and it took 3h9m43s. And then today using MueLu_ENABLE_Epetra=OFF (and Tpetra_INST_SERIAL=OFF implicitly set), it took 2h53m65s. Therefore, we can see the approach of using MueLu_ENABLE_Epetra=OFF vs. Tpetra_INST_SERIAL=ON looks to have shaved off about 16m out of a build that takes about 3 hours (using a package-by-package build so build times are a bit inflated). But that is not too bad. And you can't really compare the build time to the case where the build failed because there were a bunch of executables that aborted their link because they were missing link symbols.

Turns out that some of these Panzer examples test behavior that EMPIRE needs. Therefore, we need to get them working. To help get these fixed, and since the rest of the CUDA build is not clean yet, we need to turn these back on. Note that this should not enable the Panzer examples for the special "-panzer" builds since the CTest -S driver script will explicitly disable the Panzer examples for those builds.

Build and test results on 'shiller' are shown below (the build passes but there are still some failing tests). These builds are not promoted to the "ATDM" Group/Track yet so this will not spam anyone with CDash error emails.

Enabled Packages: Panzer

Build test results:
-------------------
1) cuda-opt => FAILED: passed=150,notpassed=3 => Not ready to push! (55.40 min)
2) cuda-debug => FAILED: passed=151,notpassed=2 => Not ready to push! (66.06 min)

bartlettroscoe · 2018-03-25T03:05:18Z

I talked with @bathmatt yesterday and he confirmed that EMPIRE does not need Epetra support under MueLu. Therefore, this is resolved and I am closing this as completed.

bartlettroscoe added type: bug The primary issue is a bug in Trilinos code or tests pkg: NOX client: ATDM Any issue primarily impacting the ATDM project labels Mar 2, 2018

bartlettroscoe added this to the Initial cleanup of new ATDM builds of Trilinos milestone Mar 2, 2018

bartlettroscoe added pkg: MueLu and removed pkg: NOX labels Mar 2, 2018

mayrmt mentioned this issue Mar 3, 2018

Fix #2319: MueLu research build #2326

Merged

bartlettroscoe assigned mayrmt Mar 3, 2018

bartlettroscoe added the stage: in progress Work on the issue has started label Mar 3, 2018

bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 14, 2018

WIP: Re-enable Panzer examples (trilinos#2313, trilinos#2319)

1d7b995

The enable of Tpetra_INST_SERIAL=ON may fix many of these build errors.

bartlettroscoe added stage: in review Primary work is completed and now is just waiting for human review and/or test feedback and removed stage: in progress Work on the issue has started labels Mar 15, 2018

jhux2 mentioned this issue Mar 16, 2018

Building Trilinos on Power 8+ CUDA environment #2392

Closed

kyungjoo-kim pushed a commit to kyungjoo-kim/Trilinos that referenced this issue Mar 16, 2018

Set Tpetra_INST_SERIAL=ON for CUDA builds (trilinos#2319, TRIL-171)

4ac4ee5

This addresses all of the MueLu test and example build failures reported in

bartlettroscoe added type: bug The primary issue is a bug in Trilinos code or tests and removed type: bug The primary issue is a bug in Trilinos code or tests labels Mar 20, 2018

bartlettroscoe closed this as completed Mar 25, 2018

bartlettroscoe removed the stage: in review Primary work is completed and now is just waiting for human review and/or test feedback label Mar 26, 2018

ndellingwood added a commit to ndellingwood/Trilinos that referenced this issue May 8, 2018

Update Kokkos integration testing script

93bd9ef

Set MueLu_ENABLE_Epetra=OFF See issue trilinos#2319 for discussion regarding this setting.

ndellingwood mentioned this issue May 8, 2018

Update Kokkos integration testing script #2700

Merged

ndellingwood mentioned this issue May 21, 2018

TrilinosCouplings build failure #2786

Closed

bartlettroscoe mentioned this issue Jul 31, 2018

New PanzerMiniEM test failures in ATDM Trilinos builds starting 7/21/2018 #3182

Closed

bartlettroscoe mentioned this issue Oct 18, 2018

Add MueLu "Refactor" enables to auto PR and ATDM Trilinos builds #2674

Closed

bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Oct 24, 2018

Remove MueLu_ENABLE_Epetra=OFF for EMPIRE ATDM Trilinos config (trili…

de9bdc6

…nos#2674, trilinos#2319) The CUDA bulid for MueLu was fixed so this disable should not be needed anymore.

bartlettroscoe mentioned this issue Oct 24, 2018

Remove MueLu_ENABLE_Epetra=OFF for EMPIRE ATDM Trilinos config (#2674, #2319) #3723

Merged

1 task

bartlettroscoe added the PA: Linear Solvers Issues that fall under the Trilinos Linear Solvers Product Area label Nov 30, 2018

tjfulle pushed a commit to tjfulle/Trilinos that referenced this issue Dec 6, 2018

Remove MueLu_ENABLE_Epetra=OFF for EMPIRE ATDM Trilinos config (trili…

b8f5f75

…nos#2674, trilinos#2319) The CUDA bulid for MueLu was fixed so this disable should not be needed anymore.
