TriBits: undefined variable breaks nightly builds #4796

Closed
lucbv opened this issue Apr 2, 2019 · 23 comments
Labels
stage: in review (Primary work is completed and now is just waiting for human review and/or test feedback)
TriBITS (Issues with the TriBITS framework itself, not usage of the TriBITS framework)
type: bug (The primary issue is a bug in Trilinos code or tests)

Comments

@lucbv
Contributor

lucbv commented Apr 2, 2019

@bartlettroscoe (there is no good team to tag for this type of issue...)

Expectations

TriBITS changes should not break the current nightly builds.

Current Behavior

The nightly builds from MueLu are all failing at configure time due to an undefined variable: ${${PROJECT_NAME}_TRIBITS_DIR} in file cmake/tribits/core/utils/MessageWrapper.cmake at line 45.

Motivation and Context

This has taken down all the nightly MueLu builds, which means that we cannot detect bugs in our specialized and experimental tracks, which are usually not tested by the ATDM or Continuous builds.

Possible Solution

It seems that changes made last week in TriBITS are to blame; see commit 2283e955.

Steps to Reproduce

Attempting to run any build using the drivers in cmake/ctest/drivers/{enigma,geminga,rocketman,trappist} will fail.

@lucbv lucbv added type: bug The primary issue is a bug in Trilinos code or tests TriBITS Issues with the TriBITS framework itself, not usage of the TriBITS framework labels Apr 2, 2019
@bartlettroscoe
Member

@lucbv, can you give more context? What is the actual error message with the stack trace?

@lucbv
Contributor Author

lucbv commented Apr 3, 2019

@bartlettroscoe here is the log we get from the nightly builds (I only posted the relevant part but I could mail you the log if it helps).
As explained above, the issue is related to:

CMake Error at /storage/lberge/nightlyTests/Trilinos/cmake/tribits/core/utils/MessageWrapper.cmake:45 (INCLUDE):
  INCLUDE could not find load file:

The part of the log where the error occurs

C) Configure /storage/lberge/nightlyTests/TDD_BUILD ...
SetCTestConfiguration:BuildDirectory:/storage/lberge/nightlyTests/TDD_BUILD
SetCTestConfiguration:SourceDirectory:/storage/lberge/nightlyTests/Trilinos/cmake/ctest/drivers
SetCTestConfiguration:ConfigureCommand:"/projects/sems/install/rhel7-x86_64/sems/utility/cmake/3.10.3/bin/cmake" "-GUnix Makefiles" "/storage/lberge/nightlyTests/Trilinos/cmake/ctest/drivers"
Configure project
Configure with command: "/projects/sems/install/rhel7-x86_64/sems/utility/cmake/3.10.3/bin/cmake" "-GUnix Makefiles" "/storage/lberge/nightlyTests/Trilinos/cmake/ctest/drivers"
Run command: "/projects/sems/install/rhel7-x86_64/sems/utility/cmake/3.10.3/bin/cmake" "-GUnix Makefiles" "/storage/lberge/nightlyTests/Trilinos/cmake/ctest/drivers"
-- TDD_FORCE_INNER_CMAKE_INSTALL='1'
-- ENV_TRIBITS_TDD_USE_SYSTEM_CTEST='1'
-- TRIBITS_TDD_USE_SYSTEM_CTEST='1'
CMake Error at /storage/lberge/nightlyTests/Trilinos/cmake/tribits/core/utils/MessageWrapper.cmake:45 (INCLUDE):
  INCLUDE could not find load file:

    /core/utils/GlobalSet.cmake
Call Stack (most recent call first):
  /storage/lberge/nightlyTests/Trilinos/cmake/tribits/core/package_arch/TribitsGeneralMacros.cmake:42 (INCLUDE)
  /storage/lberge/nightlyTests/Trilinos/cmake/tribits/core/package_arch/TribitsConfigureCTestCustom.cmake:40 (INCLUDE)
  /storage/lberge/nightlyTests/Trilinos/cmake/tribits/dashboard_driver/TribitsDriverCMakeLists.cmake:76 (include)
  CMakeLists.txt:19 (include)


site='trappist.sandia.gov'
site='trappist.sandia.gov' MATCHES directory name dir='trappist'
-- TDD_DRIVER_SUBDIRECTORY='trappist'
TDD_DRIVER_SUBDIRECTORY='trappist'
TRIBITS_DRIVER_ADD_DASHBOARD:  'CLANG_OPENMPI_1.10.0_RELEASE'  'ctest_linux_nightly_mpi_release_muelu_trappist.clang.cmake' [CTEST_INSTALLER_TYPE;release;RUN_SERIAL;TIMEOUT_MINUTES;330]
-- Skipping CMake install tests because TRIBITS_TDD_USE_SYSTEM_CTEST==1
-- Configuring incomplete, errors occurred!
See also "/storage/lberge/nightlyTests/TDD_BUILD/CMakeFiles/CMakeOutput.log".
Command exited with the value: 1
Error(s) when configuring the project
 Add coverage exclude regular expressions.
SetCTestConfiguration:CMakeCommand:/projects/sems/install/rhel7-x86_64/sems/utility/cmake/3.10.3/bin/cmake

@bartlettroscoe
Member

@lucbv, let me see if I can figure out what is going on with this.

NOTE: This system was written by a contractor and never had any automated tests, so it has been very hard to support for this and other reasons. See:

We don't use it for the ATDM Trilinos builds.

@jhux2
Member

jhux2 commented Apr 9, 2019

Btw, setting TRIBITS_PROJECT_ROOT in the crontab environment doesn't resolve this issue. It would appear the variable value isn't propagating. I can see from a log that it's at least initially set correctly:


Starting nightly Trilinos development testing on rocketman: Tue Apr  9 10:58:02 PDT 2019

Configuration = default
SEMS_GCC_LOCAL_PYTHON_VERSION=2.6.6
MANPATH=/projects/sems/install/rhel6-x86_64/sems/compiler/python/2.7.9/share/man:/projects/sems/install/rhel6-x86_64/sems/compiler/python/2.7.9/man:/projects/sems/install/rhel6-x86_64/sems/tpl/netcdf/4.4.1/gcc/5.3.0/openmpi/1.10.1/exo_parallel/share/man:/projects/sems/install/rhel6-x86_64/sems/compiler/gcc/5.3.0/openmpi/1.10.1/share/man:/projects/sems/install/rhel6-x86_64/sems/compiler/gcc/4.4.7/openmpi/1.10.1/share/man:/projects/sems/install/rhel6-x86_64/sems/compiler/gcc/5.3.0/base/share/man:/projects/sems/install/rhel6-x86_64/sems/utility/cmake/3.10.3/share/man:/projects/sems/install/rhel6-x86_64/sems/utility/cmake/3.10.3/man:/usr/local/share/man
TDD_HTTP_PROXY=http://sonproxy.sandia.gov:80
TRIBITS_TDD_USE_SYSTEM_CTEST=1
SEMS_NETCDF_LIBRARY_PATH=/projects/sems/install/rhel6-x86_64/sems/tpl/netcdf/4.4.1/gcc/5.3.0/openmpi/1.10.1/exo_parallel/lib
SEMS_MPI_NAME=openmpi
SEMS_SUPERLU_INCLUDE_PATH=/projects/sems/install/rhel6-x86_64/sems/tpl/superlu/4.3/gcc/5.3.0/base/include
SEMS_OPENMPI_INCLUDE_PATH=/projects/sems/install/rhel6-x86_64/sems/compiler/gcc/5.3.0/openmpi/1.10.1/include
MPICC=mpicc
MATLABPATH=/home/jhu/software/matlab/utilities
SHELL=/bin/bash
SEMS_SUPERLU_LOCAL_PYTHON_VERSION=2.6.6
SEMS_OPENMPI_LOCAL_PYTHON_VERSION=2.6.6
CTEST_CONFIGURATION=default
TDD_FORCE_CMAKE_INSTALL=0
Trilinos_TRIBITS_DIR=/home/nightlyTesting/trilinos
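A possible explanation, offered only as an assumption and not confirmed anywhere in this thread: a value exported in the cron environment is visible to CMake only through the $ENV{} syntax, so a script that dereferences ${Trilinos_TRIBITS_DIR} does not see it unless the value is copied across explicitly. A minimal sketch that can be run with cmake -P (the file name is hypothetical; the path is the one shown in the log above):

```cmake
# env_vs_var.cmake (hypothetical name); run with: cmake -P env_vs_var.cmake

# Simulate a variable exported by the cron environment.
set(ENV{Trilinos_TRIBITS_DIR} "/home/nightlyTesting/trilinos")

# The environment value is reachable through $ENV{} ...
message(STATUS "environment value: '$ENV{Trilinos_TRIBITS_DIR}'")
# ... but the plain CMake variable of the same name is still unset,
# so this reference expands to the empty string.
message(STATUS "CMake variable:    '${Trilinos_TRIBITS_DIR}'")

# A script has to copy the value across explicitly if it wants to use it.
if(NOT DEFINED Trilinos_TRIBITS_DIR
   AND NOT "$ENV{Trilinos_TRIBITS_DIR}" STREQUAL "")
  set(Trilinos_TRIBITS_DIR "$ENV{Trilinos_TRIBITS_DIR}")
endif()
message(STATUS "after copying:     '${Trilinos_TRIBITS_DIR}'")
```

Whether the dashboard driver reads this particular environment variable at all is not stated in the thread, so this only shows the general CMake behavior.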

@bartlettroscoe
Member

@jhux2, the problem is that the variables PROJECT_NAME and/or ${PROJECT_NAME}_TRIBITS_DIR are not getting set correctly in TribitsDriverCMakeLists.cmake (because they must not have been needed before).

Can you try the patch shown below and see if that fixes this?


diff --git a/cmake/tribits/dashboard_driver/TribitsDriverCMakeLists.cmake b/cmake/tribits/dashboard_driver/TribitsDriverCMakeLists.cmake
index 79fe491..29ca940 100644
--- a/cmake/tribits/dashboard_driver/TribitsDriverCMakeLists.cmake
+++ b/cmake/tribits/dashboard_driver/TribitsDriverCMakeLists.cmake
@@ -61,6 +61,9 @@ IF (NOT TRIBITS_ROOT)
 ENDIF()
 get_filename_component(TRIBITS_ROOT "${TRIBITS_ROOT}" ABSOLUTE)
 
+set(PROJECT_NAME DummyProject)
+set(${PROJECT_NAME}_TRIBITS_DIR "${TRIBITS_ROOT}")
+
 set(CMAKE_MODULE_PATH
   ${CMAKE_CURRENT_LIST_DIR}
   ${TRIBITS_ROOT}/core/utils

Might just have to bite the bullet and start writing some automated tests for this sticking dashboard driver system (written by a contractor years ago who did not write any automated tests for this).
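The failure mode described in this comment can be illustrated in isolation. This is a minimal sketch, assuming that line 45 of MessageWrapper.cmake builds its include path from ${${PROJECT_NAME}_TRIBITS_DIR} as the issue description states; DemoProject, /path/to/tribits, the helper require_defined, and the file name are hypothetical placeholders. CMake expands an unset variable to the empty string, which is how the load file in the error message ends up as /core/utils/GlobalSet.cmake:

```cmake
# empty_expansion.cmake (hypothetical name); run with: cmake -P empty_expansion.cmake

# Case 1: neither PROJECT_NAME nor <project>_TRIBITS_DIR is set, as in the
# failing driver. The nested reference collapses to the empty string, so the
# resulting path starts at the filesystem root.
unset(PROJECT_NAME)
set(load_file "${${PROJECT_NAME}_TRIBITS_DIR}/core/utils/GlobalSet.cmake")
message(STATUS "unset: '${load_file}'")

# Case 2: both variables are set (placeholder values), so the path is complete.
set(PROJECT_NAME DemoProject)
set(${PROJECT_NAME}_TRIBITS_DIR "/path/to/tribits")
set(load_file "${${PROJECT_NAME}_TRIBITS_DIR}/core/utils/GlobalSet.cmake")
message(STATUS "set:   '${load_file}'")

# Hypothetical guard: turn the silent empty expansion into an explicit error
# that names the missing variable.
function(require_defined var_name)
  if(NOT DEFINED ${var_name})
    message(FATAL_ERROR "Required variable '${var_name}' is not defined")
  endif()
endfunction()
require_defined(${PROJECT_NAME}_TRIBITS_DIR)
```

This only illustrates the mechanism behind the error message; it is not a claim about where in TriBITS the variables should be set.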

@jhux2
Member

jhux2 commented Apr 9, 2019

@bartlettroscoe I tried your suggestion, but get the same error.

@bartlettroscoe
Member

@jhux2 said:

@bartlettroscoe I tried your suggestion, but get the same error.

Okay, I will revert the changes to those files and see if we can get this working. I will try to set up a manual testing scenario to see if this will fix the problem.

@lucbv
Contributor Author

lucbv commented Apr 9, 2019

@jhux2, another option is to use the same logic as ATDM to run our nightly tests. I am attempting to set up such a build for trappist. This is still a work in progress, but if you look at the dashboard you can see that I have a test build that was able to post results in the nightly track. Now I only need to have it actually test something...

@bartlettroscoe
Member

@lucbv said:

another option is to use the same logic as ATDM to run our nightly tests

If that would not be too much trouble, that would be my advice. It is pretty simple. Just clone an "outer" Trilinos, set up SRC_AND_BUILD, and let ctest -S script.cmake run in there. The big disadvantage is that you will not see results on a CDash site, only in log files on the machine where you run the scripts. Just make sure you update that "outer" Trilinos before running the individual builds.

Now that we have Ninja, and since configuration is pretty fast (unless you have a mounted disk), there is no advantage to running more than one build at a time, so a simple loop over your builds does the trick.

But I will still fix this for other builds out there.
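The workflow suggested in this comment (update one "outer" Trilinos clone, then run each build's ctest -S script in turn) could be sketched as a CMake script. Everything named here is an assumption for illustration: the directory layout under NIGHTLY_ROOT, the build script names, the log file names, and the use of git pull as the update step are hypothetical, and git and ctest are assumed to be on PATH.

```cmake
# run_nightly_builds.cmake (hypothetical); run with: cmake -P run_nightly_builds.cmake

# Hypothetical layout: NIGHTLY_ROOT/Trilinos is the "outer" clone and
# NIGHTLY_ROOT/SRC_AND_BUILD is the directory the ctest -S scripts run in.
set(NIGHTLY_ROOT "$ENV{HOME}/nightlyTests")
set(OUTER_TRILINOS "${NIGHTLY_ROOT}/Trilinos")
set(RUN_DIR "${NIGHTLY_ROOT}/SRC_AND_BUILD")

# Hypothetical build scripts, given relative to the outer clone.
set(BUILD_SCRIPTS
  "cmake/ctest/drivers/trappist/build_a.cmake"
  "cmake/ctest/drivers/trappist/build_b.cmake")

# Update the outer clone first so every build uses the same driver scripts.
execute_process(
  COMMAND git pull
  WORKING_DIRECTORY "${OUTER_TRILINOS}"
  RESULT_VARIABLE pull_rc)
if(NOT pull_rc STREQUAL "0")
  message(FATAL_ERROR "git pull failed in ${OUTER_TRILINOS}: ${pull_rc}")
endif()

file(MAKE_DIRECTORY "${RUN_DIR}")

# Run the builds one after another. Each invocation's stdout and stderr are
# kept in a per-build log file on this machine.
foreach(script IN LISTS BUILD_SCRIPTS)
  get_filename_component(build_name "${script}" NAME_WE)
  set(log_file "${RUN_DIR}/${build_name}.log")
  execute_process(
    COMMAND ctest -S "${OUTER_TRILINOS}/${script}"
    WORKING_DIRECTORY "${RUN_DIR}"
    OUTPUT_FILE "${log_file}"
    ERROR_FILE "${log_file}"
    RESULT_VARIABLE build_rc)
  message(STATUS "${build_name}: ctest -S exited with '${build_rc}' (log: ${log_file})")
endforeach()
```

The loop deliberately does not stop at the first failing build, so one broken configuration does not hide the results of the others.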

@lucbv
Contributor Author

lucbv commented Apr 9, 2019

@bartlettroscoe, I can see results on testing.sandia.gov; as you can see, my builds are posting in the nightly track. I am also pretty sure that we can all see the results in the ATDM track, so could you clarify what you mean here:

The big disadvantage is that you will not see results on a CDash site, only in log files on the machine where you run the scripts.

@bartlettroscoe
Member

@lucbv asked:

so could you clarify what you mean here?

I mean you can't see the STDOUT output from the ctest -S <script>.cmake invocation. Usually you don't need to see that when everything is going well, but when things go wrong you will need it to fix the problems.

@lucbv
Contributor Author

lucbv commented Apr 9, 2019

@bartlettroscoe that's fine with me, I do hope to set things up once and for all and then only have to touch up sporadically. As long as I get the results from my nightly builds correctly on the dashboard that will be OK.
At the moment it does not seem to correctly build the packages that are enabled; I am not sure why...

@bartlettroscoe
Member

@lucbv said:

At the moment it does not seem to correctly build the packages that are enabled; I am not sure why...

Would this documentation help:

?

Otherwise, I will be posting a PR with a fix for the old deprecated TriBITS Dashboard Driver system shortly.

@bartlettroscoe
Member

FYI: PR #4859 should fix this. Please approve the PR.

Sorry it took me so long to get to this. Given the Trilinos PR builds and the ATDM Trilinos builds, hopefully there were not too many holes in testing during this time.

@bartlettroscoe
Member

Sorry, the commit TriBITSPub/TriBITS@e155f5d closed this issue when it should not have. Re-opening.

@bartlettroscoe
Member

FYI: The PR #4859 that should fix this was just merged.

Just a little history here. The commit that broke this was merged way back on 3/28/2019 as part of PR #4750. But no one seemed to notice that results on CDash were missing until 5 days later on 4/2/2019 when this Issue was created and someone else did not notice this until 4/4/2019 (7 days after 3/28/2019) when duplicate #4809 was created. That suggests that these results on CDash are not really being looked after very carefully.

If you guys are interested, I can show you how to set up the tool cdash_analyze_and_report.py (still being developed but working pretty well for ATDM) so that you would know the next day if results go missing on CDash. Let me know.

Putting this in review to see that results show up starting tomorrow.

@bartlettroscoe bartlettroscoe added the stage: in review Primary work is completed and now is just waiting for human review and/or test feedback label Apr 11, 2019
@lucbv
Contributor Author

lucbv commented Apr 11, 2019

@bartlettroscoe said:

That suggests that these results on CDash are not really being looked after very carefully.

sorry for being at a conference while the #4750 was pushed, I would have complained earlier otherwise!

@lucbv
Contributor Author

lucbv commented Apr 11, 2019

@bartlettroscoe for info, unless I am away, these builds are looked at every morning, hence me catching them and filing the issue the day I got back...

@bartlettroscoe
Member

@lucbv said:

sorry for being at a conference while the #4750 was pushed, I would have complained earlier otherwise!

Understood. Are you the only person who looks at these builds? Are you interested in getting a summary email once a day for the builds you care about?

@jhux2
Member

jhux2 commented Apr 11, 2019

But no one seemed to notice that results on CDash were missing until 5 days later on 4/2/2019 when this Issue was created and someone else did not notice this until 4/4/2019 (7 days after 3/28/2019) when duplicate #4809 was created. That suggests that these results on CDash are not really being looked after very carefully.

As @lucbv noted, this was just bad timing -- lots of MueLu developers were on travel or vacation. Btw, 3/28 is a Thursday. Even in a normal week, this might not have been flagged until the next Monday.

@bartlettroscoe
Member

@lucbv and @jhux2,

Looking at the full Trilinos dashboard from yesterday and comparing it to the builds from 2019-04-02, it looks like all of the various non-ATDM builds are posting again, so I will assume that my PR #4859 fixed this.

Can we close this?

@jhux2
Member

jhux2 commented Apr 18, 2019

Thanks for fixing this. It's fine with me to close this issue.

@bartlettroscoe
Member

Sorry this slipped through. May have to find a way to set up automated tests for that system.
