Albany configure errors due to TriBITS update on 11/25/2021 #9972
Hi Irina, the last stk update in Trilinos was Nov 5: #9896 Stefan Domino is seeing a cmake issue also for nalu, I'll keep you posted if I figure out anything. |
These missing targets all seem related to the above PR #9894. However, the discussion on that PR was really long. Can we revert this commit, or do we want to make sure that we have a clear bisect first? |
@bartlettroscoe, the above sounds like a great approach. Let me know when the change is reverted and I can launch a re-test. Let me know offline if you have any issues with the Nalu build process (or add a Nalu issue - your choice). I would go for a gcc 8.4 open-mpi 4.0.5 build. It should be clear sailing. |
I'm happy to retest Albany manually as well - just let me know. |
…/tribits-299-modern-cmake-targets-1" This reverts commit db3205b, reversing changes made to 110b6c4 which reverts the PR merge trilinos#9894. This is to allow reproducing and addressing the problems described in the new issues trilinos#9972 and trilinos#9973 offline, so that the Albany and Nalu Trilinos integration processes, respectively, can continue working in the meantime.
FYI: I posted the revert of PR #9894 in the new PR #9977. I just need someone to approve that PR so that it can merge. Hopefully the PR can pass testing fast and merge (but I don't have any control over that). @ikalash, can you point me to detailed instructions on how to reproduce this failure with Albany + Trilinos? |
@bartlettroscoe : great! I can approve if you add me as a reviewer. Yes, I can provide instructions for Albany. What is your machine of choice? CEE? Blake? |
@ikalash, basic CEE RHEL7 machines would be best for me. |
Here are instructions for CEE:
|
…Trilinos/tribits-299-modern-cmake-targets-1"" This reverts commit fd27a20. This gets us back to the state of the 'develop' branch after the PR trilinos#9894 that merged the branch 'tribits-299-modern-cmake-targets-1' was merged (as well as other PRs in the days after that). Now I can try to reproduce the errors in issues trilinos#9972 and trilinos#9973.
…299-modern-cmake-targets-1 Automatically Merged using Trilinos Pull Request AutoTester PR Title: Revert: TriBITS: Pull in partial refactoring to modern CMake targets (TriBITSPub/TriBITS#299) (#9894, #9972, #9973) PR Author: bartlettroscoe
The revert resolved this - closing. |
Reopening so we can track the reproduction of the errors here that will be fixed as part of PR #9978 ... Hello @ikalash, I was not able to reproduce on a CEE RHEL7 machine. The configure of Trilinos fails. It can't find BLAS. Specifically, what CEE machines do you use to do these builds as per above? After setting up the repos with:
Following the instructions above:
That failed with the following configure error:
Note that my CEE RHEL7 machine does not have libblas.so or libblas.a installed. |
Weird about blas. I wonder if the paths are different on some of the compute nodes. Can you please try cee-compute020? That's where our CEE intel nightly build runs. |
Pretty much every other CEE RHEL7 machine seems to have libblas. Seems they just broke my one machine the last time they upgraded it I guess. (For some reason, BLAS and LAPACK are not part of the standard upgrades to systems.) I have got the configure of Trilinos to pass so now I am on my way ... |
@ikalash, I have been able to reproduce the Albany configure error:
with details below. I will get to the bottom of what is happening. Thanks! Reproduction Details. I am trying this again on my CEE EWS machine 'ews00232' after copying the above git repos:
That produced the error:
The script /scratch/rabartl/Albany.base/Albany/build-sems-intel/do-configure was:
|
…age includes (TriBITSPub#299) This is the use case that triggers trilinos/Trilinos#9972 and trilinos/Trilinos#9973. Now I will change the code to fix the test.
…age includes (TriBITSPub#299) This sets <ParentPackage>_ENABLE_<SubPackage>=ON if the subpackage is enabled even if optional packages are disabled. This will fix trilinos/Trilinos#9972 and trilinos/Trilinos#9973.
…age includes (TriBITSPub#299) This sets <ParentPackage>_ENABLE_<SubPackage>=ON if the subpackage is enabled even if optional packages are disabled. This will fix trilinos/Trilinos#9972 and trilinos/Trilinos#9973. This also updates the logic that generates <Package>Config.cmake files to only include <UpstreamPackage>Config.cmake files for direct dependencies, not all dependencies. (The indirect includes should take care of the rest.)
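The direct-dependency rule in the commit message above can be illustrated with a small toy model. This is a minimal sketch in Python, not TriBITS code: the package names and the `DIRECT_DEPS` table are hypothetical, and `include_config` only mimics a config file that includes the config files of its direct upstream dependencies. It shows why listing only direct dependencies still pulls in the indirect ones.

```python
# Toy model (not TriBITS code): each package's config file includes only
# the config files of its *direct* upstream dependencies.
# Hypothetical dependency table, for illustration only.
DIRECT_DEPS = {
    "AppPkg": ["MidPkg"],
    "MidPkg": ["BasePkgA", "BasePkgB"],
    "BasePkgA": [],
    "BasePkgB": [],
}


def include_config(package, direct_deps, loaded=None):
    """Mimic loading <package>Config.cmake: direct deps first, each only once."""
    if loaded is None:
        loaded = []
    if package in loaded:
        return loaded  # already included; behaves like an include guard
    for upstream in direct_deps[package]:
        # Each upstream config in turn includes its own direct dependencies,
        # so indirect dependencies are reached without being listed here.
        include_config(upstream, direct_deps, loaded)
    loaded.append(package)
    return loaded


if __name__ == "__main__":
    # AppPkg lists only MidPkg, yet both base packages end up loaded.
    print(include_config("AppPkg", DIRECT_DEPS))
```

In this model `AppPkg` never names `BasePkgA` or `BasePkgB`, but both are loaded through `MidPkg`, which is the behavior the parenthetical note in the commit message relies on.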
@ikalash, to be more specific ... Do you have a simple setup to support configuration, building, and installation of Trilinos for Albany locally, running the native Trilinos tests for the enabled packages used by Albany, and then configuring, building, and testing Albany against that local Trilinos install? For Trilinos, this should be just a few commands that look something like:
and then configure and build Albany against that Trilinos install like:
where the files … This is really what I (and other people) need to be able to create a working baseline Albany (or any application code) build to test against Trilinos. And if we can do this on a standard CEE RHEL7 machine, then that makes it easy for anyone to do such reproductions since everyone should have access to a CEE LAN machine. (They stood up the new HPWS machines, which look to be pretty good so far.) If you have this (and someone keeps it working at all times), then that eliminates the overhead of testing an APP against a local Trilinos git repo. I can perhaps help to set this up if you are interested and get this under version control so that it will be kept up-to-date. |
…n a parent package (TriBITSPub#299) This is the use case exercised by Albany/Trilinos in trilinos/Trilinos#9972 (and likely also Nalu/Trilinos in trilinos/Trilinos#9973). This test shows the same configure error where <Project>Config.cmake is trying to include a <SubPackage>Config.cmake file for a required subpackage that is not actually enabled.
…reamPackage> (TriBITSPub#299) This is needed in cases where a top-level package has multiple required subpackages but where the user only requests a subset of the required subpackages be enabled and not the top-level package itself. This is one of the use cases exercised by Albany/Trilinos (see trilinos/Trilinos#9972). This commit fixes the failing test TribitsExampleApp_EnableSingleSubpackage.
…targets-1-again (TriBITSPub/TriBITS#433) Should address all of the issues with the merge of PR trilinos#9894 listed out in TriBITSPub/TriBITS#433 (which is part of TriBITSPub/TriBITS#299). This should resolve the failures reported in trilinos#9972 and trilinos#9973.
FYI: So far, I have been unable to get a working reference build of Trilinos+Albany. But I have gotten through the configure of Albany and I believe the remaining issues are unrelated to Trilinos. But if someone on the Albany team would like to test the next update of TriBITS to Trilinos in PR #9978, they can access the tip of the branch in their local Trilinos repo as:
Otherwise, I am inclined to merge PR #9978 and see how it goes. |
I will test it now. |
I'm getting the following error when building Albany:
Were changes to Teuchos made? I did not build Albany from scratch so perhaps this is an artifact of that and not due to these changes? |
Yes we use the develop branch. The Albany nightly build on the same machine with develop was clean last night. |
@ikalash, what machine are you doing this on? (You can contact me offline if that is a sensitive question.) |
I tested your branch on cee-compute021 with a gcc compiler. |
@ikalash, I have been unable to get a reference configuration for Trilinos and Albany to work on a cee-buildxyz machine. I tried following the instructions provided but I am getting an Albany build error:
It seems that Albany is expecting that Zoltan2 is enabled and installed but that is not the case with the configure script that I was given. (My guess is that the Albany Trilinos configuration is expecting the SCOREC package to be present and enabled, which triggers the enable of Zoltan2, but I don't know where to get that package from a repo and, if so, what version to use.) I can give exact command-by-command reproducibility instructions with some scripts I have checked into an internal repo (albany_trilinos_build_scripts). Then perhaps we can update those instructions and scripts to allow for easy reproducibility? Should we open an internal GitLab or JIRA issue to track this? If I can't reproduce a reference build of Albany + Trilinos then I can't debug anything related to Albany. |
I can't access the repo you linked. Are you using these scripts? https://github.com/sandialabs/Albany/blob/master/doc/dashboards/cee-compute011.sandia.gov/sems-intel-modules.sh SCOREC is not required for Albany, so that is not the issue. I have an idea of what is the problem actually if you're using the right scripts. What compute-node are you running on? I've seen weird Albany errors on some of the cee nodes/machines. Could you please try building on cee-compute020, where the nightlies run? |
@ikalash, I (hopefully) opened up the repo and provided full reproducibility instructions in: Please give that a try on any CEE RHEL7 machine. (I used 'cee-build030' which is a new super fast machine.) If you see anything that needs to be improved in those scripts, please post a Merge Request against 'master' in the repo: I will post an issue there with the details of my reproducibility attempt and what I am seeing. |
I think cee-build030 was one of the machines where other users have seen build issues. Could you please try on cee-compute020 just as a sanity check? I can try your scripts as well. |
@bartlettroscoe : I just tried your scripts on cee-compute030. Here is what I found:
Could you please verify that you see the same behavior? I'm not sure what is broken in the ninja build case... I don't typically use it. |
@ikalash, what exact versions of Trilinos and Albany did you use (i.e. the exact SHA1s) and what test results did you get for Albany when you ran the Albany test suite? |
Sorry about the delay - one of my projects has a review next week and I've been busy preparing for that. Here are the SHAs: Trilinos: 7b1fc5a It turns out on cee-compute030, the code does not run. Here is the error:
I haven't seen this before but when I googled it, it looks like it came up in Trilinos awhile back. I repeated the build on cee-compute021, and there the tests pass. This is consistent w/ what I remember some Albany customers reported awhile back (the code did not build/run correctly on some of the newer cee-compute nodes). |
@ikalash, what specific command did you run and how many tests ran and passed? One way to get a nice summary of tests run with ctest is:
That will print the summary of the number of tests run, tests passed, tests failed, and list the tests that failed. |
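As a rough illustration of the kind of summary described above, the sketch below parses saved ctest output in Python. It is not the command from this thread (that command is not preserved here). It assumes the usual ctest summary line of the form `N% tests passed, F tests failed out of T` and the `The following tests FAILED:` list that ctest prints after a run; `summarize_ctest_output` and the sample text are hypothetical.

```python
import re

# Assumed shape of the ctest summary line and of each failed-test entry.
SUMMARY_RE = re.compile(r"(\d+)% tests passed, (\d+) tests failed out of (\d+)")
FAILED_RE = re.compile(r"^\s*(\d+) - (\S+) \((.+)\)\s*$")


def summarize_ctest_output(text):
    """Return counts of run/passed/failed tests and the failed test names."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no ctest summary line found")
    failed = int(match.group(2))
    total = int(match.group(3))

    failed_names = []
    in_failed_list = False
    for line in text.splitlines():
        if line.startswith("The following tests FAILED:"):
            in_failed_list = True
            continue
        if in_failed_list:
            entry = FAILED_RE.match(line)
            if entry is None:
                break  # first non-entry line ends the failed-test list
            failed_names.append(entry.group(2))

    return {
        "run": total,
        "passed": total - failed,
        "failed": failed,
        "failed_names": failed_names,
    }


# Hypothetical sample output, for illustration only.
SAMPLE = """67% tests passed, 1 tests failed out of 3

Total Test time (real) =   0.02 sec

The following tests FAILED:
\t  2 - test_b (Failed)
Errors while running CTest
"""

if __name__ == "__main__":
    print(summarize_ctest_output(SAMPLE))
```

Reporting these four values together with the machine name and the SHA1s makes it easier to compare results between two builds.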
@ikalash, what do you mean by:
? What Using the instructions at: but instead switching to configuring Albany with Makefiles instead of Ninja with:
seems to have made no difference for me. I still got the Albany build error:
As I said above, I think the problem may be due to the SCOREC package not being present when Trilinos is configured, built, and installed (which would therefore enable Zoltan2). Can you please create an issue in the internal repo: that gives exact unambiguous reproducibility instructions? Otherwise, can we briefly pair program together so I can examine your source and build directories to see exactly what is going on? If I can just get a single successful reference build of Albany + Trilinos 'develop' working, then I should be off to the races. Otherwise, it will be impossible for me to debug any issues with future updates of TriBITS against Albany. |
@ikalash, after fixing the typo on the Trilinos configure script we found, I now have a successful reference build and test of Albany + Trilinos on 'cee-compute021' as described in complete detail in https://cee-gitlab.sandia.gov/rabartl/albany_trilinos_build_scripts/-/issues/1#note_2115651 that produced the Albany test results:
I should now be able to debug problems with the updated Trilinos build system in PR #9978. |
@bartlettroscoe great! Let me know if you have any more questions about Albany. |
…eterList This Albany CMakeLists.txt was making assumptions about the implementation details of the TriBITS-generated <Package>Config.cmake files that it should not have been making. It was assuming that the raw TriBITS target 'teuchosparameterlist' existed, which is a no-no. The correct old-school TriBITS usage is ${TeuchosParameterList_LIBRARIES} and the associated include directories. This works with old TriBITS and refactored TriBITS. This is related to trilinos/Trilinos#9972 and PR trilinos/Trilinos#9978.
We're getting failures in our nightly tests for Albany due to missing targets in STK:
https://sems-cdash-son.sandia.gov/cdash/build/25764/configure
Is this due to recent changes to Trilinos?