Undefined reference error coming from stk/boost in nightly Trilinos clang build on CEE for Albany #4676
Comments
Hi Irina, Sorry for the trouble! Anyway, I think I see an issue in the cmake file. It looks like stk_balance lists boost as an optional dependency, instead of a required dependency. I'll get a change in to fix that, maybe that will help. |
@alanw0 : I think it is enabled automatically. We have not changed our nightly build scripts on the CEE in a while, so I don't believe it is intentional. It is interesting that the issue only happens in the clang build. We do use different builds of boost for different compilers. If you could fix it, that would be great. I forgot to mention that we are building against master Trilinos now in our nightlies instead of develop. I guess I'll have to wait a day or two longer for the change to get into master, unless I switch the CEE nightlies to develop temporarily. |
@alanw0 : any updates on this issue? Our nightlies are still failing with the clang compiler, but it could be due to the fact that we're pulling master Trilinos now (if your fix hasn't made it into master yet). |
Let me check and see if my pull request went in. I'll get back to you. |
@ikalash it was pull request 4682, and it appears to have merged into develop 2 days ago. Perhaps it hasn't made it into master yet. |
@alanw0 I just checked and you are right - the change hasn't made it into master yet. I'll keep an eye out for it in the next few days and close the issue once the nightlies have shown that it has been resolved. |
@alanw0 : unfortunately we are still getting similar compilation errors with the clang compiler on CEE: http://cdash.sandia.gov/CDash-2-3-0/viewBuildError.php?buildid=82926 I actually switched our nightlies to use develop now instead of master, so your fix should definitely be in. P.S. Sorry about the delay - we had a broken build due to other issues in Trilinos until today - finally the PR to fix those got merged in yesterday. |
@ikalash Irina you're killing me. Just kidding, sorry about the continued errors. I'm looking into the stk_balance cmake files further. I have a pull-request in progress now (4732) which fixes some issues in stk_balance cmake files, but I suspect this is something else. I'll get back to you. I will also work on setting stk_balance to off (not enabled) by default. |
@alanw0 I'm sorry :(. Hopefully this is the last of the issues! I have been pushing the Trilinos team to set up a clang build - hopefully once that's in place, issues like this will get caught before they affect Albany. |
@alanw0 : thanks for the update, I'll keep an eye out for it. |
@alanw0 : it looks like your PR went in 17 hrs ago but we are still having failures in our Clang build http://cdash.sandia.gov/CDash-2-3-0/viewBuildError.php?buildid=83003 |
Sorry @ikalash, I'm getting low on ideas about this... |
@alanw0 : that's fine with me, I will go ahead and make the change. |
@alanw0 @ikalash, sorry to be late to the game, however, I am also seeing this error now on Nalu clang builds... with errors in: Linking CXX executable stk_balance_m2n.exe Undefined symbols for architecture x86_64: My Clang builds were down for a week. However, other builds seem to be fine. The latest push did not seem to help: commit 9ed3ec3
Best, |
@spdomin What's it "referenced from"? |
@spdomin Sorry about the link issues. I've been trying to figure them out but so far without success. I'm puzzled that the issues only show up for clang... As a short term work-around I suggested to Irina that Albany can disable stk-balance since they don't need it. I believe you could also disable it currently, although I believe it will be of interest to you in the future when we improve the MxN capability etc. In the meantime I will continue tracking down the link issues... |
@spdomin @alanw0 : just to follow up: the short-term fix worked for Albany. We've seen in our Albany testing that a lot of issues show up only with clang - that is why having a clang build is important. It would be great if Trilinos tested with the same clang compiler as codes like Albany so that these issues get caught / resolved before they impact applications. I am happy to provide the modules / configure script where the issue shows up @alanw0 if you'd like. It's on the CEE so it should be pretty easy for you to reproduce the problem. |
Alan, yes, I have disabled re-balance as we currently do not have this active in the main code base. Let me know when you feel that this option can be turned back on. The "broken window" comes to mind, however, I know that you are on it :) Let me know if you would like a second pair of eyes on this and I can sit in on an STK team room session with you. I have the same opinion of clang as Irina shared. It's a good compiler that picks up things that other compilers miss - not to mention that it is the zero-work compiler on MacOS. |
@spdomin I just noticed that the error you are seeing is different than what Albany is seeing. Yours can be fixed (I'm pretty sure) by adding this to your trilinos cmake-configure step:
Albany is seeing an undefined reference to boost::program_options. Perhaps you will see that too, once you add this tpetra long-long flag... |
Tpetra enables |
@mhoemmen I don't think we need both, and I confess I don't know enough about the configuration process to know why both are being enabled. All I know is we need long-long because stk-balance gives 64-bit stuff to Zoltan2. If tpetra enables long-long by default, then I'm even more puzzled. Because in the nalu-wind project we also hit this error, and it was fixed by adding the Tpetra_INST_INT_LONG_LONG flag. |
@alanw0 wrote:
We need to fix STK Balance so that it uses |
@mhoemmen Mark, stk has these two declarations for the type of global identifiers: |
@alanw0 Thanks for checking! If we're worried about GlobalOrdinal being 64 bits, then we should always use STK should be able to use whatever global index type it wants, as long as it always gives Zoltan2 the index type that Zoltan2 wants. That may imply type conversion, but we need to be OK with that. Zoltan2 will do something much more expensive (global load balancing) than just copying an array of indices locally on every process. |
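The local type conversion described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `convert_global_ids` is a hypothetical helper, not an STK or Zoltan2 function, and the actual index types on each side depend on how Trilinos was configured. The idea is that the caller keeps its own global ID type and copies into whatever index type the partitioner expects, failing loudly if an ID does not fit.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical helper (not part of STK or Zoltan2): copy a local array of
// global IDs from one integer type into another, checking that every value
// is representable in the target type.
template <typename To, typename From>
std::vector<To> convert_global_ids(const std::vector<From>& ids)
{
  static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                "global ID types are assumed to be integral");
  std::vector<To> out;
  out.reserve(ids.size());
  for (const From id : ids) {
    const To converted = static_cast<To>(id);
    // The value must survive a round trip and must not change sign;
    // otherwise it does not fit in the target index type.
    const bool round_trips = static_cast<From>(converted) == id;
    const bool same_sign = (id < From(0)) == (converted < To(0));
    if (!round_trips || !same_sign) {
      throw std::range_error("global ID does not fit in target index type");
    }
    out.push_back(converted);
  }
  return out;
}
```

A call such as `convert_global_ids<long long>(ids)` with `ids` holding `std::uint64_t` values would produce a `long long` array. The copy is linear in the number of local IDs, which is small next to the cost of a global load-balancing step.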
I would prefer not having to build the solver stack twice - especially if we are not using re-balance at present. Also, every config file that I have seen Nalu use specifies: -DTpetra_INST_INT_LONG:BOOL=ON \ I think that unless Windows is desired, this is safe :) https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models At any rate, let me know what the long-term solution is here. What happens if I turn off LONG and keep the default of LONG LONG? Does STK now get confused? |
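As background for the long versus long long question, here is a small sketch of why the two flags are not interchangeable even on platforms where both types are 64 bits wide. The names `same_width` and `same_type` are illustrative only. It assumes nothing about Tpetra internals beyond the general C++ rule that a template instantiated for `long` is a different entity from the same template instantiated for `long long`, so code that links against one instantiation cannot be satisfied by the other.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// On LP64 platforms (Linux, macOS) long and long long are both 64 bits,
// so this is true there; on LLP64 (64-bit Windows) long is 32 bits.
constexpr bool same_width = sizeof(long) == sizeof(long long);

// Regardless of width, the language treats them as distinct types, so
// SomeTemplate<long> and SomeTemplate<long long> are separate
// instantiations with separately mangled symbols.
constexpr bool same_type = std::is_same<long, long long>::value;

static_assert(!same_type, "long and long long are always distinct types");
static_assert(sizeof(long long) * CHAR_BIT >= 64,
              "long long is guaranteed to be at least 64 bits");
```

This is why a build can compile cleanly yet fail at link time when a library was explicitly instantiated for only one of the two types.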
@spdomin Mark and I were talking about this more yesterday, and I'm trying a couple of sierra builds to figure out exactly what we need to instantiate. I don't think you'll need both types. I'll let you know, hopefully later today. |
@mhoemmen I sent you an email about how aria tests fail when I turn off Tpetra_INST_INT_LONG, even though the build/link was successful. Still some investigation to do... |
Short term, I removed STK_Rebalance from our config file. However, STK_Unit has tests that exercise STK_Rebalance. As such, I turned off STK_Unit. However, now, our unit tests are failing to build (as expected) as our unit tests pull in STK unit test mesh fixtures: /Users/naluIt/gitHubWork/nightlyBuildAndTest/Nalu/unit_tests/UnitTestHexElementPromotion.C:15:10: fatal error: Perhaps we should create a new ticket that deals with this Long, Long/Long issue? When I joined the discussion, I was under the impression that Albany and Nalu shared the same rebalance build issue. |
Looks like we just hit this with the new ATDM Trilinos intel-18.0.5 build. See #5335 (comment). Adding the ATDM labels to this issue as well. @alanw0, do we just need to add a BoostLibs reference to these STK subpackages to fix this for now? |
@kddevin (Trilinos Data Services Product Area Lead) FYI: This is breaking the new intel-18.0.5 builds that EMPIRE is relying on (see #5335). It looks like they have a workaround in place for now but it is bringing down the ATDM Trilinos builds protecting this build for EMPIRE (and also we would assume GEMMA). |
This looks like it's due to the stk change to make boostlib optional rather than required. i.e., if boostlib is not enabled, then some stk sub-packages get disabled. I think you can get the builds working again by adding '-DTPL_ENABLE_BoostLib:BOOL=ON'. |
@alanw0 said:
That appears not to be the issue, at least not with the ATDM Trilinos configuration. As shown here the BoostLib TPL is enabled showing:
My guess is that some STK package that requires BoostLib is not properly declaring a dependence on BoostLib but only Boost (therefore finding the header files but not the boost libs). We really need to refactor this to just have a single Boost TPL and then have it have optional components and get rid of the hacked BoostLib TPL. We might be able to make that backward compatible if we are careful. |
@roscoebartlett / @alanw0 - I enabled BoostLib for NALU but it didn't completely fix the issue. In addition, I had to manually add the library paths to my link line. I was under the impression that this was NALU related but perhaps not. As background, the reason this wasn't completely crashing was that the system had a Boost install in the |
Looking at the detailed link lines and poking around some should determine what the problem is and how to fix this. I can almost guarantee the problem is a missing BoostLib line in a STK Dependencies.cmake file. |
@bartlettroscoe you're probably right, but it's not obvious to me where that erroneous dependency or missing dependency is. We'll try to look into it... |
@bartlettroscoe - I'm sorry I got your nick-name incorrect above. Blame it on my morning coffee. |
FYI: This might be caused by an ABI problem and not with TriBITS or STK CMake files. See #5335 (comment). |
This has been resolved for ATDM so removing the "ATDM" label. |
This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. |
This issue was closed due to inactivity for 395 days. |
The Albany Trilinos clang build on the CEE is broken. There is an undefined reference error stemming from stk/boost:
http://cdash.sandia.gov/CDash-2-3-0/viewBuildError.php?buildid=82664
Can someone from the @trilinos/stk team please have a look?