Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Teko: configuration error with deprecated code off and -D Tpetra_INST_INT_UNSIGNED_LONG:BOOL=ON #6355

Closed
ikalash opened this issue Nov 26, 2019 · 39 comments
Labels
client: Albany Issue impacting the Albany project CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. pkg: Teko type: bug The primary issue is a bug in Trilinos code or tests

Comments

@ikalash
Copy link
Contributor

ikalash commented Nov 26, 2019

I attempted to switch Albany to turn off Tpetra and Xpetra deprecated code in the Albany nightlies (with @mperego). In some of the builds, I need to have the flag:

-D Tpetra_INST_INT_UNSIGNED_LONG:BOOL=ON \

in order to avoid DTK-related compilation errors. I also have Teko on, as it is used in some of the tests that use DTK. Doing so produces the following error:

Processing enabled package: Teko (Libs)
CMake Error at packages/teko/CMakeLists.txt:25 (MESSAGE):
  Teko currently requires that Tpetra enable either GlobalOrdinal (GO) = long
  long or GO = int.  Please set Tpetra_INST_INT_LONG_LONG=ON to enable GO =
  long long, or Tpetra_INST_INT_INT=ON to enable GO = int.  If you would like
  to use other GO types, please contact the Teko developers.

Is it possible to have a build with the following 4 options simultaneously?

 -D Tpetra_ENABLE_DEPRECATED_CODE:BOOL=OFF \
 -D Xpetra_ENABLE_DEPRECATED_CODE:BOOL=OFF \
 -D Trilinos_ENABLE_Teko:BOOL=ON \
 -D Tpetra_INST_INT_LONG_LONG=ON \

@trilinos/teko

@ikalash ikalash added type: bug The primary issue is a bug in Trilinos code or tests pkg: Teko client: Albany Issue impacting the Albany project labels Nov 26, 2019
@eric-c-cyr
Copy link
Contributor

eric-c-cyr commented Nov 26, 2019 via email

@eric-c-cyr
Copy link
Contributor

eric-c-cyr commented Nov 26, 2019 via email

@mhoemmen
Copy link
Contributor

mhoemmen commented Nov 26, 2019

@ikalash wrote:

-D Tpetra_INST_INT_UNSIGNED_LONG:BOOL=ON

Uh oh. I was under the impression that DTK no longer had that requirement. @rppawlo , would you happen to know if there is a way to tell DTK not to use unsigned long as the global index type?

@ikalash , if I remember right, the history is that DTK wanted to use the same global index type that STK uses. (My personal view is that there's no need for two interoperating packages to share index types. However, Tpetra's design circa 10 years ago unfortunately gave users the impression that they could and should change Tpetra's global index type to match that of any other library with which they wanted to interact. Sometimes design choices that intend "flexibility" end up causing rigidity instead.)

@rppawlo
Copy link
Contributor

rppawlo commented Nov 26, 2019

I haven't worked on DTK in a long time. Albany is using DTK version 2.0 that requires the unsigned long GO. They would have to upgrade to the latest DTK to use an arbitrary GO. cc'ing @sslattery to make sure that is still the case.

@mperego
Copy link
Contributor

mperego commented Nov 26, 2019

@mhoemmen can you see any workaround, or upgrading to the latest DTK is the only soltion?
@ikalash Is there any serious issue upgrading to the latest DTK?

@mhoemmen
Copy link
Contributor

@ikalash Another thing you could try is disabling all the other GlobalOrdinal types, and just using unsigned long everywhere. Try this:

-D Tpetra_INST_INT_UNSIGNED_LONG:BOOL=ON
-D Tpetra_INST_INT_LONG:BOOL=OFF
-D Tpetra_INST_INT_LONG_LONG:BOOL=OFF
-D Tpetra_INST_INT_INT:BOOL=OFF

@ikalash
Copy link
Contributor Author

ikalash commented Nov 26, 2019

Thanks for the suggestions. @mhoemmen , your suggestion does not fix the problem. Regarding @eric-c-cyr 's question:

I’m not really sure about changes propagated as part of the DEPRECATED refactor.

I was wondering the same thing. Can someone explain to me why turning off the deprecated code would cause this error to crop up, when it's not there w/o the deprecated code?

@rppawlo : I remember now past discussions about updating DTK. I tried it once and got a compilation error, so we kept the old DTK. Maybe we should look at upgrading it. CC'ing @lxmota for whom this issue would also be relevant.

@rppawlo
Copy link
Contributor

rppawlo commented Nov 26, 2019

tpetra now only allows a single GO that is picked at configure time. When you turn off deprecated code, you are stuck with choosing one GO. Albany has always built with at least two GOs. Unfortunately DTK 2.0 requires unsigned long and it looks like that GO isn't working for part of the trilinos stack. You could just try removing the teko configure check at packages/teko/CMakeLists.txt:25 and see if the build works. Maybe teko required a specific GO at one point but now works with any. Maybe the check is no longer needed?

ikalash added a commit to sandialabs/Albany that referenced this issue Nov 26, 2019
having to do with DTK.  Seems we will need to switch to newer DTK
which will require some effort.

Please see trilinos/Trilinos#6355 for more details.
@eric-c-cyr
Copy link
Contributor

eric-c-cyr commented Nov 26, 2019 via email

@mhoemmen
Copy link
Contributor

@ikalash wrote:

@mhoemmen , your suggestion does not fix the problem.

Please post the resulting errors.

@ikalash
Copy link
Contributor Author

ikalash commented Nov 27, 2019

@mhoemmen : same errors as before from Teko with your options + the deprecated code options:

CMake Error at packages/teko/CMakeLists.txt:25 (MESSAGE):
  Teko currently requires that Tpetra enable either GlobalOrdinal (GO) = long
  long or GO = int.  Please set Tpetra_INST_INT_LONG_LONG=ON to enable GO =
  long long, or Tpetra_INST_INT_INT=ON to enable GO = int.  If you would like
  to use other GO types, please contact the Teko developers.

Here are the options I have:

 -D Tpetra_ENABLE_DEPRECATED_CODE:BOOL=OFF \
 -D Xpetra_ENABLE_DEPRECATED_CODE:BOOL=OFF \
-D Tpetra_INST_INT_UNSIGNED_LONG:BOOL=ON \
      -D Tpetra_INST_INT_LONG:BOOL=OFF \
      -D Tpetra_INST_INT_LONG_LONG:BOOL=OFF \
      -D Tpetra_INST_INT_INT:BOOL=OFF \

@eric-c-cyr
Copy link
Contributor

eric-c-cyr commented Nov 27, 2019 via email

@mhoemmen
Copy link
Contributor

mhoemmen commented Nov 27, 2019

@eric-c-cyr wrote:

...then in teko/src/Teko_ConfigDefs.hpp define the GO to be whatever is required from Tpetra (not sure what the line is, suggest asking @mhoemmen)

That header file already includes Tpetra_Map_decl.hpp, so you can just get the default GO from Map, in the same way that the file already gets the default NT from Map.

namespace Teko {
  using LO = Tpetra::Map<>::local_ordinal_type;
  using GO = Tpetra::Map<>::global_ordinal_type;
  using NT = Tpetra::Map<>::node_type;
}

(Note the C++11 type alias syntax; it should replace all use of typedef in new code.) No macros, no errors, no fuss.

@eric-c-cyr
Copy link
Contributor

eric-c-cyr commented Nov 27, 2019 via email

@ikalash
Copy link
Contributor Author

ikalash commented Nov 27, 2019

@mhoemmen , @eric-c-cyr : thanks for the suggestions! Yes I can give this a try. My parents are in town for Thanksgiving so I don't know if I'll get to it this week. If not, definitely Monday. I imagine if I implement the suggestion and it fixes the problem, I can submit a PR to get it into the main code?

@eric-c-cyr
Copy link
Contributor

eric-c-cyr commented Nov 27, 2019 via email

@ikalash
Copy link
Contributor Author

ikalash commented Nov 27, 2019

Sounds like a plan! I’ll keep you posted.

@ikalash
Copy link
Contributor Author

ikalash commented Dec 7, 2019

@eric-c-cyr , @mhoemmen : sorry about the delay - I finally had a chance to try your proposed fix, namely modify Teko_ConfigDefs.hpp to look as follows:

namespace Teko {

typedef double ST;
using LO = Tpetra::Map<>::local_ordinal_type;
using GO = Tpetra::Map<>::global_ordinal_type;
using NT = Tpetra::Map<>::node_type;
}

and make some changes to get the Teko tests/examples to compile and not show warnings when compiled. Unfortunately this leads to some failures in Albany. Some of them seem to have to do with us not allocating large enough graphs for some problems involving DTK/our Schwarz coupling method, which is not your problem, but I noticed a few other failures namely this one:

3: ************************************************************************
3: -- Nonlinear Solver Step 0 -- 
3: ||F|| = 3.633e+08  step = 0.000e+00  dx = 0.000e+00
3: ************************************************************************
3: 
3:        CALCULATING FORCING TERM
3:        Method: Constant
3:        Forcing Term: 0.0001
3: 
3: p=2: *** Caught standard std::exception of type 'std::logic_error' :
3: 
3:  /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build/install/include/Tpetra_ComputeGatherMap.hpp:84:
3:  
3:  Throw number = 1
3:  
3:  Throw test that evaluated to true: true
3:  
3:  Not implemented for IntType != int, long, or long long

It looks like a routine in Tpetra_ComputeGatherMap.h is being called with unsigned int long, which is not supported. I'm a bit confused as to why my changes to Teko would produce this error. Without the changes, the test runs correctly (even with the deprecated Tpetra/Xpetra code off, not on my machine, which has DTK on, but on other machines, where DTK is off, so I can build everything just fine). Thoughts? BTW, I'm in ABQ next week so we could look at this together sometime if it's easier and you have time.

@eric-c-cyr
Copy link
Contributor

eric-c-cyr commented Dec 7, 2019 via email

@ikalash
Copy link
Contributor Author

ikalash commented Dec 7, 2019

Just to be clear, the tests with error message don't use teko. That's what I don't understand...

@mhoemmen
Copy link
Contributor

mhoemmen commented Dec 9, 2019

@ikalash This is why we don't want to support random GlobalOrdinal types ;-) Trilinos doesn't test GO=unsigned long. I can try to fix this by converting the indices if they are of an unsupported type, but it will take a little while. This is technically a @trilinos/tpetra issue, but I'd rather that we stop using poorly tested GO types.

@ikalash
Copy link
Contributor Author

ikalash commented Dec 9, 2019

@mhoemmen : I agree that at this level it's a @trilinos/tpetra issue. It seems to me that the current behavior is not ideal, if the code allows you to build with the unsigned long type (which I presume it would even w/o my change to Teko, if Teko were disabled I think), but then you may get run-time errors. If you prefer not to support types like GO=unsigned long, I think the best thing to do would be to throw a configure-time error when trying to use the type (like Teko does now, I guess). That of course would not fix my issue, which requires me to use ``GO=unsigned long" for DTK... the fix you propose to convert indices to the currently unsupported type (hopefully) would.

I've started looking into switching to a newer DTK, but am stuck for the moment. I'm getting various compilation errors that don't seem to have anything to do with Albany - I've opened a DTK issue on this and am waiting to hear back from Stuart Slattery on it.

@kddevin
Copy link
Contributor

kddevin commented Dec 9, 2019

The unimplemented routine is getMpiDatatype
Would there be any harm in our adding an instantiation for MPI_UNSIGNED_LONG?
Would that be sufficient to solve the issue (without any sort of conversions)?

@ikalash
Copy link
Contributor Author

ikalash commented Dec 9, 2019

@kddevin : that would be great if there's a temporary fix while I figure out how to get the newer DTK versions (which should build with a different GO than unsigned long) to work.

If you have a PR/snippet of code I can hack into my tpetra version with the additional getMpiDataType routine, I can test if that's sufficient. It's conceivable other issues will crop up with this fix that we're not aware of now, but hopefully not.

@kddevin
Copy link
Contributor

kddevin commented Dec 9, 2019

Thanks for being willing to try a patch, @ikalash .
Here's what I would try.
Add near line 94, still inside the HAVE_MPI ifdef block:

      template<> TRILINOS_UNUSED_FUNCTION MPI_Datatype
      getMpiDatatype<unsigned long> () { return MPI_UNSIGNED_LONG; }

Sorry I don't have a way to reproduce your error and test this for you.

@ikalash
Copy link
Contributor Author

ikalash commented Dec 9, 2019

Thanks @kddevin! I'll test it and get back to you. I wasn't expecting you to reproduce/test the fix - happy to do that myself. I'll post back once I've had a chance to do it.

@mhoemmen
Copy link
Contributor

mhoemmen commented Dec 9, 2019

@kddevin Thanks -- I'll check whether MPI_UNSIGNED_LONG comes with MPI 3 by default.

@mhoemmen
Copy link
Contributor

mhoemmen commented Dec 9, 2019

@kddevin The MPI 3 standard includes MPI_UNSIGNED, MPI_UNSIGNED_LONG, and MPI_UNSIGNED_LONG_LONG. I'm pretty sure we require MPI 3 for testing now and don't need to support (say) OpenMPI 1.6, so I think it's safe to use -- thanks!

@ikalash
Copy link
Contributor Author

ikalash commented Dec 11, 2019

@kddevin : sorry for the delay in my response - I'm in a workshop for most of this week. It appears your proposed fix resolves my issue. Per Mark's comment, do we want to include the unsigned and unsigned long long version of getMpiDataType too?

Also, should I create a PR for the Tpetra (and Teko) changes? I am still doing some testing before this would be ready, in particular, it appears my changes broke a couple of the Teko tests, which I need to investigate/fix.

ikalash added a commit that referenced this issue Dec 15, 2019
…ossible

to compile Trilinos with Tpetra_INST_INT_UNSIGNED_LONG:BOOL=ON .

There are still some issues to be addressed.
@ikalash
Copy link
Contributor Author

ikalash commented Dec 15, 2019

@kddevin , @mhoemmen , @eric-c-cyr : I created a branch in which I implemented the proposed changes by the three of you above: https://github.com/trilinos/Trilinos/tree/ikalash-tpetra-teko-unsigned-long . I'm concerned about some of the logic throughout Teko/Tpetra that may not be right with GO=unsigned long. I'm attaching the output when I build Trilinos with my changes. You can see there are a lot of -Wsign-compare warnings, and I'm not sure all of them are innocuous. In particular, it looks like certain variables representing GIDs are allowed to be negative and there is logic to handle that case, e.g.:

/home/ikalash/dtkBuilds/TrilinosDtk2-ikalash-tpetra-teko-unsigned-long/packages/teko/src/Tpetra/Teko_BlockingTpetra.cpp:272:16: warning: comparison of integer expressions of different signedness: ‘Teko::GO’ {aka ‘long unsigned int’} and ‘int’ [-Wsign-compare]
  272 |          if(gid==-1) continue; // in contiguous row

and

/home/ikalash/dtkBuilds/TrilinosDtk2-ikalash-tpetra-teko-unsigned-long/packages/tpetra/core/src/Tpetra_Import_Util.hpp:432:15: warning: comparison of integer expressions of different signedness: ‘long unsigned int’ and ‘int’ [-Wsign-compare]
  432 |       if(cgid == -2) continue;

I suppose this would have been an issue in the develop branch with -D Tpetra_INST_INT_LONG_LONG=ON and -D Tpetra_ENABLE_DEPRECATED_CODE:BOOL=ON , -D Xpetra_ENABLE_DEPRECATED_CODE:BOOL=ON.

Also, there are some test failures in Tpetra:

The following tests FAILED:
         99 - TpetraCore_ImportExport_UnitTests_MPI_4 (Failed)
        111 - TpetraCore_ImportExport2_UnitTests_MPI_4 (Failed)
        112 - TpetraCore_ImportExport2_CrsSortingUtils_MPI_4 (Failed)
        142 - TpetraCore_MatrixMatrix_UnitTests_MPI_4 (Failed)

These have to do with the issue I describe above, I think - I fixed the HashTable test by commenting out the key=-1 case (which would be invalid for GO=unsigned long): 2dc1802#diff-936bd1d31c5c99477f4948a8d1398d95 . This probably should be reworked. Would you be able to have a look at it?

make_out.txt

@ikalash
Copy link
Contributor Author

ikalash commented Dec 15, 2019

Just to follow up on my previous comment - I get a bunch of Tpetra test failures when building the develop branch (w/o any of my changes) with the options

 -D Tpetra_ENABLE_DEPRECATED_CODE:BOOL=OFF \
 -D Xpetra_ENABLE_DEPRECATED_CODE:BOOL=OFF \
 -D Trilinos_ENABLE_Teko:BOOL=ON \
 -D Tpetra_INST_INT_LONG_LONG=ON \

It's pretty much the same failures:

The following tests FAILED:
         96 - TpetraCore_FixedHashTableTest_MPI_1 (Failed)
         99 - TpetraCore_ImportExport_UnitTests_MPI_4 (Failed)
        112 - TpetraCore_ImportExport2_UnitTests_MPI_4 (Failed)
        113 - TpetraCore_ImportExport2_CrsSortingUtils_MPI_4 (Failed)
        114 - TpetraCore_MatrixMarket_Tpetra_CrsMatrix_InOutTest_MPI_4 (Failed)
        115 - TpetraCore_MatrixMarket_Tpetra_CrsGraph_InOutTest_MPI_4 (Failed)
        116 - TpetraCore_MatrixMarket_Operator_Test_MPI_4 (Failed)
        147 - TpetraCore_MatrixMatrix_UnitTests_MPI_4 (Failed)

So I guess the -D Tpetra_INST_INT_LONG_LONG=ON option leads to test failures. I'm ok with this if you're OK with this... but I thought I should bring it up.

@mhoemmen
Copy link
Contributor

@ikalash wrote:

I get a bunch of Tpetra test failures when building the develop branch (w/o any of my changes) with the options....

Did you actually mean to write Tpetra_INST_INT_LONG_LONG=ON, or did you mean to write Tpetra_INST_INT_UNSIGNED_LONG=ON? I'm asking because if you disable deprecated code in Tpetra, the default GlobalOrdinal type is long long. That would make Tpetra_INST_INT_LONG_LONG=ON redundant.

@ikalash
Copy link
Contributor Author

ikalash commented Dec 16, 2019

@mhoemmen : no, I meant Tpetra_INST_INT_UNSIGNED_LONG=ON. I need to set this to get the codes to compile with the version of DTK (v2) I am using (newer versions of DTK don't require this but I'm having some trouble building with them, so I am sticking with DTK v2 for now). Essentially what I'm finding is some Tpetra tests fail with the option Tpetra_INST_INT_UNSIGNED_LONG=ON in master/develop as well as my branch (which has some changes to get Teko, Tpetra and DTK to play nicely with Tpetra_INST_INT_UNSIGNED_LONG=ON and the deprecated Tpetra/Xpetra code turned off). This has probably been the case all along, I just never ran Tpetra tests.

@mhoemmen
Copy link
Contributor

@ikalash wrote:

Essentially what I'm finding is some Tpetra tests fail with the option Tpetra_INST_INT_UNSIGNED_LONG=ON in master/develop as well as my branch (which has some changes to get Teko, Tpetra and DTK to play nicely with Tpetra_INST_INT_UNSIGNED_LONG=ON and the deprecated Tpetra/Xpetra code turned off).

This is ... unsurprising, given that the automatic PR tests have never enabled GlobalOrdinal=unsigned long. I don't think there's even a Dashboard test that does so. DTK folks requested this feature long ago; Trilinos devs added it very reluctantly and only if DTK folks promised to maintain and test it themselves. I can't speak for @trilinos/tpetra as a whole, but I imagine Trilinos developers wouldn't mind fixing obvious issues like the ones you pointed out. Long-term support is a bit trickier. Would you need DTK v2 to work for a while, or is this a temporary measure until you can upgrade to a newer DTK version?

@ikalash
Copy link
Contributor Author

ikalash commented Dec 16, 2019

@mhoemmen : thanks for the explanations. I do need DTK v2 to work for awhile (a few months, maybe?). I'm still trying to figure out what the issues I'm having with DTK v3 and newer are being caused by. Could we merge my PR in while I'm try to get the code to work with the newer DTK?

@mhoemmen
Copy link
Contributor

mhoemmen commented Dec 16, 2019

@ikalash Did you actually make a PR out of the branch https://github.com/trilinos/Trilinos/tree/ikalash-tpetra-teko-unsigned-long ? or are you referring to a different PR?

@ikalash
Copy link
Contributor Author

ikalash commented Dec 16, 2019

Yes, exactly, that is the PR I'm referring to. I added you as one of the reviewers.

@github-actions
Copy link

github-actions bot commented Nov 7, 2021

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

@github-actions github-actions bot added the MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. label Nov 7, 2021
@github-actions
Copy link

github-actions bot commented Dec 8, 2021

This issue was closed due to inactivity for 395 days.

@github-actions github-actions bot added the CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. label Dec 8, 2021
@github-actions github-actions bot closed this as completed Dec 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client: Albany Issue impacting the Albany project CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. pkg: Teko type: bug The primary issue is a bug in Trilinos code or tests
Projects
None yet
Development

No branches or pull requests

6 participants