Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MueLu: Failing tests with Tpetra deprecated code disabled #5614

Closed
18 tasks done
Tracked by #5602
tjfulle opened this issue Jul 31, 2019 · 23 comments
Closed
18 tasks done
Tracked by #5602

MueLu: Failing tests with Tpetra deprecated code disabled #5614

tjfulle opened this issue Jul 31, 2019 · 23 comments
Assignees
Labels
pkg: MueLu TpetraRF type: bug The primary issue is a bug in Trilinos code or tests

Comments

@tjfulle
Copy link
Contributor

tjfulle commented Jul 31, 2019

Bug Report

@trilinos/muelu

Part of: #5602

Description

With Tpetra's deprecated code disabled, many MueLu tests fail for GlobalOrdinal=[int,long long int]

If GlobalOrdinal=int

See the full configuration at #5602 (comment)

If GlobalOrdinal=long long int

See the full configuration at: #5602 (comment)

@tjfulle tjfulle added type: bug The primary issue is a bug in Trilinos code or tests pkg: MueLu TpetraRF labels Jul 31, 2019
@lucbv
Copy link
Contributor

lucbv commented Jul 31, 2019

@tjfulle I will give it a look, after re-configuring according to the instructions in #5602 I notice that a lot of packages are enabled. I will at least disable SEACAS, STK and PANZER to save on build time, do you see any reason for it to change the results of the build/tests?

@tjfulle
Copy link
Contributor Author

tjfulle commented Jul 31, 2019

@lucbv no reason to enable everything. The configuration is based on PR testing configuration, which is overkill for fixing these tests, I am sure. I would enable only what you need.

@tjfulle
Copy link
Contributor Author

tjfulle commented Jul 31, 2019

But do note, if you work off of trilinos:develop, it may not build. the branch referenced in #5602 has a bunch of fixes that have not made it back to develop (yet)

@lucbv
Copy link
Contributor

lucbv commented Jul 31, 2019

I see, I have a build that is half-way through so far so I will let it continue and if it fails I'll switch to your branch to fix the tests.
I am working on this "in the background" so I might not be lightning fast... let me know if there is great urgency? Based on all the failing tests you report I am guessing I have a bit of time though ; )

@tjfulle
Copy link
Contributor Author

tjfulle commented Jul 31, 2019

No urgency. I suspect that one or two fixes will fix all the tests.

Since you are only building through MueLu you may not need to use my branch - the fixes are mostly for packages downstream of MueLu

@lucbv
Copy link
Contributor

lucbv commented Jul 31, 2019

No I got unlucky and some belos Epetra adapter died on me so I added your fork as a remote and I am building off of your branch.
How do you want me to submit my fixes though: should I do a PR to your fork or patch Trilinos/develop?
My guess is the later since it might be cleaner?

@tjfulle
Copy link
Contributor Author

tjfulle commented Jul 31, 2019

Note that the configuration I built with disables Epetra (leading to the previous issue with Galeri). It may not build with Epetra enabled.

Feel free to push to my branch (you should have write permissions as a Trilinos developer), or open a PR against trilinos:develop. I plan to split the changes I have already made in my branch out by package and open separate PRs for each, so it would probably be easiest to work with trilinos:develop.

@lucbv
Copy link
Contributor

lucbv commented Jul 31, 2019

@tjfulle so I looked at this test: MueLu_UnitTestsTpetra_MPI_1 and I see that two of the unit-tests are failing because of a call to Tpetra::MatrixMatrix::Multiply which does not seem compute the number of non-zeros of the output matrix correctly and dies because of the following error:

 p=0: *** Caught standard std::exception of type 'std::runtime_error' :
 
  /home/lberge/Research/Trilinos_lucbv/packages/tpetra/core/src/Tpetra_CrsGraph_def.hpp:2299:
  
  Throw number = 1
  
  Throw test that evaluated to true: (numInserted == Teuchos::OrdinalTraits<size_t>::invalid())
  
  Tpetra::CrsGraph<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::insertGlobalIndicesImpl: There is not enough capacity to insert indices in to row 0. The upper bound on the number of entries in this row must be increased to accommodate one or more of the new indices.

For info here is the full gdb backtrace I have for this throw: matmatmult_backtrace.txt

I am looking at another unit-test (Zoltan Build) that fails due to bad memory allocation in a matrix.
This one is on us (@trilinos/muelu) so I'll look into fixing it.

@tjfulle
Copy link
Contributor Author

tjfulle commented Jul 31, 2019

The runtime error is from DynamicProfile being deprecated. DynamicProfile is deprecated so any graph/matrix that MueLu constructs will have to be given an upper size limit for entries in a row. Does that make sense?

@lucbv
Copy link
Contributor

lucbv commented Jul 31, 2019

Sure but how do I guess that number for a matrix that is the result of the multiplication of two matrices?
I would have assumed that in that case Tpetra does the calculation for me, otherwise I might as well re-implement the symbolic phase of MatMatMult?

@tjfulle
Copy link
Contributor Author

tjfulle commented Jul 31, 2019

@csiefer2 @jhux2 isn’t there an overload of MMM for this use case?

@cgcgcg
Copy link
Contributor

cgcgcg commented Jul 31, 2019

@tjfulle Is it possible that C_estimate_nnz in TpetraExt_MatrixMatrix_def.hpp is not doing the correct thing when one of the matrices only has one entry per row?

@lucbv
Copy link
Contributor

lucbv commented Jul 31, 2019

@tjfulle I did a bit of investigation already into the wrapper we are using and I can't imagine that it does the wrong thing are almost every single test in MueLu would fail.
This error is more likely due to a bad estimation of the number of non-zero per row from the Tpetra implementation of the MMM.
@mhoemmen do you recall what the expectations are on matrix C computed with MMM as C=A*B?

lucbv added a commit to lucbv/Trilinos that referenced this issue Jul 31, 2019
…rsGraph see issue trilinos#5614

This work will make the test pass after Tpetra removes its deprecated code.
@tjfulle
Copy link
Contributor Author

tjfulle commented Jul 31, 2019

In preparation for removing DynamicProfile I changed Tpetra’s import/export to have the ability to change a graph’s structure - I thought this was being used by MMM so that the target of MMM could be altered to accommodate unknown entries, but I’m unsure. I’m unfamiliar with MMM, @csiefer2 should know

trilinos-autotester added a commit that referenced this issue Aug 1, 2019
…ra_deprecation

Automatically Merged using Trilinos Pull Request AutoTester
PR Title: MueLu: fixing the Zoltan unit-test for static allocation of Tpetra::CrsGraph see issue #5614
PR Author: lucbv
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Aug 1, 2019
…s:develop' (5f78e9f).

* trilinos-develop:
  MueLu: fixing the Zoltan unit-test for static allocation of Tpetra::CrsGraph see issue trilinos#5614
  Turned off delated fields unless needed
  Tpetra: Fixing performance tests cmake logic
  BelosGmresPolyOp: Build fix for float.
  one more try at getting Epetra to be required...
  Removed call that can never happen for mesh coords
  Added ability to delay the adding of vars to dump
  FEI: make Epetra a required dependency
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Aug 1, 2019
…s:develop' (5f78e9f).

* trilinos-develop:
  MueLu: fixing the Zoltan unit-test for static allocation of Tpetra::CrsGraph see issue trilinos#5614
  Turned off delated fields unless needed
  Tpetra: Fixing performance tests cmake logic
  BelosGmresPolyOp: Build fix for float.
  one more try at getting Epetra to be required...
  Removed call that can never happen for mesh coords
  Added ability to delay the adding of vars to dump
  FEI: make Epetra a required dependency
@lucbv
Copy link
Contributor

lucbv commented Aug 1, 2019

@tjfulle the above PR #5628 should fix the issue with the variable dofs per node test.
It was relatively straightforward so I don't expect problems with the auto-tester.
Moving on to the BlockedTransfer test.

@tjfulle
Copy link
Contributor Author

tjfulle commented Aug 1, 2019

Thanks @lucbv!

@lucbv
Copy link
Contributor

lucbv commented Aug 1, 2019

So the BlockedTransfer test fails because it uses the Petrov-Galerkin prolongator factory which uses the MatMatMult operation from Xpetra/Tpetra.
We might want to create a seperate issue for that to track it properly between MueLu/Xpetra and Tpetra.
This pretty much concludes the work for GlobalOrdinal=long long int in the sense that everything is fully triaged, root cause for failures are clearly identified and work has been assigned appropriately.

@mhoemmen
Copy link
Contributor

mhoemmen commented Aug 1, 2019

@mhoemmen do you recall what the expectations are on matrix C computed with MMM as C=A*B?

Not off-hand. @csiefer2 is the expert here.

lucbv added a commit to lucbv/Trilinos that referenced this issue Aug 1, 2019
…, issue trilinos#5614

Mainly tweaking a bit the xml file to have coordinate flow through MueLu.
No idea why this pops up now?
@lucbv
Copy link
Contributor

lucbv commented Aug 1, 2019

@tjfulle almost everything is now covered one way or another.
The main thing will be to take care of the MMM kernel but this will wait until next week when @csiefer2 is back from travel.
Regarding the MueLu_ParameterListInterpreterTpetra_MPI_1 I'm too tired to look at it now so it'll probably wait until next week.
Once PR #5631 is merged, would you mind re-running the tests with all the changes to check that things are good on you end?

trilinos-autotester added a commit that referenced this issue Aug 2, 2019
Automatically Merged using Trilinos Pull Request AutoTester
PR Title: MueLu: fixing isorropia test before removal of Tpetra deprecated code issue #5614
PR Author: lucbv
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Aug 2, 2019
…s:develop' (d679c97).

* trilinos-develop:
  MueLu: fixing another xml using the isorropia interface
  MueLu: fixing isorropia test before removal of Tpetra deprecated code, issue trilinos#5614
  MueLu: fixing the number of non-zeros per row in unsmoosh factory
  Tpetra: trilinos#5525 fix
  IOSS: cgns - support fields on structured block embedded nodeblock
@tjfulle
Copy link
Contributor Author

tjfulle commented Aug 2, 2019

Thanks @lucbv! I’m on travel till early next week, I’ll look then

jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Aug 4, 2019
…s:develop' (d679c97).

* trilinos-develop: (66 commits)
  zoltan2:  changes to allow deprecation of dynamicProfile trilinos#5602
  Ifpack2: Fixing an error that affects clang
  Tpetra: Fix trilinos#5639 (remove RTI)
  ShyLU: fixed test failures w/no deprecated (trilinos#5617)
  Phalanx: remove clang warning
  Xpetra: Issue-5591 (checkpoint COB)
  MueLu: fixing another xml using the isorropia interface
  MueLu: fixing isorropia test before removal of Tpetra deprecated code, issue trilinos#5614
  Xpetra: Debugging ETI & Cleanup
  Ifpack2: Container cleanup
  Ifpack2: Fixed Scalar type mismatches w/CUDA
  Ifpack2: Removed STS typedef from Container
  MueLu: fixing the number of non-zeros per row in unsmoosh factory
  Tpetra: trilinos#5525 fix
  kokkos-kernels: Fix trilinos#5624
  IOSS: cgns - support fields on structured block embedded nodeblock
  Ifpack2: BlockRelaxation: make decoupling default
  Ifpack2: replace (T)val cast with T(val)
  Ifpack2: removed BlockRelaxationPerf test
  Ifpack2: use readable SC/LO/GO/NO types
  ...
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Aug 4, 2019
…s:develop' (d679c97).

* trilinos-develop: (66 commits)
  zoltan2:  changes to allow deprecation of dynamicProfile trilinos#5602
  Ifpack2: Fixing an error that affects clang
  Tpetra: Fix trilinos#5639 (remove RTI)
  ShyLU: fixed test failures w/no deprecated (trilinos#5617)
  Phalanx: remove clang warning
  Xpetra: Issue-5591 (checkpoint COB)
  MueLu: fixing another xml using the isorropia interface
  MueLu: fixing isorropia test before removal of Tpetra deprecated code, issue trilinos#5614
  Xpetra: Debugging ETI & Cleanup
  Ifpack2: Container cleanup
  Ifpack2: Fixed Scalar type mismatches w/CUDA
  Ifpack2: Removed STS typedef from Container
  MueLu: fixing the number of non-zeros per row in unsmoosh factory
  Tpetra: trilinos#5525 fix
  kokkos-kernels: Fix trilinos#5624
  IOSS: cgns - support fields on structured block embedded nodeblock
  Ifpack2: BlockRelaxation: make decoupling default
  Ifpack2: replace (T)val cast with T(val)
  Ifpack2: removed BlockRelaxationPerf test
  Ifpack2: use readable SC/LO/GO/NO types
  ...
@csiefer2
Copy link
Member

csiefer2 commented Aug 5, 2019

Huh. That's weird. Will look at this MMM & Intrepid issues next week.

@brian-kelley
Copy link
Contributor

All the checkboxes are filled! There's only one new MueLu test failure:
MueLu_DriverEpetra_CircNspDependency_MPI_4. And this probably doesn't have anything to do with our deprecation stuff:


Laplace2D: Build (MueLu::CoarseMapFactory)
Striding info = {}   [default]
Strided block id = -1   [default]
Domain GID offsets = {0}   [default]

Laplace2D: Build (MueLu::RebalanceTransferFactory)
MueLu_Driver: preconditioner setup crashed w/ message:/home/nightlyTesting/OPENMPI-1.10.1_RELEASE_TPETRA_DEPRECATED_CODE_OFF_ENABLE_DOWNSTREAM_GO_INT/Trilinos/packages/muelu/src/Interface/../MueCentral/MueLu_Level.hpp:197:

Throw number = 3

Throw test that evaluated to true: !IsKey(fac, ename)

"P" not found
MueLu_Driver: Not solving system due to crash in preconditioner setup

@brian-kelley
Copy link
Contributor

Closing since the deprecated removal PR is ready to go in next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg: MueLu TpetraRF type: bug The primary issue is a bug in Trilinos code or tests
Projects
None yet
Development

No branches or pull requests

7 participants