Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tpetra: transferAndFillComplete fails #8447

Closed
cgcgcg opened this issue Dec 9, 2020 · 2 comments · Fixed by #8451
Closed

Tpetra: transferAndFillComplete fails #8447

cgcgcg opened this issue Dec 9, 2020 · 2 comments · Fixed by #8451
Labels
client: EMPIRE All issues that most directly target the ATDM EMPIRE code pkg: Tpetra type: bug The primary issue is a bug in Trilinos code or tests

Comments

@cgcgcg
Copy link
Contributor

cgcgcg commented Dec 9, 2020

Bug Report

@trilinos/tpetra

Description

EMPIRE has reported an issue where transferAndFillComplete fails from within MueLu's rebalancing:

XpetraList.set("Timer Label","MueLu::RebalanceAc-" + Teuchos::toString(coarseLevel.GetLevelID()));
rebalancedAc = MatrixFactory::Build(originalAc, *rebalanceImporter, *rebalanceImporter, targetMap, targetMap, rcp(&XpetraList,false));

It seems that the problem is that originalAc is block diagonal and hence has no importer.
bool bSameDomainMap = BaseDomainMap->isSameAs (*getDomainMap ());
if (! restrictComm && ! MyImporter.is_null () && bSameDomainMap ) {
// Same domain map as source matrix
//
// NOTE: This won't work for restrictComm (because the Import
// doesn't know the restricted PIDs), though writing an
// optimized version for that case would be easy (Import an
// IntVector of the new PIDs). Might want to add this later.
Import_Util::getPids (*MyImporter, SourcePids, false);
}
else if (restrictComm && ! MyImporter.is_null () && bSameDomainMap) {
// Same domain map as source matrix (restricted communicator)
// We need one import from the domain to the column map
IntVectorType SourceDomain_pids(getDomainMap (),true);
IntVectorType SourceCol_pids(getColMap());
// SourceDomain_pids contains the restricted pids
SourceDomain_pids.putScalar(MyPID);
SourceCol_pids.doImport (SourceDomain_pids, *MyImporter, INSERT);
SourcePids.resize (getColMap ()->getNodeNumElements ());
SourceCol_pids.get1dCopy (SourcePids ());
}
else if (MyImporter.is_null () && bSameDomainMap) {
// Matrix has no off-process entries
SourcePids.resize (getColMap ()->getNodeNumElements ());
SourcePids.assign (getColMap ()->getNodeNumElements (), MyPID);
}
else if ( ! MyImporter.is_null () &&
! domainTransfer.is_null () ) {
// general implementation for rectangular matrices with
// domain map different than SourceMatrix domain map.
// User has to provide a DomainTransfer object. We need
// to communications (import/export)
// TargetDomain_pids lives on the rebalanced new domain map
IntVectorType TargetDomain_pids (domainMap);
TargetDomain_pids.putScalar (MyPID);
// SourceDomain_pids lives on the non-rebalanced old domain map
IntVectorType SourceDomain_pids (getDomainMap ());
// SourceCol_pids lives on the non-rebalanced old column map
IntVectorType SourceCol_pids (getColMap ());
if (! reverseMode && ! xferDomainAsImport.is_null() ) {
SourceDomain_pids.doExport (TargetDomain_pids, *xferDomainAsImport, INSERT);
}
else if (reverseMode && ! xferDomainAsExport.is_null() ) {
SourceDomain_pids.doExport (TargetDomain_pids, *xferDomainAsExport, INSERT);
}
else if (! reverseMode && ! xferDomainAsExport.is_null() ) {
SourceDomain_pids.doImport (TargetDomain_pids, *xferDomainAsExport, INSERT);
}
else if (reverseMode && ! xferDomainAsImport.is_null() ) {
SourceDomain_pids.doImport (TargetDomain_pids, *xferDomainAsImport, INSERT);
}
else {
TEUCHOS_TEST_FOR_EXCEPTION(
true, std::logic_error, "Tpetra::CrsMatrix::"
"transferAndFillComplete: Should never get here! "
"Please report this bug to a Tpetra developer.");
}

I removed the check for bSameDomainMap on
else if (MyImporter.is_null () && bSameDomainMap) {

since that is the only case with null importer, and this fixed the EMPIRE issue.
Could someone from the Tpetra team confirm that this is a good fix?

(All Tpetra tests pass on my system.)

@cgcgcg cgcgcg added type: bug The primary issue is a bug in Trilinos code or tests pkg: Tpetra labels Dec 9, 2020
@cgcgcg
Copy link
Contributor Author

cgcgcg commented Dec 9, 2020

@jhux2 @bathmatt

cgcgcg added a commit to cgcgcg/Trilinos that referenced this issue Dec 11, 2020
@cgcgcg cgcgcg linked a pull request Dec 11, 2020 that will close this issue
@cgcgcg cgcgcg added the client: EMPIRE All issues that most directly target the ATDM EMPIRE code label Dec 11, 2020
cgcgcg added a commit to cgcgcg/Trilinos that referenced this issue Dec 11, 2020
@cgcgcg
Copy link
Contributor Author

cgcgcg commented Dec 14, 2020

The added test sems to pass with UVM off.

jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Dec 15, 2020
…s:develop' (e00327c).

* trilinos-develop:
  Panzer STK: Add a test to verify the ability to query the PerceptMesh member
  Panzer STK: Add functionality for keeping and for querying an STK_Interface's PerceptMesh member
  Framework: Reduce rate limit for stale job actionscript
  Updated ninja
  Panzer: fix race condition with launch blocking off in GED accessor.
  MueLu: applying an ugly fix in SaPFactory_kokkos constraint algo
  Updated to cmake 19 for Ross' cuda schedule fixes
  TrilinosCouplings: Fixing maxwell example
  Tpetra: Fix issue in `transferAndFillComplete`
  Tpetra: Add test for trilinos#8447
  forgot to add this fix to change the value in the P matrix
  Moving the Teko configuration to use the Tpetra default LO/GOs
  Add a test to MueLu_UnitTestsTpetra_kokkos that checks the SaP factory including the enforce constaint option.
  fix up some gold files
  fix up some gold files
  fix up some gold files
  fix up some gold files
  KokkosKernels: remove static constexpr member (trilinos#8418)
  fixed up an unused variable and a scalar traits comparison ... as well as updated a couple of gold files
  kokkos version of satisfy constraints for SaP
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Dec 15, 2020
…s:develop' (e00327c).

* trilinos-develop:
  Panzer STK: Add a test to verify the ability to query the PerceptMesh member
  Panzer STK: Add functionality for keeping and for querying an STK_Interface's PerceptMesh member
  Framework: Reduce rate limit for stale job actionscript
  Updated ninja
  Panzer: fix race condition with launch blocking off in GED accessor.
  MueLu: applying an ugly fix in SaPFactory_kokkos constraint algo
  Updated to cmake 19 for Ross' cuda schedule fixes
  TrilinosCouplings: Fixing maxwell example
  Tpetra: Fix issue in `transferAndFillComplete`
  Tpetra: Add test for trilinos#8447
  forgot to add this fix to change the value in the P matrix
  Moving the Teko configuration to use the Tpetra default LO/GOs
  Add a test to MueLu_UnitTestsTpetra_kokkos that checks the SaP factory including the enforce constaint option.
  fix up some gold files
  fix up some gold files
  fix up some gold files
  fix up some gold files
  KokkosKernels: remove static constexpr member (trilinos#8418)
  fixed up an unused variable and a scalar traits comparison ... as well as updated a couple of gold files
  kokkos version of satisfy constraints for SaP
bathmatt pushed a commit that referenced this issue Dec 16, 2020
…develop' (e46fbab).

* potential-trilinos-develop: (64 commits)
  Framework: Reduce rate limit for stale job actionscript
  Updated ninja
  Panzer: fix race condition with launch blocking off in GED accessor.
  Updated to cmake 19 for Ross' cuda schedule fixes
  TrilinosCouplings: Fixing maxwell example
  TrilinosCouplings: Fixing bug
  TrilinosCouplings: Minor mods to Maxwell example
  Tpetra: Fix issue in `transferAndFillComplete`
  Tpetra: Add test for #8447
  PyTrilinos: Update FinMpi4Py.cmake to support Python 3 (#8444)
  Intrepid: fix bug in face tag initialization for high-order H(div) Triangle elements. (#8453)
  Minor fix to SuperLu version used for enum
  kokkos: Replace enums in Kokkos_ViewMapping.hpp
  Framework: Enabling the 'autocloser' github actions script
  Framework: Issue 8429 - TestingEnv Fixes
  Ifpack2: fix initialization error
  Tun off build stats compiler wrappers in all PR builds for now (#7376)
  Set Trilinos_REMOVE_BUILD_STATS_TIMING_FILES_ON_FRESH_CONFIGURE=ON for cuda_9.2 PR build (#7376)
  Add support for <Project>_REMOVE_BUILD_STATS_TIMING_FILES_ON_FRESH_CONFIGURE (#7376)
  Simplify logic for setting <Project>_REMOVE_BUILD_STATS_ON_CONFIGURE (#7376)
  ...
@cgcgcg cgcgcg closed this as completed Dec 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client: EMPIRE All issues that most directly target the ATDM EMPIRE code pkg: Tpetra type: bug The primary issue is a bug in Trilinos code or tests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant