Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Amesos2: Parallel Solve with Equilibration #9814

Closed
Adrian-Diaz opened this issue Oct 14, 2021 · 22 comments
Closed

Amesos2: Parallel Solve with Equilibration #9814

Adrian-Diaz opened this issue Oct 14, 2021 · 22 comments

Comments

@Adrian-Diaz
Copy link

Question

@trilinos/<PackageName>

Greetings,

Is there a way to tell Amesos2, or another Trilinos package that can do so before the solve, to equilibrate the system (lower the condition number) when solving with SuperLUDist? or any other of the parallel solver interfaces. Thanks for any info.

Adrian Diaz

@Adrian-Diaz
Copy link
Author

@srajama1

@jhux2
Copy link
Member

jhux2 commented Oct 14, 2021

@trilinos/amesos2 @iyamazaki

@srajama1
Copy link
Contributor

@iyamazaki is adding some related features, may be he can help.

@iyamazaki
Copy link
Contributor

I've only looked at SuperLU (serial) so far, but I can take a look. It looks like laqgs is commented out for SuperLU_Dist, and I am not sure why. I'll need to ask original developers..

@iyamazaki
Copy link
Contributor

@Adrian-Diaz. I created PR #9824 to add the option to apply Equil for SuperLU_Dist. I was wondering if you could give it a try.

@Adrian-Diaz
Copy link
Author

Adrian-Diaz commented Oct 19, 2021

Hmmm, I don't seem to be getting the right solution. Am I supposed to do anything other than specify Equil "true" in the parameter list I feed to the solver?

EDIT:: wait a second, my build doesn't seem to have captured the update from the PR. I'll try again.

@Adrian-Diaz
Copy link
Author

Its giving me the same weird answer as with the old files.

@iyamazaki
Copy link
Contributor

@Adrian-Diaz. Can you tell me how you are testing it? I tested using a slightly-modified version of your tester from Issue #9420.

@Adrian-Diaz
Copy link
Author

Adrian-Diaz commented Oct 20, 2021

I'm running an FEA beam bending solve where some of the element densities are close to zero (0.01). I fed the following paramlist to the solver:

Linear_Solve_Params = Teuchos::ParameterList("Amesos2");
Teuchos::ParameterList superlu_params = Linear_Solve_Params.sublist("SuperLU_DIST");
//superlu_params.set("Trans","TRANS","Whether to solve with A^T");
superlu_params.set("Equil",true,"Whether to equilibrate the system before solve");
//superlu_params.set("ColPerm","NATURAL","Use 'natural' ordering of columns");

EDIT: Seems like the issue is its not setting the parameter value for me. When I compile Trilinos with a print after line 696 of Amesos2_Superludist_def.hpp the boolean is still set to 0 for me in that setparameters function

@iyamazaki
Copy link
Contributor

I had to do something like:

Teuchos::ParameterList amesos2_params("Amesos2");`
auto superlu_params = Teuchos::sublist(Teuchos::rcpFromRef(amesos2_params), "SuperLU_DIST");
superlu_params->set("Equil", equil);
solver->setParameters( Teuchos::rcpFromRef(amesos2_params) );

Equil may help but may not solve all the accuracy issue.

I only enabled the matrix col & row scaling. SuperLU_Dist has another option that permutes the large nonzero entries to diagonal before scaling the matrix. I have not enabled this option for a couple of reasons, but we can also look into that..

@Adrian-Diaz
Copy link
Author

Yea copying that paramlist setup before my setParameters command fixed it. The beam is now bending as expected without any weird geometry.

@iyamazaki
Copy link
Contributor

Yay @Adrian-Diaz. Let us know if you see any other issues. I can close this issue once PR #9824 get merged.

@Adrian-Diaz
Copy link
Author

Yea sounds good. Thanks for all the help!

@Adrian-Diaz
Copy link
Author

It seems the scaling isn't giving me the right answer in parallel sometimes (by very decisive differences like a factor of 2). This happens even when the condition number should be quite low. As soon as I turn it off the correct solution is obtained for the low condition number system. Is there an additional input I need to declare in the param list? I'm wondering if its because the default has another setting differing from the one I'm feeding to setparameters with the equil flag on.

@iyamazaki
Copy link
Contributor

Thank you for letting us know, @Adrian-Diaz. If you could share the matrix causing the problem, I am happy to take a look.

@Adrian-Diaz
Copy link
Author

Adrian-Diaz commented Oct 25, 2021

A_matrix.txt

The attached matrix with the following code (on 1 rank vs 2 ranks) should reproduce the issue:

#include <Teuchos_ScalarTraits.hpp>
#include <Teuchos_RCP.hpp>
#include <Teuchos_oblackholestream.hpp>
#include <Teuchos_VerboseObject.hpp>
#include <Teuchos_CommandLineProcessor.hpp>

#include <Tpetra_Core.hpp>
#include <Tpetra_Map.hpp>
#include <Tpetra_MultiVector.hpp>
#include <Tpetra_CrsMatrix.hpp>

// I/O for Matrix-Market files
#include <MatrixMarket_Tpetra.hpp>
#include <Tpetra_Import.hpp>

#include "Amesos2.hpp"
#include "Amesos2_Version.hpp"


int main(int argc, char *argv[]) {
  Tpetra::ScopeGuard tpetraScope(&argc,&argv);

  typedef double Scalar;
  typedef Tpetra::Map<>::local_ordinal_type LO;
  typedef Tpetra::Map<>::global_ordinal_type GO;

  typedef Tpetra::CrsMatrix<Scalar,LO,GO> MAT;
  typedef Tpetra::MultiVector<Scalar,LO,GO> MV;

  using Tpetra::global_size_t;
  using Tpetra::Map;
  using Tpetra::Import;
  using Teuchos::RCP;
  using Teuchos::rcp;


  //
  // Get the default communicator
  //
  Teuchos::RCP<const Teuchos::Comm<int> > comm = Tpetra::getDefaultComm();
  int myRank = comm->getRank();

  Teuchos::oblackholestream blackhole;

  bool printMatrix   = false;
  bool printSolution = true;
  bool printTiming   = true;
  bool allprint      = false;
  bool verbose = (myRank==0);
  std::string filename("A_matrix.txt");
  Teuchos::CommandLineProcessor cmdp(false,true);
  cmdp.setOption("verbose","quiet",&verbose,"Print messages and results.");
  cmdp.setOption("filename",&filename,"Filename for Matrix-Market test matrix.");
  cmdp.setOption("print-matrix","no-print-matrix",&printMatrix,"Print the full matrix after reading it.");
  cmdp.setOption("print-solution","no-print-solution",&printSolution,"Print solution vector after solve.");
  cmdp.setOption("print-timing","no-print-timing",&printTiming,"Print solver timing statistics");
  cmdp.setOption("all-print","root-print",&allprint,"All processors print to out");
  if (cmdp.parse(argc,argv) != Teuchos::CommandLineProcessor::PARSE_SUCCESSFUL) {
    return -1;
  }

  std::ostream& out = ( (allprint || (myRank == 0)) ? std::cout : blackhole );
  RCP<Teuchos::FancyOStream> fos = Teuchos::fancyOStream(Teuchos::rcpFromRef(out));

  // Say hello
  out << myRank << " : " << Amesos2::version() << std::endl << std::endl;

  const size_t numVectors = 1;

  RCP<MAT> A = Tpetra::MatrixMarket::Reader<MAT>::readSparseFile(filename, comm);
  if( printMatrix ){
    A->describe(*fos, Teuchos::VERB_EXTREME);
  }
  else if( verbose ){
    std::cout << std::endl << A->description() << std::endl << std::endl;
  }

  // get the maps
  RCP<const Map<LO,GO> > dmnmap = A->getDomainMap();
  RCP<const Map<LO,GO> > rngmap = A->getRangeMap();

  GO nrows = dmnmap->getGlobalNumElements();
  RCP<Map<LO,GO> > root_map
    = rcp( new Map<LO,GO>(nrows,myRank == 0 ? nrows : 0,0,comm) );
  RCP<MV> Xhat = rcp( new MV(root_map,numVectors) );
  RCP<Import<LO,GO> > importer = rcp( new Import<LO,GO>(dmnmap,root_map) );

  // Create random X
  RCP<MV> X = rcp(new MV(dmnmap,numVectors));
  X->randomize();

  /* Create B
   *
   * Use RHS:
   *
   *  [[10]
   *   [10]
   *   [10]
   *   [10]
   *   [10]
   *   [10]]
   */
  RCP<MV> B = rcp(new MV(rngmap,numVectors));
  B->putScalar(10);


  // Constructor from Factory
  RCP<Amesos2::Solver<MAT,MV> > solver;
  if( !Amesos2::query("SuperLUDist") ){
    *fos << "SuperLUDist solver not enabled.  Exiting..." << std::endl;
    return EXIT_SUCCESS;
  }

  solver = Amesos2::create<MAT,MV>("SuperLUDist", A, X, B);

  //parameters
  Teuchos::RCP<Teuchos::ParameterList> Linear_Solve_Params = Teuchos::rcp(new Teuchos::ParameterList("Amesos2"));
  auto superlu_params = Teuchos::sublist(Teuchos::rcpFromRef(*Linear_Solve_Params), "SuperLU_DIST");
  superlu_params->set("Equil", true);
  solver->setParameters( Teuchos::rcpFromRef(*Linear_Solve_Params) );

  solver->symbolicFactorization().numericFactorization().solve();

  if( printSolution ){
    // Print the solution
    if( allprint ){
      if( myRank == 0 ) *fos << "Solution :" << std::endl;
      Xhat->describe(*fos,Teuchos::VERB_EXTREME);
      *fos << std::endl;
    } else {
      Xhat->doImport(*X,*importer,Tpetra::REPLACE);
      if( myRank == 0 ){
        *fos << "Solution :" << std::endl;
        Xhat->describe(*fos,Teuchos::VERB_EXTREME);
        *fos << std::endl;
      }
    }
  }

  if( printTiming ){
    // Print some timing statistics
    solver->printTiming(*fos);
  }
  //Teuchos::TimeMonitor::summarize();

  // We are done.
  return 0;
}

@iyamazaki
Copy link
Contributor

@Adrian-Diaz. Sorry, I had a silly bug in the code. I just pushed a fix. Hopefully, this fixed the issue.

@Adrian-Diaz
Copy link
Author

Sounds good, so far the test problem is solving correctly. I'll try some of the larger cases I have to see how consistent things look.

@iyamazaki
Copy link
Contributor

Thank you, @Adrian-Diaz. Please let us know how it looks. PR #9824 has been merged.

@Adrian-Diaz
Copy link
Author

Seems to be working fine on my larger problems so far.

@iyamazaki
Copy link
Contributor

That is great, @Adrian-Diaz. I think I'll close this issue for now, but please let us know if you encounter any other issues!! Thanks again for your help.

@Adrian-Diaz
Copy link
Author

Yea sure thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants