Test MueLu_Maxwell3D-Tpetra-ML-list_MPI_4 failing in new build Trilinos-atdm-sems-rhel6-gnu-7.2.0-openmp-release-debug-no-global-int #5414

Closed
bartlettroscoe opened this issue Jun 19, 2019 · 17 comments
Labels

  • ATDM Sev: Blocker (Problems that make Trilinos unfit to be adopted by one or more ATDM APPs)
  • client: ATDM (Any issue primarily impacting the ATDM project)
  • PA: Linear Solvers (Issues that fall under the Trilinos Linear Solvers Product Area)
  • pkg: MueLu
  • type: bug (The primary issue is a bug in Trilinos code or tests)

Comments

@bartlettroscoe
Member

bartlettroscoe commented Jun 19, 2019

CC: @trilinos/muelu, @srajama1 (Trilinos Linear Solvers Product Lead), @bartlettroscoe, @fryeguy52, @rppawlo

Next Action Status

This test was disabled for this build configuration, where it is not valid, by PR #5402 (merged on 6/24/2019). The failing test disappeared on 6/25/2019.

Description

As shown in this query, the MueLu test:

  • MueLu_Maxwell3D-Tpetra-ML-list_MPI_4

fails in the new build:

  • Trilinos-atdm-sems-rhel6-gnu-7.2.0-openmp-release-debug-no-global-int

with the error shown here:

p=1: *** Caught standard std::exception of type 'MueLu::Exceptions::RuntimeError' :

 /jenkins/slave/workspace/Trilinos-atdm-sems-rhel6-gnu-7.2.0-openmp-release-debug-no-global-int/SRC_AND_BUILD/Trilinos/packages/muelu/src/Interface/MueLu_ML2MueLuParameterTranslator.cpp:328:
 
 Throw number = 1
 
 Throw test that evaluated to true: !paramList.sublist("coarse: list").isParameter("smoother: type")
 
 MueLu::MLParameterListInterpreter::Setup(): no coarse grid solver defined.

which results in just one not-run test:

This is part of an effort to set Tpetra_INST_INT_INT=OFF in all of the ATDM Trilinos builds (see ATDV-174) related to #4915.

Current Status on CDash

See recent results for MueLu in this build over last several days.

Steps to Reproduce

One should be able to reproduce this failure on any SNL COE RHEL6 machine that has the SEMS env mounted. See:

The exact commands to reproduce this issue should be:

$ cd <some_build_dir>/

$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh \
  Trilinos-atdm-sems-rhel6-gnu-7.2.0-openmp-release-debug-no-global-int

$ cmake \
 -GNinja \
 -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
 -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_MueLu=ON \
 $TRILINOS_DIR

$ ninja -j16

$ ctest -VV -R MueLu_Maxwell3D-Tpetra-ML-list_MPI_4
@bartlettroscoe bartlettroscoe added type: bug The primary issue is a bug in Trilinos code or tests pkg: MueLu client: ATDM Any issue primarily impacting the ATDM project ATDM Sev: Blocker Problems that make Trilinos unfit to be adopted by one or more ATDM APPs PA: Linear Solvers Issues that fall under the Trilinos Linear Solvers Product Area labels Jun 19, 2019
@cgcgcg
Contributor

cgcgcg commented Jun 19, 2019

PR #5402 should take care of this.

@cgcgcg cgcgcg self-assigned this Jun 19, 2019
@srajama1
Contributor

@cgcgcg Thanks! I suggest closing this then. Does Maxwell get tested in PR testing? How did this pass there?

@cgcgcg
Contributor

cgcgcg commented Jun 19, 2019

@srajama1 I'll close it once the PR is merged. Yep, it does. The error comes from enabling ML without Epetra, which the PR tester is not doing.

@bartlettroscoe
Member Author

@srajama1 said:

I suggest closing this then

We can't close an ATDM Trilinos GitHub issue until the PR is merged and we get confirmation on CDash that the issue is resolved.

@csiefer2
Member

csiefer2 commented Jun 19, 2019

The error comes from enabling ML without Epetra, which the PR tester is not doing.

If that is actually what the build is doing, then it is a perverse configuration which does not actually represent the needs of ATDM customers.

@bartlettroscoe
Member Author

@cgcgcg said:

The error comes from enabling ML without Epetra, which the PR tester is not doing.

Do you mean "enabling MueLu without Epetra"? What does ML have to do with MueLu? We are talking about a MueLu test here, not an ML test. I see that MueLu has a dependence on ML, but I would assume this is just a test dependence (to compare MueLu and ML) and that core MueLu functionality does not depend on ML (otherwise, how would you run on a GPU?).

And just to be clear, Epetra support in ML is not currently being disabled in the ATDM Trilinos configuration, as shown here:

-- Setting ML_ENABLE_Epetra=ON since Trilinos_ENABLE_ML=ON AND Trilinos_ENABLE_Epetra=ON
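
One way to double-check settings like this in an existing build directory is to look them up in CMakeCache.txt. The following is a minimal sketch, assuming the enable variables are recorded in the cache. The helper name show_cache_vars is hypothetical, and MueLu_ENABLE_Epetra is an assumed variable name that follows the same naming pattern as ML_ENABLE_Epetra above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of each requested variable from a
# CMakeCache.txt file. Cache entries have the form NAME:TYPE=VALUE.
show_cache_vars() {
  cache_file=$1
  shift
  for var in "$@"; do
    # Strip the "NAME:TYPE=" prefix and keep only VALUE (first match wins).
    value=$(sed -n "s/^${var}:[A-Z]*=//p" "$cache_file" | head -n 1)
    # Variables that do not appear in the cache are reported as not set.
    echo "${var}=${value:-<not set>}"
  done
}

# Example (run from the build directory; MueLu_ENABLE_Epetra is an assumed name):
# show_cache_vars CMakeCache.txt ML_ENABLE_Epetra MueLu_ENABLE_Epetra Tpetra_INST_INT_INT
```

A variable reported as not set only means it is absent from the cache file; it does not by itself say how the package was configured.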

@csiefer2 said:

That is a perverse configuration which does not represent the needs of ATDM customers. I highly suggest changing the configuration.

In what way? Enabling MueLu with Tpetra but without Epetra seems perfectly logical.

@cgcgcg
Contributor

cgcgcg commented Jun 19, 2019

@bartlettroscoe We test the translation of an ML style parameter list for use with MueLu. We call the ML parameter list validation. MueLu does not have Epetra support though, since we would otherwise not see this error message.

@csiefer2
Member

In what way? Enabling MueLu with Tpetra but without Epetra seems perfectly logical.

It is. Turning on ML without Epetra is illogical. Which is, from what you've presented, not happening either.

@bartlettroscoe
Member Author

It is. Turning on ML without Epetra is illogical. Which is, from what you've presented, not happening either.

Can we just turn off ML altogether for ATDM? Can MueLu be made to build and work correctly without ML? Can the parts of ML that MueLu uses be moved into an independent (sub)package and then used by ML and MueLu?

@bartlettroscoe
Member Author

CC: @rppawlo, @srajama1

@trilinos/muelu

FYI: With #5411 and #5412 now resolved, this MueLu failure is the last failure we need to resolve in order to disable the global ordinal 'int' instantiation in the ATDM Trilinos builds. We can't pull the trigger on that until this failure is resolved (one way or another) or this test will fail in every ATDM Trilinos build on all platforms.

Is this test alone protecting critical functionality for ATDM APPs? Are there no other MueLu tests protecting this functionality? What is the risk if we just disable this test for ATDM Trilinos builds? (But once a test gets disabled, it seems it never gets re-enabled.)

@srajama1
Contributor

@bartlettroscoe: I suggest we fix it rather than disable it. As you note, we rarely go through disabled tests and re-enable them.

@cgcgcg
Contributor

cgcgcg commented Jun 21, 2019

PR #5402 is almost through the autotester.

@bartlettroscoe
Member Author

As you note, we rarely go through disabled tests and re-enable them.

There are exceptions, such as #2410. In that case, the full test TeuchosNumerics_LAPACK_test_MPI_1 was disabled for a time and then it got re-enabled (and just a single unit test for Power platforms got disabled). So it does happen.

@bartlettroscoe
Member Author

It looks like PR #5402 made the failing test MueLu_Maxwell3D-Tpetra-ML-list_MPI_4 disappear in this build as shown here.

@cgcgcg, was this a mistake? Will this test be added back or is it gone for good? Is it safe to go ahead and use the settings from this build, Trilinos-atdm-sems-rhel6-gnu-7.2.0-openmp-release-debug-no-global-int, in all of the ATDM Trilinos builds?

@cgcgcg
Contributor

cgcgcg commented Jun 25, 2019

@bartlettroscoe The test is now only added if MueLu has support for ML and Epetra. Since this is not the case for this build, it disappeared (and should be gone for good).
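
To confirm locally whether a test is still registered in a given build, one option is to list the tests without running them. This is a minimal sketch, assuming the usual `ctest -N` listing format (lines of the form "Test #<n>: <name>"). The helper name count_registered is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the tests in `ctest -N` output (read from stdin)
# whose name is exactly $1. `ctest -N` lists tests without running them.
count_registered() {
  # Listing lines look like "  Test #12: <name>", so the name is field 3.
  awk -v name="$1" '$1 == "Test" && $3 == name { n++ } END { print n + 0 }'
}

# Example (run from the build directory):
# ctest -N -R MueLu_Maxwell3D | count_registered MueLu_Maxwell3D-Tpetra-ML-list_MPI_4
```

A count of 0 would be consistent with the test no longer being added for this configuration, while a count of 1 would mean it is still registered.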

@bartlettroscoe
Member Author

The test is now only added if MueLu has support for ML and Epetra. Since this is not the case for this build, it disappeared (and should be gone for good).

Okay, since this test is invalid in this case we can go ahead and close this issue as done.

Thanks for resolving!

@bartlettroscoe
Member Author

Closing for real :-)
