TrilinosCouplings_ml_nox_1Delasticity_example_MPI_4 test failing in Trilinos-atdm-white-ride-cuda-9.2-debug-pt build #3551

Closed
bartlettroscoe opened this issue Oct 2, 2018 · 5 comments
Labels
client: ATDM Any issue primarily impacting the ATDM project PA: Framework Issues that fall under the Trilinos Framework Product Area pkg: TrilinosCouplings type: bug The primary issue is a bug in Trilinos code or tests


@bartlettroscoe
Member

bartlettroscoe commented Oct 2, 2018

CC: @trilinos/trilinoscouplings , @jwillenbring (Trilinos Framework Product Area Lead), @fryeguy52

Next Action Status

PR #3568 merged on 10/5/2018 fixed this test as shown in the build on 10/6/2018.

Description

The test TrilinosCouplings_ml_nox_1Delasticity_example_MPI_4 is failing in the build Trilinos-atdm-white-ride-cuda-9.2-debug-pt on 'white' and 'ride', as shown here, with the following failing output:

5001
5001
5001
5001
TrilinosCouplings_ml_nox_1Delasticity_example.exe: /home/jenkins/white/workspace/Trilinos-atdm-white-ride-cuda-9.2-debug-pt/SRC_AND_BUILD/Trilinos/packages/trilinoscouplings/examples/ml/NonlinML/FiniteElementProblem.cpp:218: bool FiniteElementProblem::evaluate(FillType, const Epetra_Vector*, Epetra_Vector*, Epetra_RowMatrix*): Assertion `ierr' failed.
[white27:17997] *** Process received signal ***
[white27:17997] Signal: Aborted (6)
[white27:17997] Signal code:  (-6)
[white27:17997] [ 0] [0x3fffa9850478]
[white27:17997] TrilinosCouplings_ml_nox_1Delasticity_example.exe: /home/jenkins/white/workspace/Trilinos-atdm-white-ride-cuda-9.2-debug-pt/SRC_AND_BUILD/Trilinos/packages/trilinoscouplings/examples/ml/NonlinML/FiniteElementProblem.cpp:218: bool FiniteElementProblem::evaluate(FillType, const Epetra_Vector*, Epetra_Vector*, Epetra_RowMatrix*): Assertion `ierr' failed.
[white27:17998] *** Process received signal ***
[white27:17998] Signal: Aborted (6)
[white27:17998] Signal code:  (-6)
[white27:17998] [ 0] [0x3fff7ce70478]
[white27:17998] [ 1] TrilinosCouplings_ml_nox_1Delasticity_example.exe: /home/jenkins/white/workspace/Trilinos-atdm-white-ride-cuda-9.2-debug-pt/SRC_AND_BUILD/Trilinos/packages/trilinoscouplings/examples/ml/NonlinML/FiniteElementProblem.cpp:218: bool FiniteElementProblem::evaluate(FillType, const Epetra_Vector*, Epetra_Vector*, Epetra_RowMatrix*): Assertion `ierr' failed.
[white27:17999] *** Process received signal ***
[white27:17999] Signal: Aborted (6)
[white27:17999] Signal code:  (-6)
[white27:17999] [ 0] [0x3fffa4920478]
[white27:17999] [ 1] /lib64/libc.so.6(abort+0x2b4)[0x3fff97081f94]
[white27:17999] [ 2] TrilinosCouplings_ml_nox_1Delasticity_example.exe: /home/jenkins/white/workspace/Trilinos-atdm-white-ride-cuda-9.2-debug-pt/SRC_AND_BUILD/Trilinos/packages/trilinoscouplings/examples/ml/NonlinML/FiniteElementProblem.cpp:218: bool FiniteElementProblem::evaluate(FillType, const Epetra_Vector*, Epetra_Vector*, Epetra_RowMatrix*): Assertion `ierr' failed.
[white27:18000] *** Process received signal ***
[white27:18000] Signal: Aborted (6)
[white27:18000] Signal code:  (-6)
...

This is an important build because we are targeting it on 'white' and 'ride' as a Trilinos CUDA PR testing build (see #2464). However, the SPARC and EMPIRE ATDM Trilinos builds don't enable TrilinosCouplings, so we could simply disable this test without impacting ATDM at all.

Steps to reproduce

One should be able to reproduce this test failure on either 'white' or 'ride' by cloning the Trilinos git repo, checking out the 'develop' branch, creating a build directory, and then doing:

$ cd <some_build_dir>/

$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-9.2-debug

$ cmake \
  -GNinja \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnvAllPtPackages.cmake \
  -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_TrilinosCouplings=ON \
  $TRILINOS_DIR

$ make NP=16

$ bsub -x -Is -q rhel7F -n 16 ctest -j16
@bartlettroscoe added the type: bug, pkg: TrilinosCouplings, and client: ATDM labels on Oct 2, 2018
@jhux2
Member

jhux2 commented Oct 2, 2018

This should probably just be disabled for CUDA builds.

@jhux2
Member

jhux2 commented Oct 4, 2018

@bartlettroscoe Is there a convenient guard I can use in CMakeLists.txt to avoid even running this under CUDA?

jhux2 added a commit that referenced this issue Oct 5, 2018
This commit fixes a nightly cdash error.  The problem was in the
check of the return code of a call to Epetra_CrsMatrix::SumIntoGlobalValues().
According to that method's documentation, a zero return code indicates
success.  However, the check was assert(ierr), which will fail if
ierr==0.

Addresses issue #3551.
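The commit message above describes an inverted return-code check. The sketch below makes that concrete in plain C++. It assumes only what the message states: a zero return code from a call such as `Epetra_CrsMatrix::SumIntoGlobalValues()` means success. `sumIntoGlobalValuesStub`, `buggyCheckPasses`, `fixedCheckPasses`, and `addEntry` are hypothetical names for illustration only; they are not Epetra API. `assert(ierr == 0)` is one way to express the corrected check, and the actual change in PR #3568 may be written differently.

```cpp
#include <cassert>

// Hypothetical stand-in for a call such as
// Epetra_CrsMatrix::SumIntoGlobalValues(). It only mimics the documented
// convention that 0 means success and a nonzero value signals a problem;
// the real Epetra method takes a row, entry count, values, and indices.
int sumIntoGlobalValuesStub(bool succeed) {
  return succeed ? 0 : -1;
}

// Condition tested by the original check, assert(ierr):
// it holds only when ierr is nonzero, i.e. only when the call did NOT succeed.
bool buggyCheckPasses(int ierr) { return ierr != 0; }

// Condition tested by a corrected check, assert(ierr == 0):
// it holds exactly when the call succeeded.
bool fixedCheckPasses(int ierr) { return ierr == 0; }

// Hypothetical call site using the corrected pattern.
int addEntry() {
  int ierr = sumIntoGlobalValuesStub(true);
  assert(ierr == 0);  // aborts only if the call reported a nonzero code
  return ierr;
}
```

Keep in mind that `assert()` is compiled out when `NDEBUG` is defined. An inverted check like the original therefore only aborts in builds with assertions enabled, typically debug builds.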
@jhux2 self-assigned this on Oct 5, 2018
@jhux2
Member

jhux2 commented Oct 5, 2018

> This should probably just be disabled for CUDA builds.

Nope. There appears to be a real bug in the code. PR submitted.

trilinos-autotester added a commit that referenced this issue Oct 6, 2018
Automatically Merged using Trilinos Pull Request AutoTester
PR Title: TrilinosCouplings: fix issue #3551
PR Author: jhux2
@bartlettroscoe
Member Author

Looks like PR #3568, merged on 10/5/2018, fixed this test, as shown in the 10/6/2018 build here (see the -1 subscript and +1 superscript next to the numbers of failing and passing tests for TrilinosCouplings). See the newly passing test at:

Closing as complete.

Thanks @jhux2!

@bartlettroscoe
Member Author

Closing

@bartlettroscoe added the PA: Framework label on Nov 30, 2018
tjfulle pushed a commit to tjfulle/Trilinos that referenced this issue Dec 6, 2018