Intrepid2: New errors in charon regression tests #8562

Closed
glhenni opened this issue Jan 11, 2021 · 12 comments
Labels
CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. type: bug The primary issue is a bug in Trilinos code or tests

Comments

@glhenni
Contributor

glhenni commented Jan 11, 2021

Bug Report

@trilinos/<Intrepid2>

Description

Over the weekend our testing that links against Trilinos/develop threw errors from Intrepid2 that we haven't seen before. The error we're seeing is:

[Intrepid2] Error in file /scratch/charonops/jenkins/workspace/PL-Test-TrilinosUpdate_ASCIC_Build_Farm_CDE-GNU-dbg/tcad-charon/Trilinos/packages/intrepid2/src/Cell/Intrepid2_CellToolsDefValidateArguments.hpp, line 95
            Test that evaluated to true: startCell >= worksetCell.extent_int(0)
            startCell is out of bounds in workset. 

The failing tests were all finite-volume tests, which we refer to as Scharfetter-Gummel (SG) tests. The majority of SG tests passed, though, so I'm not sure what makes the 4 that failed different.

Any suggestions on where to look for the problem?

Making sure @rppawlo and @karapeterson see this as they may have some insights.

@glhenni glhenni added the type: bug The primary issue is a bug in Trilinos code or tests label Jan 11, 2021
@rppawlo
Contributor

rppawlo commented Jan 11, 2021

@trilinos/intrepid2

@mperego

@LawrenceCMusson

I'd like to be updated on this issue. I have "subscribed," but @glhenni has been unable to add me directly.

@mperego
Contributor

mperego commented Jan 11, 2021

Apologies for the inconvenience. We merged a huge PR (#8457) by @CamelliaDPG over the weekend.
Is it possible that you call setJacobian with an empty Workset? Probably Nate has a better idea of what might be causing this.

@rppawlo
Contributor

rppawlo commented Jan 11, 2021

@glhenni - can you send a stack trace?

@CamelliaDPG
Contributor

@glhenni I believe @mperego is likely right in the hypothesis he's suggesting. We added new optional arguments to setJacobian() in CellTools:

    static void
    setJacobian(       Kokkos::DynRankView<jacobianValueType,jacobianProperties...>       jacobian,
                 const Kokkos::DynRankView<pointValueType,pointProperties...>             points,
                 const WorksetType worksetCell,
                 const Teuchos::RCP<HGradBasisType> basis,
                 const int startCell=0, const int endCell=-1);

We also added a check that startCell is a valid index into the worksetCell container; this is the check that is failing for you. Since startCell is 0, the most likely explanation is that worksetCell is an empty container. I'm open to the idea of revising the arguments check to allow empty containers. Please let us know if that's what you'd like us to do.
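The relaxed check being offered here could look roughly like the following. This is a minimal sketch, not the Intrepid2 source: `validateStartCell` and `numCells` are hypothetical names, with `numCells` standing in for `worksetCell.extent_int(0)`, and the negative-index test is an added assumption.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the startCell argument check (not Intrepid2 code).
// numCells plays the role of worksetCell.extent_int(0).
inline void validateStartCell(int numCells, int startCell = 0) {
  // Assumed relaxation: an empty container with the default startCell
  // is treated as "nothing to do" instead of an error.
  if (numCells == 0 && startCell == 0) return;

  // Same condition as the failing test, plus a guard against negative
  // indices (the negative guard is an assumption of this sketch).
  if (startCell < 0 || startCell >= numCells)
    throw std::out_of_range("startCell is out of bounds in workset.");
}
```

With this shape, a caller that passes a zero-cell workset and leaves `startCell` at its default returns quietly, while a genuinely out-of-range `startCell` on a non-empty workset still throws.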

@glhenni
Contributor Author

glhenni commented Jan 11, 2021

I'll try and get a stack trace. It's definitely a parallel issue as it doesn't happen in a serial run. My guess would be a boundary issue of some kind where none of the entities reside on a particular node in the parallel run but there's some invocation of an evaluator being done on that node with a zero-size workset.
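To illustrate how a parallel run can produce a zero-size workset that a serial run never sees, here is a toy sketch. It assumes a simple block distribution of entities over ranks; `cellsPerRank` is a hypothetical helper, and a real mesh partitioner distributes entities differently.

```cpp
#include <vector>

// Hypothetical block partition: number of entities assigned to each rank.
// Real mesh decomposition is not a plain block split; this only shows
// that some ranks can legitimately end up owning nothing.
inline std::vector<int> cellsPerRank(int numEntities, int numRanks) {
  std::vector<int> counts(numRanks, numEntities / numRanks);
  // Hand out the remainder one entity at a time to the lowest ranks.
  for (int r = 0; r < numEntities % numRanks; ++r) ++counts[r];
  return counts;
}
```

With two boundary entities spread over four ranks, two of the ranks own zero entities, whereas a single-rank run always owns all of them.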

@rppawlo
Contributor

rppawlo commented Jan 11, 2021

Drekar is also seeing compile failures.

https://github.com/rppawlo/DrekarTransfer/issues/520

@glhenni
Contributor Author

glhenni commented Jan 11, 2021

charon stack trace:

#0  0x00007fffe981c8ce in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007ffff2ac017a in Intrepid2::CellTools_setJacobianArgs<Kokkos::DynRankView<double, Kokkos::LayoutStride, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<0u> >, Kokkos::DynRankView<double, Kokkos::Serial>, Kokkos::DynRankView<double, Kokkos::LayoutStride, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<0u> > > (jacobian=..., points=..., worksetCell=..., cellTopo=..., startCell=0, endCell=-1)
    at /home/glhenni/Projects/Charon2/tcad-charon/Trilinos/packages/intrepid2/src/Cell/Intrepid2_CellToolsDefValidateArguments.hpp:95
#2  0x00007ffff2ab4a35 in Intrepid2::CellTools<Kokkos::Serial>::setJacobian<double, Kokkos::LayoutStride, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<0u>, double, Kokkos::Serial, Kokkos::DynRankView<double, Kokkos::LayoutStride, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<0u> >, Intrepid2::Basis<Kokkos::Serial, double, double> > (jacobian=..., points=..., worksetCell=..., basis=...,
    startCell=0, endCell=-1) at /home/glhenni/Projects/Charon2/tcad-charon/Trilinos/packages/intrepid2/src/Cell/Intrepid2_CellToolsDefJacobian.hpp:824
#3  0x00007ffff2aaaaea in Intrepid2::CellTools<Kokkos::Serial>::setJacobian<double, Kokkos::LayoutStride, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<0u>, double, Kokkos::Serial, double, Kokkos::LayoutStride, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<0u> > (jacobian=..., points=..., worksetCell=..., cellTopo=...)
    at /home/glhenni/Projects/Charon2/tcad-charon/Trilinos/packages/intrepid2/src/Cell/Intrepid2_CellTools.hpp:471
#4  0x00007ffff2aa4b8e in Intrepid2::CubatureControlVolume<Kokkos::Serial, double, double>::getCubature (this=0x55555622ed60, cubPoints=...,
    cubWeights=..., cellCoords=...)
    at /home/glhenni/Projects/Charon2/tcad-charon/Trilinos/packages/intrepid2/src/Discretization/Integration/Intrepid2_CubatureControlVolumeDef.hpp:143
#5  0x00007ffff2ae6d79 in panzer::IntegrationValues2<double>::getCubatureCV (this=0x55555628dfd0, in_node_coordinates=..., in_num_cells=-1)
    at /home/glhenni/Projects/Charon2/tcad-charon/Trilinos/packages/panzer/disc-fe/src/Panzer_IntegrationValues2.cpp:1196
#6  0x00007ffff2add1ab in panzer::IntegrationValues2<double>::evaluateValues (this=0x55555628dfd0, in_node_coordinates=..., in_num_cells=-1,
    face_connectivity=...) at /home/glhenni/Projects/Charon2/tcad-charon/Trilinos/packages/panzer/disc-fe/src/Panzer_IntegrationValues2.cpp:279
#7  0x00007ffff60a8e95 in charon::CVFEM_WorksetFactory::addCVPointsAndBasis (this=0x5555560253d0, num_cells=0, needs=..., details=...)
    at /home/glhenni/Projects/Charon2/tcad-charon/src/Charon_CVFEM_WorksetFactory.cpp:194
#8  0x00007ffff60a8452 in charon::CVFEM_WorksetFactory::getWorksets (this=0x5555560253d0, worksetDesc=..., needs=...)
    at /home/glhenni/Projects/Charon2/tcad-charon/src/Charon_CVFEM_WorksetFactory.cpp:121
#9  0x00007ffff2cec36a in panzer::WorksetContainer::getWorksets (this=0x555556248650, wd=...)
    at /home/glhenni/Projects/Charon2/tcad-charon/Trilinos/packages/panzer/disc-fe/src/Panzer_WorksetContainer.cpp:121
#10 0x00007ffff4d1dc78 in panzer_stk::ModelEvaluatorFactory<double>::buildObjects (this=0x7fffffffba00, comm=..., global_data=..., eqset_factory=...,
    bc_factory=..., user_cm_factory=..., meConstructionOn=true)
    at /home/glhenni/Projects/Charon2/tcad-charon/Trilinos/packages/panzer/adapters-stk/src/Panzer_STK_ModelEvaluatorFactory_impl.hpp:597
#11 0x00005555557a2a8d in main (argc=3, argv=0x7fffffffda78) at /home/glhenni/Projects/Charon2/tcad-charon/src/Charon_Main.cpp:854

@mperego
Contributor

mperego commented Jan 11, 2021

At frame 7 of the stack trace you can see that num_cells=0, so I think it is what we suspected. As @CamelliaDPG said, we can allow that to happen if that's better for Charon.

@glhenni
Contributor Author

glhenni commented Jan 11, 2021

I think I've figured out how to avoid the invocations in charon in the case of empty worksets. There was already a boolean that is set when the workset is empty; I just had to wrap two calls to evaluate*() in a check on it, and everything should be fine. I'm building now and will update here when complete.
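The guard described above might be sketched as follows. The `Workset` and `Evaluator` types and the `evaluateIfNonEmpty` helper are hypothetical stand-ins written for illustration; they are not the charon or panzer classes, and the actual fix in charon is not shown in this thread.

```cpp
// Hypothetical workset type; the real panzer/charon types carry far more state.
struct Workset {
  int num_cells = 0;
};

// Stand-in evaluator that only counts how often evaluation is requested.
struct Evaluator {
  int calls = 0;
  void evaluateValues(const Workset&) { ++calls; }
};

// Skip evaluation entirely when the workset has no cells, so that
// downstream argument checks never see a zero-extent container.
// Returns true if the evaluation actually ran.
inline bool evaluateIfNonEmpty(Evaluator& ev, const Workset& ws) {
  const bool empty = (ws.num_cells == 0);
  if (!empty) ev.evaluateValues(ws);
  return !empty;
}
```

The design choice is to handle the empty case at the call site instead of relaxing the library check, which keeps the stricter validation in place for every other caller.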

@github-actions

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

@github-actions github-actions bot added the MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. label Mar 12, 2022
@github-actions

This issue was closed due to inactivity for 395 days.

@github-actions github-actions bot added the CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. label Apr 13, 2022

5 participants