Nalu diffs due to MueLu_Amesos2Smoother change #12383
Automatic mention of the @trilinos/muelu team
@spdomin Thanks for reporting this. How significant are the diffs? We've had some applications bug out due to a loss of convergence, but that should be fixed. I take it that's not the case here?
The loss of convergence that I recently reported came from a new eigenvalue approach that Chris pushed. However, I thought I closed that issue once we agreed it was a better change (I also increased the Chebyshev eigenvalue iteration count). Perhaps you mean another change, or this one? In my cases, I do not see large diffs, just order-of-operations-like magnitude. In fact, only the "strict" gnu platform found the diffs; the Mac platform, for example, is more lax in diff checking. Moreover, in almost all of the cases I spot-checked, the average linear solver iterations for the continuity solve were the same, although this was not an exhaustive check since it is currently done by hand with a "diff" or "tkdiff".
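As an illustration of how order-of-operations-level diffs arise, the sketch below sums the same three numbers in two orders and compares the results under a strict and a lax check. The function names and the tolerance values are hypothetical and are not taken from Nalu's test harness.

```python
import math


def sum_in_order(values):
    """Accumulate values left to right in double precision."""
    total = 0.0
    for value in values:
        total += value
    return total


def passes_check(reference, candidate, rel_tol):
    """Hypothetical diff check: pass if the relative difference is within rel_tol."""
    return math.isclose(reference, candidate, rel_tol=rel_tol, abs_tol=0.0)


values = [0.1, 0.2, 0.3]
forward = sum_in_order(values)            # (0.1 + 0.2) + 0.3
backward = sum_in_order(values[::-1])     # (0.3 + 0.2) + 0.1

# Floating-point addition is not associative, so the two sums differ in the last bits.
strict_ok = passes_check(forward, backward, rel_tol=0.0)    # bitwise-equal only
lax_ok = passes_check(forward, backward, rel_tol=1e-12)     # tolerates round-off
print(forward, backward, strict_ok, lax_ok)
```

A strict platform in this sense flags the change while a lax one does not, even though both results are equally valid roundings of the same sum.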
No, I managed to break Albany with the commit you referenced. I did push a fix the next day, but wanted to make sure it's not that. If there is indeed no change in iterations, I would suggest approving the diff.
Okay, I suppose I will check the diffs between these two changes to make sure all is well. To be clear, the latest code base in MueLu is an improvement to the previous implementation, correct? If so, I can proceed with the re-bless.
Yes. The commit and the subsequent had two changes:
I will proceed with the re-bless. Before I do that, do you have the set of MueLu XML parameters that activate this new code path/formulation? It looks like the number of diffing cases is much smaller than the full set of MueLu cases. An XML is under: https://github.com/NaluCFD/Nalu/blob/master/reg_tests/xml/milestone.xml I have not checked whether all of the diffs point to this XML; however, I will do that while the cases are running.
* Resolving Trilinos push: trilinos/Trilinos@eebc810 that caused Nalu diffs: trilinos/Trilinos#12383
In order to use the pseudo-inverse you would need to set
<ParameterList name="coarse: params">
  <Parameter name="fix nullspace" type="bool" value="true"/>
</ParameterList>
I would only recommend using that if you are 100% sure that the near-nullspace of the multigrid preconditioner is the actual nullspace because you are solving a singular problem. For the non-contiguous maps nothing has to be changed.
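To make the caveat about singular problems concrete, here is a minimal sketch of one textbook way to solve a symmetric singular system whose nullspace is known exactly: add the rank-one term n nᵀ (with n the normalized nullspace vector) and solve the resulting nonsingular system. This is an assumption chosen for illustration, not a description of what MueLu or Amesos2 does internally; all function names are hypothetical.

```python
import math


def solve_dense(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs stay untouched.
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def solve_singular_with_nullspace(matrix, rhs, nullvec):
    """Solve (A + n n^T) x = b, where n is the normalized nullspace vector of A."""
    norm = math.sqrt(sum(v * v for v in nullvec))
    unit = [v / norm for v in nullvec]
    n = len(rhs)
    shifted = [[matrix[i][j] + unit[i] * unit[j] for j in range(n)]
               for i in range(n)]
    return solve_dense(shifted, rhs)


# 1D Laplacian with pure Neumann boundaries: singular, constants are the nullspace.
laplacian = [[1.0, -1.0, 0.0],
             [-1.0, 2.0, -1.0],
             [0.0, -1.0, 1.0]]
rhs = [1.0, 0.0, -1.0]  # sums to zero, so the system is consistent
solution = solve_singular_with_nullspace(laplacian, rhs, [1.0, 1.0, 1.0])
print(solution)
```

The approach only behaves as intended when the supplied vector really spans the nullspace. If the matrix is actually nonsingular, the added term changes the operator being solved, which is the reason for the "100% sure" warning.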
Hmmm. None of the cases that diffed were a singular system. We only have one test for that, and it was the one that was failing from the eigenvalue iteration change a couple of weeks ago.
Moreover, I have clearly not activated this option. Finally, I do not believe we have non-contiguous maps? Did something else change?
It did not seem that the diff magnitude changed over the past week.
So the diff appeared with eebc810 and did not change since? Weird.
Looking at eebc810 the first two chunks should not be active since you're not setting "fix nullspace". So the diff must be due to the last chunk that deals with non-contiguous maps. You could revert that and see if that is indeed the culprit.
Could you refresh my memory on what constitutes a non-contiguous map? I know that in certain use cases, such as sliding mesh, overset, or periodic, we may have some sort of advanced matrix connectivity. However, many of the cases that failed are routine cases with standard inflow, open, and wall boundaries. One was a hybrid mesh for a duct flow. Any info would be nice to have so that we can all be sure that nothing else changed unknowingly. Possibly, one of the smallest cases might offer an opportunity to drop a matrix file.
Have a look at the section "Contiguous or noncontiguous" in the Tpetra documentation.
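For readers unfamiliar with the term, the sketch below checks contiguity under the usual definition, assumed here: each rank owns one consecutive interval of global IDs, and the intervals follow each other in rank order. The function name and the sample ID lists are hypothetical. A library may additionally classify a map by how it was constructed rather than by inspecting the IDs; that is not modeled here.

```python
def is_contiguous(gids_per_rank):
    """Return True if the ranks own consecutive, rank-ordered intervals of global IDs.

    gids_per_rank[r] is the list of global IDs owned by rank r, in local order.
    """
    next_expected = None
    for gids in gids_per_rank:
        for gid in gids:
            if next_expected is not None and gid != next_expected:
                return False
            next_expected = gid + 1
    return True


# Two ranks, each owning one consecutive block.
block_layout = [[0, 1, 2], [3, 4, 5]]

# Same IDs, but interleaved between ranks.
interleaved_layout = [[0, 2, 4], [1, 3, 5]]

# IDs handed out by a mesh database, with gaps and no rank ordering.
mesh_layout = [[1, 2, 7], [3, 4, 9]]

print(is_contiguous(block_layout))
print(is_contiguous(interleaved_layout))
print(is_contiguous(mesh_layout))
```

Under this definition, even a simple inflow/open/wall case can be noncontiguous if the global IDs come from mesh node numbering rather than from a block partition of the row space.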
@alanw0, could you please add perspective on why a routine Nalu inflow/open/wall bc case would have a noncontiguous map? Is this something that STK might be "suggesting" due to global id mapping? To recap, new Nalu diffs showed up due to a recent MueLu change that touched applications that have noncontiguous maps.
We construct a Tpetra_Map like this, on line 317 of TpetraLinearSystem.C in nalu:
The |
@spdomin for the diffing case, is it diffing on 1 processor, or only diffing in parallel?
Hi @alanw0, all of the cases are parallel. I can try a test in serial once we understand the rules of what is and is not contiguous. Based on how the global ids are served up from Exodus, we may never have a contiguous system :) Thanks for chiming in :)
There should be no change in behavior in parallel.
Okay, this saves me extra testing. However, we are still not sure why these standard cases are interpreted as non-contiguous maps. Again, the premise set forth is that diffs will only show up when the maps are non-contiguous. As such, the open question is why the maps are non-contiguous in these simple cases. I care about this because if the map is contiguous, then something else changed to cause the diffs.
Diffs accepted; still some questions on maps, however, we can take this offline. |
Bug Report
New diffs in Nalu's nightly test suite due to recent MueLu changes.
Bisect noted @cgcgcg:
This one looks like a bug fix. Let me know if: 1) the change is intended, 2) improves code quality, 3) Nalu diffs should be accepted.